Why the Established AI Content Licensing Model Breaks with Books (3 of 4)

Jonathan WoahnJun 17, 20255 min read

AI Content StrategyContent Monetization

This third article discusses where the established content licensing models break for books and makes the case for why this is a problem worth solving.

Designers: Jonathan Woahn, Michael Moulton / Tools: MidJourney, Recraft, Photoshop

In the previous article, we analyzed how digital publishers have partnered with AI companies—establishing licensing frameworks around training and inference that address concerns about consent, credit, and compensation. Now, let's continue the conversation and explore why that model doesn't work with books.

Book Publisher Content Licensing for Usage with AI Companies

Here's the critical insight that frames everything in this article: experts estimate the AI inference market will be 100 times larger than the training market.

That's not a typo. The inference market—where AI systems actually serve users, answer questions, and generate value in real-time—dwarfs the training market in terms of long-term economic potential.

Yet books are currently positioned almost exclusively for training, not inference.

This matters enormously. Meta's Mark Zuckerberg has been vocal about open-sourcing AI models, which means the value of training data depreciates over time as models become commoditized. If books can only participate in training, publishers are betting on the smaller, shrinking side of the equation.

As one industry observer put it: "Creators overestimate the value of their content for training and underestimate the value of their content for inference."

Why Are Books Best Positioned for Training AI and Not Inference?

To understand this, let's analyze through the lens of our 3C's framework: Consent, Credit, and Compensation. We'll look at how books perform on each dimension for both training and inference.

Book Content Licensing: Training

Consent and Compensation

For training, the consent and compensation model is relatively straightforward. Publishers can negotiate perpetual licensing deals—granting AI companies the right to use their content to train specific models. Wiley's reported $44 million deal demonstrates that this market is real and functioning.

Since nearly all books exist in digital formats (EPUB, PDF, etc.), publishers can relatively easily package and share their content as training datasets. The licensing is typically per-model, meaning each new generation of an AI model requires a new agreement.

Credit

Here's where things get complicated—and this applies to training across all content types, not just books.

Think of it like learning to play piano. When you study sheet music for years, the notes, patterns, and techniques become internalized. You can sit down and play a beautiful piece—but you're not reading the sheet music anymore. It's become part of how you think about music.

Similarly, when an AI model is trained on books, the content becomes internalized into the model's parameters. The model doesn't store copies of the books—it learns patterns, relationships, and structures from them. This makes it essentially impossible to trace a specific output back to a specific training source.

Credit for training is inherently problematic because the very nature of the training process makes exact attribution impossible. The content becomes "training soup"—blended into the model's understanding in ways that can't be easily untangled.

Book Content Licensing: Inference

Credit

Technically, giving credit during inference is straightforward. Through a process called RAG (Retrieval-Augmented Generation), AI systems can pull specific passages from content and cite them directly. The technology works.

But here's the problem specific to books: there's no canonical way to reference book content.

When a digital publisher's content is cited during inference, the AI can link to a URL—say, https://www.nytimes.com/article-name. The user clicks the link, verifies the source, and the publisher gets traffic.

But what's the equivalent for a book? There's no http://www.INSERT_BOOK_TITLE.com/chapter1. Books don't have URLs. They don't have standardized digital addresses. There's no universal way for an AI system to point a user to a specific passage in a specific edition of a specific book.

This is a fundamental infrastructure gap that makes credit during inference nearly impossible for books.

Consent

Digital publishers like Reddit can update their privacy policies and terms of service to grant broad permissions for AI use. Users who continue to use the platform implicitly consent. The rights are aggregated at the platform level.

Books work completely differently. Rights are held individually—by authors, estates, and publishers through complex contractual arrangements. Each book has its own set of permissions, territories, and usage restrictions. There's no aggregate consent mechanism. Licensing a single book for inference requires navigating a web of individual agreements that simply doesn't scale.

Compensation

For digital publishers, compensation during inference works through subscription or API-based models. The platform controls access, tracks usage, and charges accordingly. Reddit knows exactly how many API calls are made. The New York Times can track how often its content is retrieved.

But for books, there's currently no tracking mechanism for inference usage. If an AI system retrieves a passage from a book to answer a user's question, who records that interaction? Who measures how frequently it happens? Who determines fair compensation?

Without tracking infrastructure, fair compensation for book content during inference is impossible.

The Crux of the Issue

Let's put this all together in a matrix:

Matrix

Digital publishers have solved—or are actively solving—the 3C's for both training and inference. Book publishers have only solved them for training. The inference column for books is almost entirely red.

This is the crux of the issue. Books cannot participate in the largest AI opportunity—the inference market—without solving credit, consent, and compensation first.

Why This Matters: Three Reasons

1. The Biggest Impact Comes from the Simplest Changes

The problems blocking books from inference aren't unsolvable. They're infrastructure problems. Give books a canonical digital identity. Create standardized consent mechanisms. Build usage tracking systems.

Think about what Stripe did for online payments. Before Stripe, accepting payments online was a nightmare of bank integrations, compliance requirements, and custom code. Stripe didn't change the fundamental economics of payments—they just made the infrastructure dramatically simpler. And in doing so, they unlocked trillions of dollars in commerce.

The book industry needs its Stripe moment. The best documentation, the clearest APIs, the simplest integration paths—that's what unlocks massive value from relatively simple changes.

2. Who Owns the Problem?

This is the uncomfortable question. Publishers lack the technical expertise to build this infrastructure. Technology companies could build it, but their track record with publishers suggests they'd build it in a way that serves their own interests—potentially creating "another Amazon" that controls the relationship between books and AI.

The ideal solution comes from a trusted intermediary—one that understands both publishing and technology, and whose incentives are aligned with publisher interests.

3. The Opportunity Is Massive

Let's do some back-of-the-envelope math. Wiley earns roughly $22 million annually from AI training licenses. If inference represents a 100x opportunity—which is the consensus estimate—that's a potential $2.2 billion market for Wiley alone.

Now extrapolate that across the entire publishing industry. Even a conservative estimate—say, a 10% revenue increase from inference licensing—represents billions of dollars in new revenue at significantly higher margins than traditional book sales.

This isn't speculative. The inference market is growing rapidly. The question is whether books will participate in it or be left behind.

This is Part 3 of a 4-part series on AI and the Future of Book Publishing:

AI and the Future of Book Publishing
The Current State of AI and Publishing
Why the Established AI Content Licensing Model Breaks with Books
Vision and Call to Action for AI and the Future of Books

Companion article: A Beginner's Guide to Understanding Generative AI

Stay in the loop

Get the latest insights on AI, content licensing, and the future of publishing.

Subscribing...

You're subscribed! We'll keep you in the loop.

AI PublishingAI Content StrategyContent Monetization
The Scholarly Kitchen: AI Isn't Going to Pay for Content — Part Two: The Path Forward (Guest Post)
The second installment continues exploration of AI's impact on content compensation, shifting focus to where genuine economic opportunity may emerge as AI systems become operationalized.
Jonathan Woahn·Feb 4, 2026

Why the Established AI Content Licensing Model Breaks with Books (3 of 4)

Book Publisher Content Licensing for Usage with AI Companies

Why Are Books Best Positioned for Training AI and Not Inference?

Book Content Licensing: Training

Consent and Compensation

Credit

Book Content Licensing: Inference

Credit

Consent

Compensation

The Crux of the Issue

Why This Matters: Three Reasons

1. The Biggest Impact Comes from the Simplest Changes

2. Who Owns the Problem?

3. The Opportunity Is Massive

Stay in the loop

Read More

The Scholarly Kitchen: AI Isn't Going to Pay for Content — Part Two: The Path Forward (Guest Post)