
AI Isn't Going to Pay for Content (At Least Not How You're Hoping It Will)
AI training will not become the content windfall publishers hope for. The real economic opportunity lies in inference — and the window to shape its terms is open now.
This article was originally published as a two-part series on The Scholarly Kitchen (Part 1, Part 2).
Part One: The Missed Expectation
In October 2025, at the FIPP World Congress in Madrid, a ballroom of media executives listened as journalist Ricky Sutton finally asked the question that captured both the anxiety and the hope of the entire room:
"Shouldn't your AI system pay every time it references our content? Shouldn't publishers be paid each time there's a query for content, not just when an ad is served?"
The question was crisp, reasonable, and rooted in a decades-long erosion of content economics. Publishers have watched value slip from their hands through unbundling, aggregation, and search. They are now hoping — some quietly, some explicitly — that AI will reverse the trend. If artificial intelligence is going to ingest their work, understand it, and depend on it, then surely AI companies should pay for that privilege.
The question was directed at Tom Rubin, OpenAI's Chief of Intellectual Property and Content. His answer was careful, neutral, and ultimately forgettable — not because the question was misguided, but because Rubin understands an uncomfortable truth: the economics that publishers are hoping for don't exist today.
This exchange revealed the core misunderstanding shaping today's debate. Many in the industry believe AI companies are about to become a new class of bulk content buyers: predictable, recurring, and highly motivated to pay for data at scale. But the reality is far more constrained. AI companies will not — and cannot — be the primary purchasers of content in a sustainable way.
In this two-part series, we first focus on the missed expectation: why AI will not become the content windfall the way many in the publishing industry hope. The second article will explore the historical playbook that reveals where the real economic opportunity lies, and describe the path to get there.
I. The Great Expectation — and the Wrong Customer
For the past two years, the industry has been awash in headlines that create the appearance of a new licensing boom. OpenAI announced deals with the Associated Press, Financial Times, and other major media groups. Anthropic begrudgingly struck agreements with professional authors and academic publishers. Google relied on privately negotiated licenses for Gemini and its Search Generative Experience. Perplexity launched revenue-sharing programs for premium publishers.
From the outside, it resembled the opening chapter of the streaming wars: large platforms racing to secure content, armed with big checks and strategic urgency. Beneath the surface, the economics tell a very different story.
Every leading AI company is burning enormous amounts of capital. Their cost structures — GPUs, proprietary chips, power consumption, data center construction — are unprecedented. Even the companies reporting strong corporate earnings like Microsoft and Google are doing so on the back of legacy businesses, not AI profitability. The AI divisions themselves are still deeply unprofitable.
A simple truth follows:
A market cannot scale if the buyers cannot afford to participate.
The licensing deals of the past two years have not been funded by healthy revenue. They have instead been funded by strategic budgets, competitive signaling, and investor subsidy. These deals represent experiments — not the foundation of a sustainable marketplace. To understand why these deals are unlikely to become recurring revenue for publishers, we need to dig into how AI companies and their models interact with content.
II. A Tale of Two Markets: Training vs. Inference
To most observers, "AI uses content" feels like one continuous event. A model reads material, absorbs information, and later produces answers that reflect what it has learned. But inside the industry, these are two different markets — training and inference — and they operate under entirely different economic and legal conditions.
A helpful analogy is a student in a professional program — say, engineering.
Training: The Education Phase
Training is the long period when the student studies textbooks, reviews examples, and internalizes the foundational concepts needed to think like an engineer. The goal isn't memorization; it's pattern formation and conceptual understanding.
That is what training an AI model is: exposure to content, pattern formation, generalization of those patterns into concepts. Rinse and repeat.
Inference: The Practice Phase
Inference is when the student graduates and becomes a practicing engineer — they are handed a real problem and asked to solve it. This is the moment that value is created for clients. Inference is where firms bill clients. It is where learned skills become services.
When a user asks an AI system a question, the model is no longer "learning"; it is inferring.
Why This Distinction Matters Economically
Educational exposure and professional practice follow different economics:
A university doesn't get paid every time an engineer uses a concept learned years ago. The billable work happens when the engineer applies those concepts to solve a real problem that someone has right now.
Training is education. Inference is practice. Only the latter generates recurring revenue from clients.
III. Training: A Market That Looked Promising but Won't Scale
The belief that training would become a major licensing opportunity rests on a straightforward intuition:
If AI companies learn from content, they should pay for it.
On its face, the argument is compelling, and all the more so if we lean into the student analogy from earlier. After all, students pay hefty tuition fees and buy lots of textbooks.
But four forces — legal, economic, technical, and geopolitical — make clear why training revenue will be episodic rather than reliable and recurring.
1. The Economics of Training Are Fundamentally Front-Loaded
Training is extraordinarily expensive. AI companies spend hundreds of millions building training corpora, assembling proprietary datasets, running multi-month training cycles, and maintaining massive GPU clusters. Because training is such a large cost center, every technical and economic incentive pushes in the same direction: minimize the amount of fresh data required, reuse what has already been collected, and avoid perpetual acquisitions wherever possible.
Once a company builds a high-quality dataset, it becomes the backbone for multiple generations of the model:
- pretraining
- continual learning
- fine-tuning
- domain adaptation
- successor model families
Instead of seeking more content, companies optimize their pipelines to extract more value from the same dataset.
2. China and Open-Source Models Undercut Pricing Power
A growing share of frontier-model innovation now comes from Chinese labs — Baidu, Alibaba, 01.AI, DeepSeek, and others — and their models are trained on enormous corpora of unlicensed material. Many of these models are competitive with Western offerings and are increasingly released as open-source alternatives.
This creates a harsh pricing reality:
Why would Western companies pay premium rates for fully licensed corpora when their global competitors train comparable systems for free?
Even labs that want to train responsibly are squeezed by this environment. If the cost of "doing it right" slows their release cadence, raises their expenses, or results in even marginally weaker models, they risk falling behind competitors who don't operate under the same constraints.
To be clear, this is not a defense of those practices, nor an argument that they are acceptable. It is simply the competitive landscape Western companies face. The existence of high-performing open-weight models trained on unlicensed data — whatever one thinks of their provenance — imposes a ceiling on what the commercial market is willing to pay for training data.
The pressure is structural, not moral. And it sharply limits the pricing power publishers can expect in the training market.
3. Courts Are Trending Toward Fair Use in Training
Two major rulings in the summer of 2025 — one involving Meta, one involving Anthropic — offered the clearest judicial signals yet around training-phase copyright.
In the Meta case, the court ruled that plaintiffs had not shown market harm from Meta's use of books to train Llama. Because the model did not reproduce or substitute for the works, the court characterized training as transformative.
In the Anthropic case, the court drew a pivotal line:
- Training on lawfully acquired content: likely fair use
- Aggregating pirated books into a library for training: not fair use
These rulings expose the core legal challenge:
Unless a model regurgitates copyrighted text, plaintiffs struggle to prove the kind of market harm required to win.
The judicial trend increasingly favors the idea that training — standing alone — is transformative and therefore protected, which weakens the structural basis of a market for large-scale training-phase licensing.
4. The Future of Training Is Smaller and More Specialized
A final constraint is technical. The era of "just pour in more data" is ending.
AI leaders across the industry have acknowledged that frontier models have already consumed most of the high-quality text available on the public web. When you've already trained on the vast majority of useful public text, the marginal value of adding more general-purpose data declines rapidly. The next improvements in model capability will come from:
- highly specialized domain corpora
- well-structured technical datasets
- targeted refreshes rather than massive new ingestions
- data with deep internal organization, not broad volume
This is a very different market than many publishers imagine. It is not a recurring, everybody-wins licensing ecosystem. It is a narrow, specialist market where value is concentrated in specific domains at specific moments.
From Pattern to Practice
Training will remain part of the licensing landscape, but its ceiling is clear. Economic incentives push AI companies to minimize it, global competition limits pricing power, courts increasingly treat it as fair use, and technical progress reduces the need for bulk new data. These forces do not eliminate the training market, but they do define it: episodic, constrained, and incapable of supporting the broad, recurring revenue streams publishers are hoping for.
IV. Inference: The Economics Publishers Already Understand
If publishers want to see where AI can support a sustainable content market, they don't need a new business framework. Many already operate inside one: the academic journals market.
Journals exhibit core traits of a healthy market:
- recurring demand
- user-driven value
- measurable interactions
- established monetization
- strong attribution
The economic event is tied to each individual use; need, value, and usage all recur. Inference behaves the same way.
When a student clarifies a method, a clinician checks a concept, or a researcher verifies a model, the AI system is performing an action that depends directly on authoritative content. Each interaction is:
- discrete and attributable
- measurable
- tied to user value
- and highly repeatable
Inference is driven by ongoing user need — not by a platform's one-time training decision.
Figure 1: Market Comparison
| Market Signal | Training | Inference |
|---|---|---|
| Demand | One-time or infrequent; front-loaded | Continuous; every query creates demand |
| Buyer Base | Few: a handful of AI labs | Broad: billions of users + institutions |
| Attribution | Weak, difficult to prove | Strong; traceable |
| Monetization | Limited | Multiple paths |
| Incentive Alignment | Labs seek to reduce data costs | Platforms need authoritative content |
Inference has the signals of a durable market. Training does not.
Intermission
We conclude Part One with a singular message: the training market will not become a dependable source of revenue for publishers. It is finite, episodic, and shaped by buyers whose incentives push them to reduce — not expand — their reliance on paid data. For many publishers, the meaningful upside from training may already be behind them.
The more important story is what comes next.
The real market is only now beginning to emerge, and it centers on inference rather than training. In Part Two, we will examine the historical patterns emerging platforms follow, why AI is entering the same trajectory, and where publishers should position themselves as inference becomes the center of gravity and presents the opportunity to reset the terms of engagement with technology providers.
Part Two: The Path Forward
In Part One, The Missed Expectation, we argued that AI training will not become the durable revenue engine publishers hope for. While uncomfortable, this conclusion isn't pessimistic. The real economic opportunity lies not in how models are trained, but in how they are used.
In this context, usage is the inference stage: the moment an AI model applies its training to solve a specific user problem. Clients pay to have these problems solved, not for the training data itself.
But inference monetization doesn't arrive by default. Unlike the lump-sum payouts and corpus delivery for training deals, inference happens when the AI accesses and processes individual articles; capturing this usage requires new infrastructure and norms. Monetizing inference requires a clear understanding of how value is created and captured when content no longer reaches users in its original form.
In this second part, we focus on the path forward. Not a theoretical one, but one grounded in historical precedent and emerging reality. AI is not destroying the content economy; it is forcing it to evolve. To understand that evolution, we must first look to the past.
The Great Reallocation
When technology disrupts an industry, the way the product is consumed and the way consumption is paid for fall out of sync. Demand may even rise while revenue from established payment routes collapses. Only later do new revenue models allow producers to recapture value.
Music is the canonical example. Napster greatly expanded listening, but demolished revenue from CDs and records. So even though musicians continued making music, and users listened more than ever, the industry was unable to monetize this new form of consumption. Finally, after a period of turbulence, services like Spotify and Apple Music emerged with models that reconnected listening and compensation under a new consumption paradigm.
This pattern now applies to scholarly and professional content. Demand for authoritative knowledge has not declined: researchers, students, and professionals still rely on trusted sources to do their work.
What has changed is how that knowledge is delivered. Instead of directly reading articles, users increasingly ask AI systems to locate, synthesize, and report back on relevant work. Like the music industry in the Napster era, the delivery system has changed, but the revenue models are yet to stabilize.
Publishing is not at the end of its economic arc; it is in the turbulent middle of a reallocation.
Reallocations are often framed as cataclysmic and are very uncomfortable for those inside the affected industry. In practice, they are remarkably consistent. The table below shows how this reallocation unfolded in music and online video — and how it is now unfolding in publishing.
Table 1: The Reallocation Pattern
| Stage | Music Industry | YouTube / Online Video | AI & Publishing |
|---|---|---|---|
| 1. Initial Shock (Demand Shifts) | 1999 – Napster enables instant, free music sharing; listening explodes | 2005 – YouTube makes video frictionless and global | 2022 – ChatGPT introduces conversational, personalized answers |
| 2. Monetization Collapse | 2001–2002 – CD sales fall sharply; consumption decouples from payment | 2005–2007 – Broadcast and DVD economics break; attention shifts online | 2022–2024 – Pageviews and clicks lose relevance as answers bypass human reading |
| 3. Existential Panic | 2000–2001 – Napster lawsuits; industry fears extinction | 2007 – Viacom lawsuit; "YouTube will destroy media" | 2023–2024 – Publisher, author, and news lawsuits against AI labs |
| 4. Token Partnerships (Unscalable Deals) | 2003 – Labels license catalogs to iTunes (bespoke, fragile) | 2006–2008 – Media companies strike custom YouTube deals | 2023–2025 – Strategic AI licensing deals with select publishers |
| 5. Infrastructure Formation ← We are here | 2003–2008 – Licensing frameworks, DRM, billing systems | 2007 – Content ID, Partner Program, attribution tooling | 2024–2026 – Usage tracking, MCP, publisher programs |
| 6. Sustainable Monetization | 2008–2015 – Downloads + streaming + touring + merch | 2010–2018 – Ads + subscriptions + creator ecosystems | TBD – Subscriptions, ads, various licensing models (?) |
| 7. Norms Lock In (Defaults Harden) | ~2019 – Streaming normalized; revenue surpasses pre-Napster peak | ~2018–2020 – Creator economy stabilizes; rules widely accepted | TBD |
This Reallocation Is Different
In every prior reallocation, the decisive moment came before monetization stabilized — when infrastructure was beginning to form, but norms had not yet locked in. What followed depended on whether industries correctly understood the shift they were actually facing.
AI places publishing at that same inflection point, and where we go from here will depend on how well we understand the shift we are facing. This time, the shift is not merely one of distribution.
Previous reallocations changed how content was packaged and distributed, but the content itself still reached the end user largely as the creator intended. A song moved from vinyl to CD to streaming, but listeners still heard the original recording. A film moved from theaters to DVDs to online video, but viewers still watched the work as originally produced.
AI breaks that continuity.
Content can now create value without ever being presented to the user in its original form. Ideas are extracted, combined, and reassembled into new outputs. Users benefit from the underlying works without directly encountering them. When value is created this way, revenue from that value does not automatically flow back to its sources.
History offers a playbook for how platform transitions unfold. What it does not offer is a solution for the latest reallocation: a route to reconnecting how the product is paid for with how it is consumed. That is the problem the next section addresses.
Three Leading Economic Models Emerging for Publishers
If history is a guide, the market will not converge on a single economic model. Instead, it will stabilize around a monetization mix — coexisting approaches aligned to different sources of demand and control.
That mix is already forming in AI-driven inference. While implementations vary, three approaches are now emerging with clarity.
1. Pay-Per-Use (PPU)
The PPU model itself is straightforward. When a user consumes content through an AI platform, the platform pays the relevant publisher for the content that informed the response.
Crucially, the platform is not the economic consumer of the content. Users are. Through their queries, engagement, subscriptions, or attention, users signal which sources matter. The platform aggregates those signals and compensates publishers accordingly.
This is how Perplexity's Premium Data for All program operates. When licensed publisher content is used to generate an answer, Perplexity compensates the publisher on a per-use basis. The license is narrow and temporary, but it is paid — and it scales with demand. Better sources improve answer quality, which drives usage and, in turn, supports the subscriptions and upgrades that fund content access.
The same pay-per-use logic can be funded by advertisers rather than users. ProRata.ai offers a clear example. ProRata applies an ad-supported model to AI inference: ads are displayed contextually alongside AI-generated answers, generating revenue on every query. That revenue is split 50:50 with publishers whose content contributed to the response. Users receive free access, advertisers fund the system, and publisher compensation remains tied directly to per-query usage.
In both cases, the unit of value is the same: content contribution at inference time. What differs is the source of funding.
While the economics are simple, the infrastructure is not. Attribution, contribution weighting, fraud resistance, and transparent reporting must operate reliably at query-level granularity. The closest analogue is pay-per-click advertising, as implemented by Google. As with PPC, Pay-Per-Use can scale quickly — but only once pricing logic, attribution, delivery, and enforcement infrastructure are in place.
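To make the PPU mechanics concrete, here is a minimal sketch of per-query settlement, assuming a platform can attribute each answer to weighted source contributions. The payout rate, publisher names, and weighting scheme are all illustrative assumptions, not details of any real program such as Perplexity's or ProRata's.

```python
# Hypothetical sketch of pay-per-use settlement. The per-query pool,
# publisher identifiers, and contribution weights are invented for
# illustration; real programs would set their own terms.

from collections import defaultdict

PAYOUT_PER_QUERY = 0.002  # assumed pool (USD) funded per answered query


def settle_query(contributions: dict[str, float]) -> dict[str, float]:
    """Split the per-query pool across publishers in proportion to each
    source's contribution weight (e.g., retrieval relevance)."""
    total = sum(contributions.values())
    if total == 0:
        return {}
    return {pub: PAYOUT_PER_QUERY * w / total for pub, w in contributions.items()}


def settle_period(query_log: list[dict[str, float]]) -> dict[str, float]:
    """Aggregate per-query payouts into a reporting-period statement."""
    ledger: defaultdict[str, float] = defaultdict(float)
    for contributions in query_log:
        for pub, amount in settle_query(contributions).items():
            ledger[pub] += amount
    return dict(ledger)
```

The hard part, as noted above, is not this arithmetic but everything feeding it: trustworthy attribution weights, fraud resistance, and transparent reporting at query-level granularity.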
2. Bring-Your-Own-License (BYOL)
BYOL models work differently. Here, the user already holds rights to the content — typically through an institutional or professional subscription — and chooses to connect those rights to an AI platform.
A clear example is Wiley's partnership with Perplexity. Institutions that already license Wiley content can authenticate within Perplexity and access the material their existing agreements permit. In this arrangement, the AI platform acts as an interface layer, not a reseller, and the commercial relationship between publisher and customer remains unchanged.
The defining feature of BYOL is portability. The publisher sells the license to the customer; the customer decides where to exercise those rights. As with Netflix, the subscription is not tied to a single interface or device. BYOL extends that portability into AI-driven workflows, preserving pricing, trust, and institutional relationships while making platforms more useful to licensed users.
The constraint is reach. BYOL inherits the limits of existing entitlements, which means the completeness of AI-generated answers may vary across users or institutions depending on what content their subscriptions allow the system to access. In research settings, this creates a real risk: relevant work can be silently excluded if it falls outside the licensed corpus.
For that reason, BYOL works best as a portability layer for known, licensed collections — not as a complete solution for discovery when access must be created at the moment of need. That gap is what the next model addresses.
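The BYOL access check can be sketched in a few lines. This is a hypothetical illustration, assuming the platform receives an authenticated institution identifier and filters retrieved items against that institution's entitlements; the entitlement table and journal names are invented, not drawn from any real integration such as Wiley's with Perplexity.

```python
# Hypothetical sketch of BYOL source filtering: the AI platform acts as an
# interface layer and only surfaces full text the user's institution already
# licenses. Entitlement data and identifiers are illustrative assumptions.

ENTITLEMENTS = {
    "university-a": {"journal-x", "journal-y"},
    "university-b": {"journal-x"},
}


def accessible_sources(institution: str, retrieved: list[dict]) -> list[dict]:
    """Keep only retrieved items covered by the institution's subscriptions.
    Anything outside the licensed corpus is silently excluded, which is
    exactly the completeness risk BYOL carries in research settings."""
    licensed = ENTITLEMENTS.get(institution, set())
    return [doc for doc in retrieved if doc["journal"] in licensed]
```

Note how the filter encodes BYOL's constraint directly: two users asking the same question can receive answers grounded in different corpora, depending on what their institutions license.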
3. Licensing on Demand (LOD)
The third model centers on unlocking content at the moment it is discovered. While specific implementations are still emerging, several major publishers are actively developing Licensing on Demand offerings and are expected to bring them to market soon.
In Licensing on Demand, users do not need to hold licenses in advance, as with BYOL, nor rely on their AI platform to have pre-negotiated access, as with PPU. When valuable content surfaces during an AI interaction, access can be licensed immediately, in context, and under terms set directly by the publisher.
The core shift is timing: discovery comes first, and licensing follows immediately thereafter. Instead of forcing users to leave the AI experience or guess which subscription might apply, access is created precisely when the need becomes clear.
If Pay-Per-Use monetizes what users already consume, and BYOL extends licenses users already hold, Licensing on Demand enables access exactly when users — via AI-powered discovery — realize they need it.
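The discovery-then-license flow can be sketched as follows. Since specific LOD implementations are still emerging, everything here is an assumption: the publisher-set price table, the `Offer` shape, and the rule that entitled users bypass the offer entirely.

```python
# Hypothetical sketch of Licensing on Demand: when discovery surfaces an
# item the user cannot access, an in-context offer is generated at
# publisher-set terms. Prices and identifiers are illustrative only.

from dataclasses import dataclass

PUBLISHER_TERMS = {"journal-x": 4.00, "journal-y": 6.50}  # assumed per-article prices


@dataclass
class Offer:
    article_id: str
    price: float


def license_on_demand(article_id: str, journal: str, user_entitled: bool):
    """Discovery first, licensing second: if the user already holds rights
    (the BYOL case), no offer is needed; otherwise create an in-context
    offer so access is granted at the moment the need becomes clear."""
    if user_entitled:
        return None
    price = PUBLISHER_TERMS.get(journal)
    return Offer(article_id, price) if price is not None else None
```

The key design choice is that the publisher, not the platform, sets the terms in `PUBLISHER_TERMS`: the platform merely surfaces the offer at the point of discovery.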
Norms Worth Locking In Early
The AI reallocation right now offers publishers a rare opportunity: a chance to set the rules before they harden.
For the last two decades, search engines like Google and marketplaces like Amazon established defaults that publishers largely had to accept after the fact. AI is different. The ecosystem is still forming, and the terms are not yet fixed. That creates an opening to agree on how value should move.
From the publisher viewpoint, these norms should be stated early and often:
Inference on premium content should be paid. If authoritative content improves AI's answers, the publisher should be compensated when the content is used.
Attribution should be standard. Source visibility benefits users, preserves trust, and reinforces publisher value.
Usage should be transparent. Markets work when participants can see what is happening. Measurement enables pricing, investment, and accountability.
Direct relationships should remain intact. AI can intermediate access without permanently distancing publishers from their audiences or customers.
These norms are straightforward. They align incentives, support sustainable markets, and reflect how publishers already operate elsewhere.
AI systems depend on high-quality content, and publishers still control the creation of that content. The earlier these expectations are reinforced across the industry, the more likely they are to become the defaults that shape what comes next.
Conclusion
AI is not going to pay for content the way many publishers once hoped. Training AI models will not become a broad, recurring revenue engine. AI will pay for the information it needs when conducting inference — when it accesses high-quality information and brings that value to the end user.
The publishers who engage now — by aligning on shared expectations, supporting emerging models, and investing in common infrastructure — will help shape how knowledge is discovered, credited, and paid for in the AI era.
