OpenAI searches for an answer to its copyright problems

The huge leaps in OpenAI’s GPT model probably came from sucking down the entire written web. That includes entire archives of major publishers such as Axel Springer, Condé Nast, and The Associated Press — without their permission. But for some reason, OpenAI has announced deals with many of these conglomerates anyway.

At first glance, this doesn’t entirely make sense. Why would OpenAI pay for something it already had? And why would publishers, some of whom are lawsuit-style angry about their work being stolen, agree?

I suspect if we squint at these deals long enough, we can see one possible shape of the future of the web forming. Google has been referring less and less traffic outside itself — which threatens the existence of the entire rest of the web. That’s a power vacuum in search that OpenAI may be trying to fill.

Let’s start with what we know. The deals give OpenAI access to publications in order to, for instance, “enrich users’ experience with ChatGPT by adding recent and authoritative content on a wide variety of topics,” according to the press release announcing the Axel Springer deal. The “recent content” part is clutch. Scraping the web means there’s a date beyond which ChatGPT can’t retrieve information. The closer OpenAI is to real-time access, the closer its products are to real-time results. 

The terms around the deals have remained murky, I assume because everyone has been thoroughly NDA’d. Certainly I am in the dark about the specifics of the deal with Vox Media, the parent company of this publication. In the case of the publishers, keeping details private gives them a stronger hand when they pivot to, let’s say, Google and AI startup Anthropic — in the same way that not disclosing your previous salary lets you ask for more money from a new would-be employer.

OpenAI has been offering as little as $1 million to $5 million a year to publishers, according to The Information. There’s been some reporting on the deals with publishers such as Axel Springer, the Financial Times, News Corp, Condé Nast, and the AP. My back-of-the-envelope math based on publicly reported figures suggests that the ceiling on these deals is $10 million per publication per year.

On the one hand, this is peanuts, just embarrassingly small amounts of money. (The company’s former top researcher Ilya Sutskever made $1.9 million in 2016 alone.) On the other hand, OpenAI has already scraped all these publications’ data anyway. Unless and until it is prohibited by courts from doing so, it can just keep doing that. So what, exactly, is it paying for?

Maybe it’s API access, to make scraping easier and more current. As it stands, ChatGPT can’t answer up-to-the-moment queries; API access might change that. 

But these payments can also be thought of as a way of ensuring publishers don’t sue OpenAI for the stuff it’s already scraped. One major publication has already filed suit, and the fallout could be much more expensive for OpenAI. The legal wrangling will take years.

If OpenAI ingested the entirety of the text-based internet, that means a couple things. First, that there’s no way to generate that volume of data again anytime soon, so that may limit any further leaps in usefulness from ChatGPT. (OpenAI notably has not yet released GPT-5.) Second, that a lot of people are pissed.