OpenAI disputes authors’ claims that every ChatGPT response is a derivative work

Sarah Silverman attends The Bedwetter book signing at the Barnes and Noble Union Square in New York City.

This week, OpenAI finally responded to a pair of nearly identical class-action lawsuits from book authors—including Sarah Silverman, Paul Tremblay, Mona Awad, Chris Golden, and Richard Kadrey—who earlier this summer alleged that ChatGPT was illegally trained on pirated copies of their books.

In OpenAI’s motion to dismiss (filed in both lawsuits), the company asked a US district court in California to toss every claim except one alleging direct copyright infringement, which OpenAI hopes to defeat at “a later stage of the case.”

The authors’ other claims—alleging vicarious copyright infringement, violation of the Digital Millennium Copyright Act (DMCA), unfair competition, negligence, and unjust enrichment—need to be “trimmed” from the lawsuits “so that these cases do not proceed to discovery and beyond with legally infirm theories of liability,” OpenAI argued.

OpenAI claimed that the authors “misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”

According to OpenAI, even if the authors’ books were a “tiny part” of ChatGPT’s massive data set, “the use of copyrighted materials by innovators in transformative ways does not violate copyright.” Unlike plagiarists who seek to directly profit off distributing copyrighted materials, OpenAI argued that its goal was “to teach its models to derive the rules underlying human language” to do things like help people “save time at work,” “make daily life easier,” or simply entertain themselves by typing prompts into ChatGPT.

The purpose of copyright law, OpenAI argued, is “to promote the Progress of Science and useful Arts” by protecting the way authors express ideas, but “not the underlying idea itself, facts embodied within the author’s articulated message, or other building blocks of creative [expression],” which are arguably the elements of authors’ works that would be useful to ChatGPT’s training model. Citing a notable copyright case involving Google Books, OpenAI reminded the court that “while an author may register a copyright in her book, the ‘statistical information’ pertaining to ‘word frequencies, syntactic patterns, and thematic markers’ in that book are beyond the scope of copyright protection.”

“Under the resulting judicial precedent, it is not an infringement to create ‘wholesale cop[ies] of [a work] as a preliminary step’ to develop a new, non-infringing product, even if the new product competes with the original,” OpenAI wrote.

In particular, OpenAI hopes to convince the court that the authors’ vicarious copyright infringement claim, which alleges that every ChatGPT output represents a derivative work “regardless of whether there are any similarities between the output and the training works,” is an “erroneous legal conclusion.”

The company’s motion to dismiss cited “a simple response to a question (e.g., ‘Yes’),” or responding with “the name of the President of the United States” or with “a paragraph describing the plot, themes, and significance of Homer’s The Iliad” as examples of why every single ChatGPT output cannot seriously be considered a derivative work under the authors’ “legally infirm” theory.

“That is not how copyright law works,” OpenAI argued, while claiming that any ChatGPT outputs that do connect to authors’ works are similar to “book reports or reviews.”

Further, OpenAI argued that the authors have failed to show that the company has a “direct financial interest” in allegedly infringing the copyrights of their works.

“It is not enough that the challenged activity is carried out by users of tools offered for profit by a technology company: rather, to satisfy the ‘direct financial interest’ prong” of copyright infringement, “the material that infringes the plaintiff’s works must ‘act as a draw for [defendant’s] customers’ such that there is a direct ‘causal link between the infringement of the plaintiff’s own copyrighted works and any profit to the [defendant],’” OpenAI wrote.

Neither OpenAI nor lawyers from the Joseph Saveri Law Firm, which represents the authors, immediately responded to Ars’ request for comment.

https://arstechnica.com/?p=1964348