Generative AI has had a very good year. Corporations like Microsoft, Adobe, and GitHub are integrating the tech into their products; startups are raising hundreds of millions to compete with them; and the software even has cultural clout, with text-to-image AI models spawning countless memes. But listen in on any industry discussion about generative AI, and you’ll hear, in the background, a question whispered by advocates and critics alike in increasingly concerned tones: is any of this actually legal?
The question arises because of the way generative AI systems are trained. Like most machine learning software, they work by identifying and replicating patterns in data. But because these programs are used to generate code, text, music, and art, that data is itself created by humans, scraped from the web and copyright protected in one way or another.
For AI researchers in the far-flung misty past (aka the 2010s), this wasn’t much of an issue. At the time, state-of-the-art models were only capable of generating blurry, fingernail-sized black-and-white images of faces. This wasn’t an obvious threat to humans. But in the year 2022, when a lone amateur can use software like Stable Diffusion to copy an artist’s style in a matter of hours, or when companies are selling AI-generated prints and social media filters that are explicit knock-offs of living designers’ work, questions of legality and ethics have become much more pressing.
Generative AI models are trained on copyright-protected data — is that legal?
Take the case of Hollie Mengert, a Disney illustrator who found that her art style had been cloned as an AI experiment by a mechanical engineering student in Canada. The student downloaded 32 of Mengert’s pieces and took a few hours to train a machine learning model that could reproduce her style. As Mengert told technologist Andy Baio, who reported the case: “For me, personally, it feels like someone’s taking work that I’ve done, you know, things that I’ve learned — I’ve been a working artist since I graduated art school in 2011 — and is using it to create art that that [sic] I didn’t consent to and didn’t give permission for.”
But is that fair? And can Mengert do anything about it?
To answer these questions and understand the legal landscape surrounding generative AI, The Verge spoke to a range of experts, including lawyers, analysts, and employees at AI startups. Some said with confidence that these systems were certainly capable of infringing copyright and could face serious legal challenges in the near future. Others suggested, with equal confidence, that the opposite was true: that everything currently happening in the field of generative AI is legally above board and any lawsuits are doomed to fail.
“I see people on both sides of this extremely confident in their positions, but the reality is nobody knows,” Baio, who’s been following the generative AI scene closely, told The Verge. “And anyone who says they know confidently how this will play out in court is wrong.”
Andres Guadamuz, an academic specializing in AI and intellectual property law at the UK’s University of Sussex, suggested that while there were many unknowns, there were also just a few key questions from which the topic’s many uncertainties unfold. First, can you copyright the output of a generative AI model, and if so, who owns it? Second, if you own the copyright to the input used to train an AI, does that give you any legal claim over the model or the content it creates? Once these questions are answered, an even larger one emerges: how do you deal with the fallout of this technology? What kind of legal restraints could — or should — be put in place on data collection? And can there be peace between the people building these systems and those whose data is needed to create them?