Snapshot
- Tools like ChatGPT, Midjourney and Sora are reshaping creative industries by generating art, text and other forms of media based on human-made content, often without compensation or consent.
- Major lawsuits challenging AI companies that scrape data and train their products on pirated material are playing out across the world, with courts divided on how traditional copyright doctrines apply.
- This article outlines recent developments in those legal battles over human creativity and warns of the consequences of unchecked AI growth: the erosion of the value of human artistry and a culturally cheap, homogenised future.
If you want to understand what AI might do for humanity, I recommend reading sci-fi novels from the 1950s. There you’ll find utopian worlds where robots do the boring work and humans are left to a life of leisure, pleasure and creativity. Unfortunately, that life has not come to fruition. Instead, we are facing a world where humans do the boring work, while robots write poetry, make video games and paint pictures of cats.
Welcome to the new world of AI. Launched in 2022, ChatGPT was the first AI system to pose a serious threat to human creativity. Since then, a range of other AI systems have gained prominence, threatening to supplant human art altogether. From the image generator Midjourney to the video generator Sora, each makes the seductive claim that anyone can be an artist. But each of these systems relies on the art of real humans, past and living. As these systems grow, the claim that anyone can be an artist is quickly turning into a promise that no one can be an artist for pay.
The artists of our world are, understandably, not happy, and many are hiring lawyers to sue for copyright infringement. These lawsuits have some of the highest stakes in human history. If humans lose, we will face a world of AI-generated culture where fashion, television, film and social trends are decided for us by an artificial entity. The value of creativity will dissipate and, in a world where copyright becomes functionally meaningless, any incentive for a human artist to create their own work will collapse. Until now, copying and distributing someone else’s work has meant obtaining a licence, paying a royalty or facing a fine. The AI companies want a world where content is free.
To understand the context, I must briefly explain how AI works. Every AI is composed of two elements: inputs and outputs. For inputs, AI developers collect images, text, video and other data from the internet. This data is then painstakingly labelled by humans, often in developing countries for low wages. The labelled data is then fed into a computer running a machine learning algorithm. This algorithm crunches the data and, through a guided learning process facilitated by humans, learns to identify it, calling a bird a bird or a fish a fish, for example. Finally, the AI recombines this data to form an output. This could be an image, text or video generated for the user.
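For readers who want a concrete, if drastically simplified, picture of that input-to-output pipeline, the short Python sketch below ingests a tiny invented corpus, ‘learns’ which words tend to follow which, and then recombines them into new text. It is an illustration of the principle only; real systems such as ChatGPT use neural networks trained on vastly larger datasets, and the corpus and output here are purely hypothetical.

```python
# A toy illustration of the input -> training -> output pipeline described above.
# It is NOT how any commercial system is built; the corpus below is invented.
import random
from collections import defaultdict

# Inputs: a tiny, hypothetical body of text 'scraped' for training
corpus = "the cat sat on the mat . the bird sat on the branch ."

# 'Training': record which word tends to follow each word in the ingested text
transitions = defaultdict(list)
words = corpus.split()
for current_word, next_word in zip(words, words[1:]):
    transitions[current_word].append(next_word)

# Output: recombine the learned patterns into new text
def generate(start="the", length=8):
    word, output = start, [start]
    for _ in range(length):
        word = random.choice(transitions[word])
        output.append(word)
    return " ".join(output)

print(generate())  # e.g. 'the bird sat on the mat . the cat'
```

Everything this toy model can ever produce is a recombination of what it ingested. That same dependence on training data, at a vastly greater scale, is what the lawsuits discussed below are about.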
This article would not exist if AI companies limited themselves to public domain data. But the AI industry is a bit like the Cookie Monster: one cookie is never enough. The more images, films and texts an AI ingests, the more capacity it has to create something new. As a result, AI companies are always looking for new sources of data to scrape from the internet. Some believe the only way to improve an AI is to feed it more data. Others insist a breakthrough might move us beyond this paradigm, and there are hints this may occur.
Given that conventional wisdom, however, AI companies have quickly moved beyond public domain material. Allegedly, they are scraping data from the entire internet, including popular social media sites. Now they are also going after private data: internal company documents and private messaging. Over time, they are making increasing use of protected and copyrighted information.
The legal battle
The major legal questions in this area are likewise divided into inputs and outputs. On the inputs side, the question is whether scraping data from the internet and feeding it to an AI system is a breach of copyright. This is more complicated than it may at first seem, as some of the content scraped may be behind a paywall, otherwise not publicly available, or obtained illegally through piracy. On the outputs side, the question is whether an AI output that looks like a copyright-protected work is an infringement of that work. For example, if I generate a picture of Shrek on Midjourney, has Midjourney broken the law? Have I broken it? What if it only vaguely looks like Shrek?
Another legal question concerns harm and damage. Artists will often have to prove they have suffered financial harm as a result of the AI company’s conduct. This is more difficult than it seems. Although online forums document copywriters, advertisers, illustrators and other artists lamenting how AI has led to a decline in their field, this is not proof of individual harm. The courts may require artists to prove that a particular AI system is in direct competition with them, or else that this systemic harm has affected them directly.
A range of lawsuits have taken up these questions in the past two years. These range from actions brought by major companies like Disney and Getty Images to class actions brought on behalf of groups of artists, such as writers. In working through some of these cases, I will attempt to explain the technical and legal elements of each. This is not meant to be an exhaustive account of the field and, indeed, most of these cases are ongoing. I have mainly focused on US case law, as that is where many of these companies operate. This is a limitation of my piece, as some of the companies are headquartered in Ireland or have data centres in other countries.
Furthermore, international jurisdictions such as the EU may in future provide more stringent case law and/or legislation. The EU, with its Digital Services Act (2022), Digital Markets Act (2022) and AI Act (2024), has answered a range of parallel questions about the use of AI in Europe, preventing and negating certain harms and uses of personal data. It has generally taken a tougher approach to enforcement and to the protection of citizens than the laissez-faire approach of the US, UK and Australia, where tech companies generally have free rein to conduct social experiments on users with new products. I have argued elsewhere that Australia needs an independent AI Safety Institute, and the discussion below further substantiates that need.
New York Times v OpenAI
The New York Times (‘Times’) was one of the first major companies to sue for copyright infringement by AI, back in 2023 in the early days of ChatGPT. The Times sued both OpenAI, ChatGPT’s creator, and Microsoft, its business partner and the creator of the Bing Copilot AI system, which helps to answer search queries. The Times alleges OpenAI scraped millions of articles from the Times’ archives, and that this was done without payment, attribution or licence.
The Times also alleges that, when prompted by users, ChatGPT can spit out verbatim copies of Times articles, or else share major findings of investigative journalism by its authors. ChatGPT may also sometimes ‘hallucinate’; that is, make up false articles and falsely attribute them to Times journalists. If true, these claims represent transgressions of US copyright law and would undoubtedly damage the business model of the Times, which relies upon subscription fees, ad revenue and licensing agreements.
Microsoft is implicated via its Bing Copilot feature, which summarises answers from the internet for users. These summaries often contain far more quotes or content from an article than a traditional search result. This means users do not need to ‘click through’ to read the whole article on the Times website. This again challenges the Times’ business model because it is reliant on users paying to view full content through a paywall system.
The case essentially comes down to three claims. Firstly, the Times claims the actual scraping of data from the Times was an infringement, as copies were likely made in the process. Secondly, ChatGPT and Copilot, once trained, are themselves derivatives of the Times’ body of work. Finally, the outputs generated by the AI systems are an infringement because they reproduce copyrighted materials. In support of the latter argument, the Times provided side-by-side comparisons in which the AI outputs were word-for-word the same as the originals.
Furthermore, many of the articles in the Times’ claims were located behind a paywall. It appears OpenAI either used a single account to access the articles, scraping the content while doing so, or pirated the materials in some other way. If this is the case, then they not only breached copyright law, but potentially also the contract a user agrees to when signing up to view articles.
This case is ongoing. The most recent outcome is an interlocutory decision from April, in which Judge Stein granted some ancillary motions and dismissed others. The key claims remain on foot, and the stage is set for further discovery and potentially a trial on the copyright issues.
Authors v Anthropic
Two major US decisions on AI and copyright were handed down this June. The first relates to a group of authors who sued Anthropic, the AI developer behind the popular large language model (‘LLM’) Claude. The authors allege Anthropic pirated copies of their books to train its AI system, that it scanned lawfully purchased books into a new digital format, and that it retained a permanent digital library of pirated books.
In this decision, Judge Alsup of the US District Court for the Northern District of California decided the use of legally purchased books to train an AI model does not violate US copyright law. He rejected the arguments that Anthropic’s use of the works was not fair use, finding ‘the training use was a fair use’ (at 31). He also found ‘The use of the books at issue to train Claude and its precursors was exceedingly transformative’ (at 9).
Judge Alsup went on to say that the digitisation of books purchased by Anthropic could also be considered fair use ‘because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library, without adding new copies, creating new works, or redistributing existing copies’ (at 9).
However, Judge Alsup noted many of the books were not paid for and so not legally purchased. He found Anthropic ‘downloaded for free millions of copyrighted books in digital form from pirate sites on the internet’ as part of its effort ‘to amass a central library of “all the books in the world” to retain “forever”’ (at 1). These pirated books were not obtained legally and, therefore, he ruled that authors would be able to claim damages for compensation for their usage by the AI developer.
This will result in damages in the case at hand, but it also opens the possibility of class action lawsuits over the 7 million books downloaded illegally by Anthropic. With damages potentially running to thousands of dollars for each book, Anthropic could be liable for billions of dollars if the authors’ collective claims are successful.
However, this is not a fatal blow to the AI business model. If an AI company can purchase a book just once, train its AI on it and distribute the trained model to millions, if not billions, of consumers, then the cost of buying the book once is negligible compared to the profit it can make from its usage. Unsurprisingly, Anthropic celebrated the decision, stating to the press: ‘We are pleased that the court recognised that using works to train LLMs was transformative – spectacularly so’.
In siding with Anthropic on this point, Judge Alsup stoked no small amount of controversy with his finding that the training use was ‘transformative’. He noted:
‘Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different’ (at 13-14).
This case can likely be distinguished from the Times case, however, in that here the authors could not provide word-for-word replications of their materials.
Authors v Meta
In another major case in June, in the same jurisdiction, another group of authors sued Meta for copyright infringement over the use of their 13 books to train its AI system. The case included high-profile names such as comedian Sarah Silverman and writer Ta-Nehisi Coates. They alleged Meta broke copyright law when training its AI model and that the training resulted in a direct duplication of their works, rather than something transformative. They argued Meta’s AI caused a ‘market dilution’ of their works, leading to financial harm.
Judge Chhabria found the plaintiffs did not prove sufficient financial harm in the case:
‘[T]he key question in virtually any case where a defendant has copied someone’s original work without permission is whether allowing people to engage in that sort of conduct would substantially diminish the market for the original work’ (at 6-7).
Instead, he found Meta’s AI could use the authors’ works under the fair use doctrine. In a silver lining for the authors, the Judge noted that scraping text from copyrighted works for AI without permission might be illegal in ‘many circumstances’ (at 4), and cautioned that ‘this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful’ (at 5).
Judge Chhabria further distinguished his stance from Judge Alsup’s by stressing that the latter was ‘brushing aside’ the importance of market harm in his fair use ruling by focusing on whether the use of the work was transformative (at 3). Ultimately, due to the two conflicting decisions, we may have to wait for the results of a US Supreme Court decision before we have a final sense of clarity over whether AI inputs via scraping are legal or not.
Getty Images v Stability AI
In July this year, another major case came to a head in London. Getty Images, famous for its stock image archive, took Stability AI to the UK High Court for copyright infringement over its Stable Diffusion image generator. The suit alleges Stability AI scraped the Getty Images website illegally to train Stable Diffusion, that it then copied the materials it scraped, and that the outputs of the AI directly reproduce Getty content. For proof, Getty attached side-by-side comparisons of Getty Images stock photos bearing their white watermark alongside AI-generated images featuring blurry ‘Getty Images’ watermarks. A discernible watermark, a distinct signature that survives into the AI-generated output, is likely the closest thing to a smoking gun in such cases.
Unfortunately, the case proceeding in the UK brought jurisdictional challenges. In the course of proceedings, Getty Images was forced to drop its main allegations of primary copyright infringement, as it could not prove that the copying and infringement occurred in the UK. They are, however, continuing to pursue Stability AI for secondary infringement, which can include importing, marketing, selling and distributing copyrighted materials.
Originally, Getty Images’ main arguments concerned both inputs and outputs. They claimed the scraping of content was itself illegal, and that the generation of output content by third-party users of Stable Diffusion was also a breach. Finally, they argued Stable Diffusion was an infringing copy of Getty Images’ works, an ‘article’ within the UK definition; in other words, that the AI itself is a copy of their works.
There are commercial reasons behind Getty Images’ case that go beyond the financial harm caused by the AI models. Getty Images has launched its own AI image generator, trained on its vast library of stock images licensed from human creators. They are marketing this as an ethical form of AI, where human artists have been paid and licensed for their content before it is fed into an algorithm. In many ways, this should have been the default way of creating an image generator for any company that earnestly wanted to comply with copyright law.
Disney and Universal v Midjourney
On June 12, Disney and Universal Studios became the latest to sue for copyright infringement, this time implicating Midjourney, the popular AI image generator. In the suit, the studios claim Midjourney was illegally trained on their content. The AI generator, once trained, was then able to immediately replicate that content, including iconic characters such as Darth Vader and Shrek. They allege this poses significant financial harm to their media empires, which rely on licensed characters to sell television shows, films and merchandise such as toys.
The filing includes side-by-side comparisons of original movie stills and AI-generated images of the famous characters. The suit alleges these outputs are also in breach of copyright as Midjourney has not paid to license or acquire these products. Lawyers for Disney use stark language in the case, claiming Midjourney’s AI is a ‘bottomless pit of plagiarism’ (at [2]).
Midjourney is currently estimated to be worth over US$10 billion, with revenue of US$500 million in 2025. The founder of Midjourney, David Holz, previously stated in a Forbes interview, in words that may now come back to haunt him: ‘It’s just a big scrape of the internet… We weren’t picky’. In the same interview, when asked if Midjourney sought artists’ consent to use their works, he was equally unequivocal: ‘No. There isn’t really a way to get a hundred million images and know where they’re coming from’.
Conclusion
If you can penetrate the techno-jargon, you’ll find AI companies full of promises about the future: scalable solutions, frictionless integration and disruptive transformation. But this buzzword soup helps to obscure an ominous horizon: an unprecedented concentration of power in a handful of companies, based in the US and China, that aim to monopolise creativity. These companies do not care about the industries they are disrupting and, while they offer many products for free now, we can expect large subscription fees in the years to come once monopolisation occurs. They claim everyone can be an artist but, in reality, they are bankrupting entire creative fields through creative legal technicalities.
The future we face is not the one we were promised growing up. If the AI industry’s current practices continue, we can expect human culture to be supplanted by bland regurgitations of the artists of the past; an art world made by faceless machines and powerful men through the financial and cultural impoverishment of others. Without effective copyright law, trademarks and other forms of intellectual property, the artists of the world will lose the incentive and ability to create art. The market for human creative expression will wither and be superseded by a new market, cheaper but soulless.
The stakes could not be higher, and yet everywhere the losses are accumulating on the human side of the scale. Copyright law, it appears, was not adequately prepared for this new technology and, while AI companies use this to their advantage, it is to the disadvantage of all of us and, in particular, all who hope one day to write that novel, paint that portrait or compose that song. For now, we watch and wait.

