Two copyright questions that AI regulators should align on

This briefing highlights two questions at the intersection of artificial intelligence and copyright law which require a harmonised international response to avoid digital fragmentation.


Johannes Fritz, Danielle Koh


22 Aug 2023

When the G7 leaders launched the Hiroshima process for artificial intelligence (AI), upholding copyrights and intellectual property already featured in the agreed work programme. This briefing highlights two questions at the intersection of artificial intelligence and copyright law which require a harmonised international response to avoid digital fragmentation. Evidence from the Digital Policy Alert shows how regulators across the globe are already moving, and how the private sector is demanding clear answers through pending lawsuits. 

On the one hand, the scope of fair use needs clarifying: AI systems harness copyrighted content for training and may then produce imitations of copyrighted works without human oversight. On the other hand, the notions of authorship and invention are in question: do AI-crafted outputs deserve copyright or patent recognition? Businesses that develop or use generative AI systems face not only legal ambiguity but also exposure when operating across borders.

Educating imitators? Copyrighted works in training data

Whether copyrighted works may be included in training data touches two distinct issues that are currently working their way through legislatures and courts. The first is whether AI developers may include copyrighted works in their training data without explicit permission. The second, more subtle issue is whether AI output created from such training data can infringe copyright.

At the heart of both issues lies the “fair use doctrine”. For illustration, under this doctrine, teachers may include copyrighted works in their instruction material without explicit permission from the authors and without paying a licence fee. They may not, however, sell this instruction material without obtaining permission from, or paying, the rights holders. Likewise, a student who has voraciously absorbed the copyrighted works cannot release a memorised copy of the original without permission from the original author.

At present, generative AI intertwines these two issues. Copyrighted works are used in training data to teach the model about expected output quality. At the same time, generative AI is capable of reproducing the original work to a close approximation, with or without an explicit user request for such an imitation.

Rights holders were quick to react to these unresolved issues. In early 2023, Getty Images filed lawsuits in the UK and the US against Stability AI, provider of the image-generating AI system Stable Diffusion, for using its proprietary image stock as part of its training data without a proper licence. More recently, a group of authors including the US comedian Sarah Silverman filed lawsuits against OpenAI and Meta for using their copyrighted works in training data. In a potentially relevant case that originated in the pre-ChatGPT era, the US Supreme Court recently ruled that Andy Warhol’s illustration of Prince, based on a photograph by Lynn Goldsmith, was insufficiently transformative to fall under the fair use doctrine. Finally, the US Supreme Court’s 2021 ruling in Google v Oracle over the use of source code may gain new relevance for generative AI models trained on broad code bases to support developers worldwide.

On the regulatory side, the EU AI Act includes a requirement to disclose copyrighted works used in training data, though it remains to be seen how this requirement will be applied. In an attempt to strike a delicate balance between the two issues, the government of Japan has recently clarified that, in principle, it deems the inclusion of copyright-protected works in training data to be “fair use”. At the same time, Japan has stated that AI-generated art that uses another artist's work for purposes other than personal use (e.g., commercial use) may constitute copyright infringement.

How much human input is needed for copyright protection?

Using generative AI systems in the creative process raises two questions about who gets to hold the copyright, if anyone. The first is whether the machine itself can be listed as an “inventor” on a patent or as the “creator” of a copyrighted work. Regulators around the globe appear to be converging on the view that only a human can be listed as the creator of intellectual property. Regulators and courts in the European Union, the United Kingdom and the United States have recently confirmed that creators and inventors have to be “humans” or “individuals”. An early Australian ruling by a single Federal Court judge took the opposite view but was subsequently overturned by the Full Court of the Federal Court, bringing Australia in line with the international position. The Canadian and British governments started inquiries into copyright protection for computer-generated works in 2021, but have not shown any intent to change existing patent laws for now.

The second question is at what point a human can claim copyright protection for works created with the (heavy) use of an AI system. In March 2023, the US Copyright Office clarified in new guidance that it is in principle possible for authors to receive copyright protection for works created using an AI system. The Office stresses, however, that the work must be the claiming author’s “own original mental conception, to which [the author] gave visible form”. Put differently, machine-generated works cannot receive copyright protection unless the author adds sufficient value through their own interpretation and re-arrangement. While intuitively understandable, this line may become increasingly blurred with more sophisticated prompting. Arguably, an article received in response to a simple prompt fails the US Copyright Office’s test. It remains unclear whether the output of a sophisticated multi-stage prompt that attaches the author’s prior work as a “style guide” will require many alterations to qualify as the author’s own work.

The need for international alignment

Failure to reach an international consensus on the inclusion of copyright-protected materials in training data, or on the eligibility of AI-assisted works for copyright protection, may have unintended fragmenting consequences. For AI developers, jurisdictions such as Japan may be more attractive for the design and training of generative AI systems that benefit from access to copyrighted materials. For all authors, whether or not their works can be turned into artificial competition may severely affect their livelihoods. For creators using AI tools, differing positions on the appropriate safeguards against copyright infringement may raise liability risks. For consumers of protected works, in turn, it may mean that authors stop disseminating their works locally over concerns about their inclusion in generative AI systems that compete with their output.