The Yin and Yang of AI – New York Times vs. OpenAI

The Taoist symbol of Yin and Yang shows two interdependent shapes, each of which contains the seed of its opposite. Think Darth Vader and Luke Skywalker. Another example of the principle is the argument currently raging over Artificial Intelligence (AI). AI has the potential to do both enormous good and overwhelming evil. Which will it be? 

Just this week, we read that AI will benefit the US economy by over $650bn by 2030 in important activities such as healthcare and cybersecurity. We are talking about sustainable, societal changes for the greater good. On the other hand, many people fear that humankind will not even make it to 2030 if we let OpenAI and its allies do their thing! So, who is right?  

Culture and Copyright Law

Sweeping new ideas and technologies frequently challenge our culture and our laws, and in this case, we mean the laws of copyright. Copyright law goes all the way back to the Constitution. There, the Founders stated their belief that granting writers exclusive rights, for a limited time, over their creative works would promote the progress of science and the development of new knowledge for all. Note the term “limited time,” and also the implied balance between the rights of the writer and the need to foster the advancement of the people – the balance between the rights of the individual and the priority of nation building.  

On December 27th, 2023, the New York Times filed a lawsuit against OpenAI and Microsoft that perfectly sums up this question of balance, and the Yin and Yang of the struggle. So, we thought it would be interesting to summarize a layperson’s view of the fight, and even pick a winner. We are not lawyers, just an informed, interested party.  

In summary, the Times claims that OpenAI infringed its copyright by using the Times’ creative works and intellectual property to train OpenAI’s Large Language Models (LLMs) without permission, and thus built a competitive product that deprives the Times of revenue.  

Fair Use 

OpenAI responded that it relied on the legal principle of “Fair Use,” which permits limited use of content without the owner’s consent under certain circumstances. Specifically, it claims that copyright law does not prevent training its AI models with such content. It also claims that access to the Times’ intellectual property (IP) is essential for training effective LLMs, which seems to us a bit of a circular argument. It is like wanting to burn my house down so that you can improve your arson technique. 

The legal definition of Fair Use is unfortunately (and maybe deliberately) vague. It says that it is within the law to use unauthorized material: 

  • if the purpose is for commentary, critique, reporting, teaching, scholarship, or research 
  • if the use is “transformative” (i.e., it serves a different purpose from the original) 
  • if it balances the interests of the copyright holder with the interests of the public at large 

However, and significantly, there are no rules specifying a limit on the number of words that may be reproduced. 

Facts and Figures

One precedent for OpenAI’s position is the 2015 appellate decision in favor of Google, allowing it to scan millions of copyrighted books to build its Google Books search service. There are, however, differences between that judgment and OpenAI’s case. Google won the verdict by arguing that the use was “transformative” because the output served a significantly different purpose from the original (i.e., a search that produced snippets from the original work, not the whole work). Google also successfully argued that the output from the search gave users facts (which are not subject to copyright) rather than creative expression (which is). Supporters of OpenAI’s position argue they are doing exactly what Google does. 

Opponents of OpenAI’s position point to the many examples presented by the Times of ChatGPT (OpenAI’s product) producing long, exact, word-for-word extracts from the Times’ reporting. OpenAI claims that such wholesale reproduction of large chunks of the original is a “bug” that it is working to eliminate. In a recent filing, OpenAI claims that the Times caused it to disgorge those chunks through “deceptive prompts” that violate its terms of use. Thrust and counterthrust! 

Times supporters are encouraged by the successful litigation against Napster and MP3.com in the early 2000s, which established the principle that it is fine for an individual to buy a CD and rip it to MP3 format for personal use, but an entirely different matter for a company to buy thousands of CDs and rip them to a server for commercial sale. They say it is a question of scale and purpose. 

Both sides have their points

OpenAI and Microsoft may well convince the judges that generative AI is a transformative use. It will certainly improve their case if they can eliminate the verbatim reproduction of large sections of the original creative writing. OpenAI’s management has expressed the conciliatory opinion that if an AI system is using your style or your content, then you are entitled to be paid for that. 

It is likely that the Times’ objective is not simply to stop OpenAI’s activities, but rather to get it paying for “training licenses,” which may prove to be an ideal solution to the whole question. When Google started scanning thousands of copyrighted books, it was a totally new application that tested the laws of the time, and yet it provided a valuable service to the world, including the owners of the IP. Which brings us to what is possibly the decisive point. 

Judges are required to consider, and have often been swayed by, what is best for the nation at large. Given that, they may not want to inhibit a burgeoning and potentially colossal new industry, and one in which the USA appears to have a significant lead. They may well decide that the AI applications in question create a valuable service for all, and not just profits for a small handful of tech companies piggybacking on the creative efforts of others. Big stakes are involved, and not just for the litigants. 
