October 5, 2023

Good morning. In today’s either/view, we discuss the ongoing tussle between authors and OpenAI. We also look at the EWS reservation in judicial services in Bihar, among other news.

📰 FEATURE STORY

Authors versus OpenAI: Copyright Infringement or Fair Use?

Remember all the times in school or college when you read a text for your English exam. You read the text, hopefully memorized some of it, and gave an answer when asked about it. Now imagine all the authors from your school days coming back from the dead to sue you for copyright infringement.

This is what Artificial Intelligence (AI) promoters claim is happening with prominent authors suing OpenAI. The authors claim that ChatGPT has been trained on their copyrighted works without due compensation. A class action lawsuit has been filed against OpenAI in a San Francisco court, opening up a plethora of questions.

Do the allegations of these thousands of prominent authors hold weight? How do we deal with copyright issues in the realm of generative AI? All answers, as of now, depend on where the question is being asked.

Context

Generative AI has had an amazing year. Corporations like Adobe, Microsoft and GitHub, along with several newcomers in the startup space, are incorporating AI into their products. However, the legal fraternity is split on whether this venture is legal at all.

Mona Awad and Paul Tremblay filed a class action complaint in a San Francisco federal court in July this year, alleging that their books, which are copyrighted, were used to train ChatGPT because the chatbot generated “very accurate summaries” of their works. The complaint says that OpenAI unfairly profits from stolen writing and ideas and calls for monetary damages on behalf of all US-based authors.

Last month, novelist Michael Chabon, playwright David Henry Hwang, and authors Matthew Klam, Rachel Louise Snyder, and Ayelet Waldman filed another lawsuit along similar lines, adding that ChatGPT can also generate texts that mimic their styles. The plaintiffs have requested an order blocking OpenAI’s “unlawful and unfair business practices”, while OpenAI has responded with a motion to dismiss these claims.

The model underlying ChatGPT has been trained with data that is publicly available on the internet. Like most machine learning software, it works by identifying and replicating patterns in this data. However, this data is itself created by humans, scraped from the web and is copyright protected in one way or another. Books are considered ideal for training large language models because they tend to contain “high-quality, well-edited, long-form prose” and have been called the gold standard of idea storage for our species.

In light of this value for Machine Learning (ML) modelling, the Society of Authors published a list of practical steps in June for members to safeguard themselves and their work against AI. The Authors Guild has also been lobbying aggressively for legislation that would clarify that permission is required to use books, articles, and other copyright-protected work in generative AI systems.

The concept of copyright was conceived in an era when writing was done solely by human beings. This means the fundamental concepts within the law, from infringement to exceptions, are human-centric. Our current scant regulation around AI is fragmented and inconsistent across different jurisdictions and struggles to keep pace with technological developments. These lawsuits are going to explore the uncertain borders of legality within the AI space.

The case is expected to rest on whether courts view the use of copyrighted material in this way as ‘fair use’ or as simple unauthorised copying. In this context, the location of the lawsuit also becomes important as, unlike the US, countries like the UK and Australia do not recognise fair use as a defence against copyright infringement.

The question of whether to prioritise AI advancement or the protection of copyright is likely to persist as the technology advances. Legal experts believe that these particular suits against OpenAI are likely to fail, but they will just be the first salvo in a major AI-driven groundshift in copyright.

VIEW: Protect the writer’s rights

Since several writers have been alleging that OpenAI has been “unlawfully ingesting” their books, OpenAI too has become increasingly secretive about its training data, undermining confidence. The books that OpenAI’s models have been trained on come from the ‘Books2’ dataset consisting of nearly 3,00,000 titles. It is believed that these titles have been drawn from shadow libraries such as Library Genesis (LibGen) and Z-Library, tainting the source of this data.

ChatGPT is also believed to be the first generation of generative AI technology. Going by Moore’s Law, under which computing capacity doubles roughly every two years, this technology could grow exponentially. Several authors and creative producers have started to wonder what happens to their monetization and sustenance as OpenAI’s technology evolves.

As for the legal standing, in August this year, a US district court judge ruled that artwork generated by AI cannot be copyrighted, arguing that copyright has never been granted to work that was absent any guiding human hand and that human beings are an essential part of a valid copyright claim. Such precedent might indicate that AI’s monetization of ripped-off content cannot be justified.

The legal fraternity has also noted the irony in artificial intelligence tools relying on data made by humans. These systems depend entirely on human creativity and if they bankrupt human creators, they will soon bankrupt themselves.

It is essential to protect the true value that human authorship brings to our lives, in certain cases to our economy and, in a globalizing world, to our international identity.

COUNTERVIEW: Prioritise AI development

OpenAI claims that the pleading authors have misconceived the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of AI. Copyright protection also does not extend to ideas and though copying something to a database might be an act of infringement, that act alone is unlikely to cause significant harm to the economic interests of the authors.

Furthermore, it may be difficult to prove that authors have suffered financial losses specifically because of ChatGPT being trained on copyrighted material, even if their claims are true. It is also claimed that ChatGPT might have worked “exactly the same” even if it had not ingested the books because it is trained on internet information that includes internet users discussing the contested books.

In the legal realm, several governments have been keen on promoting an exception to copyright that would allow free use of copyright material for text and data mining, even for commercial purposes. In September 2022, the US Copyright Office granted a first-of-its-kind registration for a comic book generated with the help of text-to-image AI Midjourney. The UK is also one of only a handful of nations that currently offer copyright protection for works generated solely by a computer.

As for the question of fair use, the determining factors are the purpose of the use and its impact on the market (or the original creator’s livelihood). While it might be perfectly legal to train AI models using other people’s data, what the model is then used to do might infringe copyright.

In a recent case against the plagiarism detector TurnItIn.com, it was held that works could be ingested for a greater common good such as to create a database used to expose plagiarism. When training ML models on these extremely large datasets, it is not feasible to license all the data used. Thus, this “fair learning” must be permitted to not only encourage innovation but also to allow for the development of better AI systems.

Reference Links:

Authors file a lawsuit against OpenAI for unlawfully ‘ingesting’ their books – The Guardian
More writers sue OpenAI for copyright infringement over AI training – The Hindu
Two authors are suing OpenAI for training ChatGPT with their books. Could they win? – Scroll
Authors Join the Brewing Legal Battle Over AI – Publishers Weekly
The scary truth about AI copyright is nobody knows what will happen next – The Verge

What is your opinion on this?

a) Authors should get fair compensation from companies like OpenAI.

b) AI development should qualify under fair use exemption from Copyright.

🕵️ BEYOND ECHO CHAMBERS

For the Right:

As India’s external intelligence operations expand globally, a growing army of spies left out in the cold

For the Left:

Empowering OBCs: Prime Minister Modi walks the talk

🇮🇳 STATE OF THE STATES

Canal controversy (Punjab) – The Supreme Court reprimanded the Punjab government for not adhering to a 21-year-old directive to construct its portion of the canal linking the Sutlej and Yamuna rivers. The court warned the ruling Aam Aadmi Party to comply or face further action. The court also directed the central government to mediate discussions between Punjab and Haryana, with the latter having already completed its half of the canal construction.

Why it matters: The canal dispute has significant implications for water-sharing between Punjab and Haryana, two major states in India. The delay in construction has been attributed to political pressures and challenges in acquiring land from farmers. The canal’s construction is not just an infrastructural issue but also carries potential political and social ramifications, especially given the strong opposition from Punjab’s leadership.

OotyLitFest Returns (Tamil Nadu) – The 7th edition of the Ooty Literary Festival (OotyLitFest 2023) is scheduled for October 6-7, celebrating Ooty’s bicentennial and highlighting the ecological significance of the Nilgiri Biosphere. The festival will host discussions, exhibitions, and concerts with notable participants including Aamir Khan and Javed Akhtar. The Lifetime Achievement Award will be conferred on Thiru. Perumal Murugan for his literary contributions in Tamil Nadu.

Why it matters: This festival is crucial as it fosters cultural understanding and conservation, acting as a vital platform for Tamil Nadu’s literary promotion. Honouring Ooty’s bicentennial, it reflects on the region’s rich heritage while envisaging a future that values ecological responsibility and cultural diversity.

CM’s apology (Rajasthan) – Chief Minister Ashok Gehlot has tendered an “unconditional apology” to the Rajasthan High Court for his remarks suggesting widespread corruption in the judiciary. These comments had sparked outrage among the legal community. In his defence, Gehlot stated that his comments were distorted by a publication and emphasized that he has never personally witnessed corruption within the judiciary. The matter is scheduled for a hearing on November 7.

Why it matters: The CM’s remarks on corruption in the judiciary have significant implications, as they challenge the integrity of a key pillar of democracy. The backlash from the legal community and the subsequent apology highlight the sensitivity of such statements, especially when made by a person holding a high office. The outcome of the hearing could set a precedent for public figures commenting on institutional integrity.

EWS reservation in judicial services (Bihar) – The state government has declared a 10% reservation for the Economically Weaker Sections (EWS) in both the judicial services and government-run law colleges and universities. This announcement was made a day after the government unveiled the initial data from Bihar’s caste survey. The decision was finalized in a cabinet meeting presided over by Chief Minister Nitish Kumar. The state will soon issue a detailed notification about this new policy.

Why it matters: The EWS reservation in the judicial sector and educational institutions marks a significant step towards ensuring equal opportunities for economically disadvantaged groups. This move, closely following the release of the caste survey data, reflects the government’s commitment to addressing socio-economic disparities. The decision could set a precedent for other states to follow, ensuring broader representation in key sectors.

Megha Start-Up Expo (Meghalaya) – The first-ever Megha Start-Up Expo is taking place at the NEHU campus in Tura, West Garo Hills district, concluding on October 5. Organized by BIRAC’s BioNEST Bioincubator and other partners, the expo provides a platform for start-ups, agripreneurs, and innovators to showcase their offerings. Education Minister Rakkam A Sangma, during the inaugural session, commended NEHU’s efforts in promoting entrepreneurship.

Why it matters: The Megha Start-Up Expo represents a significant initiative to boost entrepreneurship in the region, especially in the agricultural and rural sectors. By providing a platform for innovators to connect with potential investors, buyers, and partners, the event can catalyze economic growth and innovation in the state.

🔢 KEY NUMBER

$48.27 million – Superdry, the UK’s struggling fashion retailer, is set to enter a joint venture with Mukesh Ambani’s Reliance Retail, involving the sale of its intellectual property assets to Reliance Retail for $48.27 million.
