An academic publisher struck an AI data deal with Microsoft, without its authors’ knowledge

By Wellett Potter, University of New England

In May, British multinational Informa announced in a business update that it had signed an agreement with Microsoft. This deal involves “access to advanced learning content and data, as well as a partnership to explore expert AI applications.” Informa, the parent company of Taylor & Francis, publishes a wide array of academic and technical books and journals, suggesting that the data in question may include content from these publications.

According to reports published last week, the authors of this content were neither consulted nor informed about the agreement. They claim they were not given the option to opt out and will not benefit financially from the deal.

Academics are the latest group of content creators to express outrage over their work being used by generative AI models, which are rapidly consuming human cultural products. Newspapers, visual artists, and record labels are already suing AI companies.

While it remains unclear how Informa will respond to the growing discontent, the agreement serves as a reminder for authors to be mindful of the contractual terms they agree to when signing publishing deals.

What’s in the Informa Agreement?

Informa’s update outlined four key areas of its agreement with Microsoft:

  • Enhancing Informa’s own productivity
  • Developing an automated citation tool
  • Creating AI-powered research assistance software (similar to a system tested by online academic library JSTOR)
  • Providing Microsoft with data access to “help improve the relevance and performance of AI systems.”

Informa will receive over £8 million (AU$15.5 million) for initial data access, followed by unspecified recurring payments over the next three years.

We do not know exactly what Microsoft plans to do with this data, but a likely scenario is that the content from academic books and articles will be added to the training data for generative AI models like ChatGPT. In theory, this should make AI systems’ outputs more accurate, although existing AI models have faced criticism for regurgitating training data without citation (which can be seen as a form of plagiarism), inventing false information, and misattributing sources.

However, the update also states that “the agreement protects intellectual property rights, including limits on text excerpts and alignment with the importance of detailed citation references.”

The “limits on text excerpts” likely refer to the U.S. doctrine of fair use, which allows certain uses of copyrighted material.

Many generative AI companies are currently facing lawsuits for copyright infringement over their use of training data, and their defenses will likely hinge on claims of fair use.

The “importance of detailed citation references” may relate to the concept of attribution in copyright law. This is a moral right held by authors, stipulating that the creator must be acknowledged as the author when their work is reproduced.

How Does Academic Publishing Usually Work?

Most academics receive no compensation and make no profit from the majority of their scientific publications. Writing journal articles and conference papers is generally considered part of the job for a full-time permanent position. Publishing enhances an academic’s credibility and promotes their research.

The typical process involves an author researching and writing an original article, then submitting it to a journal editor for peer review. Most peer reviewers and editorial board members also receive no compensation for their work.

In fact, some journals may require authors to pay an article processing fee to cover publishing costs, which can amount to thousands of dollars for an open-access publication. Generally, the more prestigious the publication, the higher the fee.

If an article passes peer review, the author will be asked to sign a publishing agreement. Terms may cover logistical arrangements such as publication timing, format (print, online, or both), and royalty distribution (if any). There will also be arrangements regarding copyright and ownership of the article.

An author typically must also grant exclusive rights to the publisher to distribute and publish the article. This may mean the author cannot publish the article elsewhere, and the publisher may also be able to sublicense the article to a third party, such as an AI company.

Sometimes, publishers require an author to transfer copyright to them through a permanent copyright transfer agreement.

Essentially, this means the author assigns to the publisher all the rights they hold as copyright owner of the work. The publisher can then reproduce, communicate, distribute, or license the work to third parties at its discretion.

It is possible to assign only limited rights rather than all rights, and this is something authors should consider.

Content Mining

It is crucial for authors to understand the implications of licensing and assignment and to think carefully about what they agree to when signing a contract. In light of the recent trend of publishers making deals with generative AI companies, publishers’ AI policies should also be closely scrutinized.

In the U.S., a standard collective licensing solution for the use of content in internal AI systems was recently released, defining the rights and remuneration of copyright holders. Similar licenses for content use in AI systems are likely to emerge soon in the Australian market.

These types of agreements between academic publishers and AI companies have raised broader concerns among many academics. Do we want scientific research to be reduced to content for AI knowledge mining? There are no clear answers regarding the ethics and morality of such practices.

Wellett Potter, Lecturer in Law, University of New England

This article is republished from The Conversation under a Creative Commons license. Read the original article.


The Conversation is an independent source of news and views, sourced from the academic and research community and delivered directly to the public.