In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books.
But the company is still on the hook and must now go to trial over how it acquired those books by downloading them from online "shadow libraries" of pirated copies.
U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distillation of thousands of written works to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative."
"Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different," Alsup wrote.
But while dismissing a key claim made by the group of authors who sued the company for copyright infringement last year, Alsup also said Anthropic must still go to trial in December over its alleged theft of their works.
"Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote.
A trio of writers — Andrea Bartz, Charles Graeber and Kirk Wallace Johnson — alleged in their lawsuit last summer that Anthropic's practices amounted to "large-scale theft," and that the San Francisco-based company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works."
Books are known to be important sources of the data — in essence, billions of words carefully strung together — that are needed to build large language models. In the race to outdo each other in developing the most advanced AI chatbots, a number of tech companies have turned to online repositories of stolen books that they can get for free.