OpenAI Says It's Fine to Vacuum Up Everyone's Content and Charge for It Without Paying Them
Late last year, the New York Times became the first major US newspaper to sue OpenAI and Microsoft for copyright infringement, claiming the Sam Altman-led company had made unauthorized use of its published work to train its large language models.
The lawsuit showed that ChatGPT could easily be prompted to regurgitate paywalled content at length and almost word for word, an arguably glaring example of a company charging its users a monthly fee while benefiting from the NYT's work without express permission.
Now, just under two weeks later, OpenAI has published a 1,000-word blog post in response to the lawsuit, arguing that it should have unfettered rights to train its models on the newspaper's work, and that such a practice is considered fair use under US copyright law, a particularly hot-button subject that has yet to be debated in court.
It's a heated challenge that could have considerable implications for the future of journalism. By allowing users to skirt around paywalls and subscriptions, OpenAI is directly undercutting an important revenue source for news outlets around the world. And that doesn't bode well, considering the sorry state of the industry in the year 2024.
Meanwhile, OpenAI is trying to drum up support among the NYT's peers. In its blog post, the company argued that it still had goodwill among other news organizations, writing that "we’ve met with dozens, as well as leading industry organizations like the News/Media Alliance, to explore opportunities, discuss their concerns, and provide solutions."
OpenAI has also partnered with the Associated Press, Business Insider's parent company Axel Springer, and New York University, among others.
As far as ChatGPT blatantly "regurgitating" paywalled content is concerned, the company didn't deny that its chatbot was making copies of the NYT's content without express permission, but called the instances laid out in the lawsuit a "rare bug." The company also said that the NYT was using "intentionally manipulated prompts" to get "our model to regurgitate."
OpenAI even went as far as to argue that the newspaper "didn't meaningfully contribute to the training of our existing models and also wouldn't be sufficiently impactful for future training."
Apart from this clear shot across the bow, OpenAI conceded it may have screwed up just a little bit.
"Despite their claims, this misuse is not typical or allowed user activity, and is not a substitute for the New York Times," the blog reads. "Regardless, we are continually making our systems more resistant to adversarial attacks to regurgitate training data, and have already made much progress in our recent models."
Whether the company's efforts will meaningfully address the issue remains to be seen. OpenAI's entire business model relies on hoovering up as much data as it can find, often including copyrighted material.
Earlier this week, OpenAI begged the British Parliament to allow it to use copyrighted works because, it said, it's "impossible" for the company to train its AI models without them.
The NYT's lawsuit sets the stage for what will likely be a drawn-out legal battle between the AI industry and the copyright holders who produced the content that these AI models are being fed.
Critics have pointed out that OpenAI is likely trying to avoid having to pay significant licensing fees. The company is already burning through huge amounts of investment to keep the lights on, since training AI models is an incredibly resource-intensive process.
It's unclear exactly why previous negotiations between the NYT and OpenAI collapsed, with the paper filing its lawsuit weeks later, but there's a decent chance the two sides disagreed about how to share revenue.
"I run a sandwich shop," X-formerly-Twitter user Craig Cowling wrote in a sarcastic tweet parodying OpenAI's line of argumentation. "There's no way I could make a living if I had to pay for all my ingredients. The cost of cheese alone would put me out of business."
"If the current batch of AI companies cannot create AI that performs reasonably well based on public domain sources and whatever they are prepared to pay to license," professor and AI critic Gary Marcus wrote in a blog, "they should go back to the drawing board — and figure out how to build software that doesn’t have a plagiarism problem — rather than fleecing artists, writers, and other content providers."
If OpenAI wins these lawsuits and sets a precedent, content creators may struggle even more, with potentially disastrous consequences.
"But in the end, nobody will create good, fresh, new content anymore, and the internet will start eating its own tail," Marcus wrote. "We will all suffer in the end, fed a mediocre stew of regurgitate. Let us hope that doesn’t happen."