news-21112024-063009

OpenAI, a well-known artificial intelligence research lab, is currently facing a lawsuit from The New York Times and Daily News. The publishers are accusing OpenAI of using their works without permission to train its AI models. The lawyers representing The Times and Daily News claim that OpenAI engineers mistakenly deleted data that could have been crucial evidence in the case.

As part of the lawsuit, OpenAI agreed to provide virtual machines so that the publishers’ legal teams could search for their copyrighted content in its AI training sets. According to the publishers’ lawyers, on November 14 OpenAI engineers accidentally erased all of the publishers’ search data stored on one of the virtual machines. Although some of the data was recovered, the folder structure and file names were lost, rendering the recovered data unusable for determining whether the publishers’ articles were used in building OpenAI’s models.

As a result, The Times and Daily News had to redo their work from scratch, at a significant cost in person-hours and computer processing time. The publishers’ lawyers argue that the incident shows OpenAI is best placed to search its own datasets for potentially infringing content using its own tools.

While OpenAI contends that training AI models on publicly available data, such as articles from The Times and Daily News, falls under fair use, the lab has entered into licensing agreements with several news publishers. The terms of these agreements remain confidential, but reports suggest that Dotdash Meredith, one of OpenAI’s content partners, is being paid at least $16 million per year.

Although OpenAI has not admitted to training its AI systems on specific copyrighted works without authorization, the lawsuit underscores the challenges and ethical considerations surrounding the use of copyrighted content in AI development. The outcome of this case could have significant implications for how AI research labs handle and utilize data from publishers in the future.