news-16072024-101744

An investigation by Proof News has revealed that several major tech companies, including Apple, NVIDIA, and Anthropic, have trained AI models on a dataset containing transcripts from more than 173,000 YouTube videos, used without the creators’ permission. The dataset, created by EleutherAI, draws on a wide range of YouTube channels. These range from popular creators such as Marques Brownlee and MrBeast to news publishers such as The New York Times and the BBC.

The dataset does not include the videos or images themselves, but it does contain their transcripts, which are valuable as training text. Marques Brownlee took to social media to voice his concern. He wrote that a company had sourced data for its AI from third parties that scraped YouTube videos, including his own. This raises important questions about the ethics of using creators’ work without consent or compensation.

Apple, NVIDIA, Anthropic, and EleutherAI did not respond to requests for comment. This lack of transparency from AI companies about the data used to train their models is troubling. Apple, for example, recently drew criticism from artists and photographers for not disclosing where the training data for Apple Intelligence came from.

As the largest repository of online video, YouTube offers a vast amount of content that is valuable for training AI models. However, both YouTube CEO Neal Mohan and Alphabet CEO Sundar Pichai have said that using YouTube data without authorization violates the platform’s terms of service.

This case underscores the broader problem of data privacy and consent in the tech industry. Creators and users should have control over how their data is used, especially for training AI models. As AI continues to advance, companies need to be transparent about their data sources and ensure that their training data is ethically sourced.

If you want to check whether subtitles from your own YouTube videos or your favorite channels are in the dataset, Proof News offers a lookup tool. It can help reveal whether that content has been used for AI training without permission.