The Dangerous Race to Develop AI: When Legal and Ethical Considerations Go Out the Window

New York Times: OpenAI and Google Utilize YouTube Video Transcriptions to Train AI Models

The competition to create the most advanced AI models drives technology companies to seek out new sources of data, often without regard for platform policies that prohibit such use. For instance, OpenAI reportedly used its Whisper speech-recognition tool to transcribe YouTube videos for training its GPT-4 language model, despite YouTube’s policies against such use.

Meta, which is also developing large language models, has likewise been accused of collecting data from the internet without regard for copyright protections. Internal recordings suggest that the company may face legal challenges over its data collection methods.

YouTube CEO Neal Mohan has spoken out against the misuse of video content for training AI models, stating that content creators trust YouTube’s terms of service to protect their work. However, sources familiar with Google’s practices suggest that Google, YouTube’s parent company, may itself have used video transcriptions to train its AI models.

Overall, the race to develop powerful AI models has led technology companies like OpenAI, Meta, and Google to seek data from various sources on the internet, often overlooking legal and ethical considerations. These practices raise concerns about copyright violations and compliance with platform policies.
