Microsoft’s Copilot AI assistant has inadvertently exposed the sensitive contents of more than 20,000 private GitHub repositories belonging to major companies such as Google, Intel, Huawei, PayPal, IBM, Tencent, and Microsoft itself. These repositories were originally public but were later set to private over security concerns; they have nonetheless remained accessible through Copilot, raising serious data privacy issues.
The discovery was made by the AI security firm Lasso in the second half of 2024. After noticing that Copilot was still surfacing content from private repositories, Lasso investigated to gauge how widespread the exposure was. Its findings point to a broader weakness in data security, particularly for organizations that rely on GitHub for collaborative coding and version control.
Unveiling Zombie Repositories: The Data Breach Dilemma
Lasso researchers Ophir Dror and Bar Lanyado described how easily Copilot could surface sensitive information from repositories that had transitioned from public to private status. They dubbed these “zombie repositories,” emphasizing that even content that was public only briefly before being locked down remains at risk of exposure. To measure the scale of the problem, the researchers built an automated process to identify such repositories and validate their findings; a simplified version of the core check is sketched below.
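Lasso has not published its tooling, but the basic visibility check is straightforward to reproduce. The following is a minimal sketch, not Lasso’s actual pipeline: it assumes a hypothetical input file named repos.txt containing "owner/name" entries harvested while the repositories were still public, and it uses the unauthenticated GitHub REST API, which returns 404 for repositories that are private, deleted, or renamed.

```python
import time
import requests

GITHUB_API = "https://api.github.com/repos/{}"  # public GitHub REST endpoint

def classify(repo: str) -> str:
    """Classify a once-public "owner/name" repository by its current visibility.

    To an unauthenticated caller, a 404 means the repository is now private,
    deleted, or renamed -- i.e., a "zombie" candidate whose old content may
    still survive in search-engine caches.
    """
    resp = requests.get(GITHUB_API.format(repo), timeout=10)
    if resp.status_code == 200:
        return "still public"
    if resp.status_code == 404:
        return "zombie candidate (private, deleted, or renamed)"
    return f"inconclusive (HTTP {resp.status_code})"

if __name__ == "__main__":
    # repos.txt (hypothetical): one "owner/name" per line, collected while public.
    with open("repos.txt") as fh:
        for line in fh:
            repo = line.strip()
            if repo:
                print(repo, "->", classify(repo))
                time.sleep(1)  # stay well under GitHub's unauthenticated rate limit
```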
One of the most striking cases Lasso highlighted involved a private repository belonging to Microsoft itself. Copilot could still reach the repository’s contents because of a caching flaw in Bing, Microsoft’s search engine: Bing had indexed and cached the pages while they were public, and those cached copies survived after the repository was set to private on GitHub, leaving the data available to Copilot. The oversight underscored how tightly coupled these platforms are, and how far the consequences of even a brief exposure can travel; the sketch below illustrates the live-versus-index mismatch at the heart of the problem.
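The mismatch can be demonstrated in principle by comparing what GitHub serves now against what a search index still returns. This is a minimal sketch under stated assumptions: it uses the Bing Web Search v7 API (which requires an Azure subscription key), and the repository name and key value are placeholders, not repositories from Lasso’s research.

```python
import requests

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"  # Bing Web Search v7
BING_KEY = "YOUR_SUBSCRIPTION_KEY"  # placeholder: supply your own Azure key

def live_status(repo: str) -> int:
    """HTTP status of the repo's public GitHub page (404 => no longer public)."""
    return requests.get(f"https://github.com/{repo}", timeout=10).status_code

def still_indexed(repo: str) -> bool:
    """True if Bing's index still returns hits for the repo's GitHub pages."""
    params = {"q": f'site:github.com "{repo}"', "count": 5}
    headers = {"Ocp-Apim-Subscription-Key": BING_KEY}
    resp = requests.get(BING_ENDPOINT, params=params, headers=headers, timeout=10)
    resp.raise_for_status()
    return bool(resp.json().get("webPages", {}).get("value"))

if __name__ == "__main__":
    repo = "example-org/example-repo"  # hypothetical once-public repository
    if live_status(repo) == 404 and still_indexed(repo):
        print(f"{repo} is gone from GitHub but still present in Bing's index")
```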
Microsoft’s Response and Lingering Concerns
Following Lasso’s report in November, Microsoft moved quickly to address the issue: it stopped exposing private repository data through Bing’s cache, a significant step toward closing the gap. However, Lasso’s subsequent discovery of a still-reachable private repository tied to a lawsuit filed by Microsoft raised new concerns about how long supposedly removed data lingers.
The repository in question reportedly contained tools designed to circumvent the safety measures built into Microsoft’s generative AI services. Although it was removed from GitHub after the lawsuit was filed, Copilot continued to provide access to those tools, underscoring how difficult it is to fully purge data from AI-driven platforms once it has been ingested or cached, and prompting a reevaluation of data privacy protocols across tech ecosystems.
The inadvertent exposure of private GitHub repositories through Copilot is a cautionary tale for any organization entrusted with sensitive code. Data that is public even briefly can be indexed, cached, and resurfaced long after it is locked down. As AI assistants draw on ever more interconnected platforms, developers, AI providers, and tech giants alike need to treat a short public window as a lasting exposure and build their data protection strategies accordingly.