ABOUT THE CASE
This case represents the first major step in the battle against intellectual-property violations by artificial intelligence systems in the tech industry.
GitHub Copilot, an AI-based coding product made by GitHub in cooperation with OpenAI, appears to profit from the work of open-source programmers by violating the conditions of their open-source licenses. According to GitHub, Copilot has been trained on billions of lines of publicly available code, leaving open-source programmers with serious concerns regarding license violations. Microsoft, which owns GitHub, apparently is profiting from others' work by disregarding the conditions of the underlying open-source licenses and other legal requirements.
Copilot was announced as a technical preview in 2021 and became generally available in 2022. According to Microsoft, Copilot is an extension that works as an "AI pair programmer," helping users write code faster by auto-filling suggestions based on the code and comments they write in their editors (Visual Studio Code, Visual Studio, Neovim, etc.). It is powered by OpenAI’s Codex, which is trained on billions of lines of publicly available code.
According to reports, however, Copilot in practice can act more as an auto-coder, suggesting large blocks of code without alerting the Copilot user that the code is usable only subject to the terms of its open-source license. Microsoft has long been antagonistic to open-source software; its decades-long war against open-source pioneer Linux is one notable example. This is why developers feared how Microsoft might leverage GitHub's central role in the open-source community when it acquired GitHub for $7.5 billion in 2018. With Copilot, those fears may be coming to fruition.
Microsoft has monetized Copilot by offering it as a subscription service. Although Copilot is free for verified students and maintainers of popular open-source projects, “Copilot requires running software that is not free, such as Microsoft’s Visual Studio IDE or Visual Studio Code editor.”
The Copilot FAQ states, “You are responsible for the code you write with GitHub Copilot’s help,” and admits, “GitHub does not own the suggestions GitHub Copilot generates.” However, it also notes that “about 1% of the time, a suggestion may contain some code snippets longer than ~150 characters that matches the training set.” Independent analysis has found that “[i]n files where Copilot is enabled, it accounts for nearly 40% of code in popular programming languages like Python.”
This lawsuit constitutes a critical chapter in an industry-wide debate over the ethics of training AI tools on data sourced without the permission of its creators, and over what constitutes fair use of intellectual property. Despite Microsoft’s protestations to the contrary, it does not have the right to treat source code offered under an open-source license as if it were in the public domain.