Microsoft sued for open-source piracy through GitHub Copilot

05 Nov 2022


Programmer and lawyer Matthew Butterick has sued Microsoft, GitHub, and OpenAI, alleging that GitHub’s Copilot violates the terms of open-source licenses and infringes the rights of programmers.

GitHub Copilot, launched as a technical preview in June 2021 and made generally available in June 2022, is an AI-based programming aid that uses OpenAI Codex to generate real-time source code and function recommendations in editors such as Visual Studio.

The tool was trained with machine learning using billions of lines of code from public repositories and can transform natural language into code snippets across dozens of programming languages.

Clipping authors out

While Copilot can speed up the process of writing code and ease software development, its use of public open-source code has caused experts to worry that it violates license attribution requirements and restrictions.

Open-source licenses, like the GPL, Apache, and MIT licenses, require that the author’s name and the applicable copyright notices be preserved.

However, Copilot strips out this information, and even when the snippets are longer than 150 characters and taken directly from the training set, no attribution is given.

Some programmers have gone as far as to call this open-source laundering, and the legal implications of this approach were demonstrated after the launch of the AI tool.

“It appears Microsoft is profiting from others’ work by disregarding the conditions of the underlying open-source licenses and other legal requirements,” comments the Joseph Saveri Law Firm, which represents Butterick in the litigation.

To make matters worse, people have reported cases of Copilot leaking secrets, such as API keys, that were published in public repositories by mistake and thus included in the training set.

Apart from the license violations, Butterick also alleges that the development feature violates the following:

  • GitHub’s terms of service and privacy policies,
  • Section 1202 of the DMCA, which forbids the removal of copyright-management information,
  • the California Consumer Privacy Act,
  • and other laws giving rise to the related legal claims.

The complaint was filed in the U.S. District Court for the Northern District of California and seeks statutory damages of $9,000,000,000.

“Each time Copilot provides an unlawful Output it violates Section 1202 three times (distributing the Licensed Materials without: (1) attribution, (2) copyright notice, and (3) License Terms),” reads the complaint.

“So, if each user receives just one Output that violates Section 1202 throughout their time using Copilot (up to fifteen months for the earliest adopters), then GitHub and OpenAI have violated the DMCA 3,600,000 times. At minimum statutory damages of $2500 per violation, that translates to $9,000,000,000.”
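The arithmetic behind that figure can be checked with a short sketch. The three input numbers are taken from the quoted passage; the user count is not stated in the quote and is only inferred here by dividing the total violations by three, and the function name and its parameters are hypothetical.

```python
# Figures quoted from the complaint.
VIOLATIONS_PER_OUTPUT = 3          # attribution, copyright notice, license terms
TOTAL_VIOLATIONS = 3_600_000       # total Section 1202 violations claimed
MIN_DAMAGES_USD = 2_500            # minimum statutory damages per violation

# Assumption: the user count is implied by the quote, not stated in it.
implied_users = TOTAL_VIOLATIONS // VIOLATIONS_PER_OUTPUT


def statutory_damages(users: int, outputs_per_user: int = 1) -> int:
    """Hypothetical helper: minimum damages for a given number of users,
    assuming each user receives `outputs_per_user` unlawful outputs."""
    violations = users * outputs_per_user * VIOLATIONS_PER_OUTPUT
    return violations * MIN_DAMAGES_USD


print(f"Implied users: {implied_users:,}")
print(f"Minimum statutory damages: ${statutory_damages(implied_users):,}")
```

Under the complaint's own assumption of a single unlawful output per user, the sketch reproduces the $9,000,000,000 total; raising `outputs_per_user` scales the figure linearly.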

Harming open-source

Butterick also touched on another subject in an October blog post, discussing the damage that Copilot could bring to open-source communities.

The programmer argued that the incentive for open-source contributions and collaboration is essentially removed by offering people code snippets and never telling them who created the code they are using.

“Microsoft is creating a new walled garden that will inhibit programmers from discovering traditional open-source communities,” writes Butterick.

“Over time, this process will starve these communities. User attention and engagement will be shifted […] away from the open-source projects themselves—away from their source repos, their issue trackers, their mailing lists, their discussion boards.”

Butterick fears that, given enough time, Copilot will cause open-source communities to decline and, by extension, the quality of the code in the training data to diminish.

BleepingComputer has contacted both Microsoft and GitHub for a comment on the above, and we received the following statement from GitHub.

"We’ve been committed to innovating responsibly with Copilot from the start, and will continue to evolve the product to best serve developers across the globe." - GitHub.