NVIDIA ACCUSED OF COPYRIGHT INFRINGEMENT
In September 2022, NVIDIA released NeMo Megatron–GPT (NeMo Megatron), a series of large language models (LLMs). LLMs are artificial intelligence software programs designed to emit convincingly naturalistic text outputs in response to user prompts.
NVIDIA trained NeMo Megatron by copying an enormous quantity of textual works, extracting protected expression from these works, and transforming that protected expression into a large set of numbers called weights that are stored within the models. Much of the training dataset’s material, however, comes from copyrighted works that NVIDIA copied without consent, credit, or compensation.
NeMo Megatron models are hosted on the Hugging Face website, where they have model cards that provide information about the models, including their training dataset. NeMo Megatron’s model cards state that the models were trained on a dataset called “The Pile.” Included in that dataset is a collection of books called Books3. Books3 derives from a copy of the contents of the Bibliotik collection of ebooks and other electronic resources. Bibliotik is one of multiple notorious “shadow library” websites that host and distribute vast quantities of unlicensed copyrighted material in violation of the U.S. Copyright Act.
The Books3 dataset was available from Hugging Face until October 2023. At that time, it was removed, and a message was posted in its place stating that the dataset “is defunct and no longer accessible due to reported copyright infringement.” Thus, NVIDIA has admitted training its NeMo Megatron models in a way that directly infringes the copyrights of authors.
CASE FILED
On March 8, 2024, the firm filed a lawsuit on behalf of plaintiff and class-member authors who own registered copyrights in books that were included in the dataset that NVIDIA has admitted to copying to train NeMo Megatron. The case, Nazemian v. NVIDIA Corporation, in the United States District Court for the Northern District of California, seeks damages for the authors and an injunction to prevent NVIDIA from further infringing on their copyrights. The lawsuit also seeks destruction or other reasonable disposition of all copies of works that NVIDIA made or used in violation of the exclusive rights of plaintiffs and the class.
"Artificial intelligence is changing every aspect of the modern world and legal landscape. We must recognize and protect the rights of authors such as these against unlawful theft and fraud," said firm founder, Joseph Saveri. “NeMo Megatron–GPT infringes these authors’ rights and endangers their ability to pursue “author” as a viable career path. This case represents a larger fight for preserving ownership rights for all artists and other creators."