What is Open Source AI? The official definition is finally here, and Meta's Llama model did not pass the test

The Open Source Initiative (OSI), which has long been dedicated to defining and "managing" all open source affairs, released its Open Source AI Definition (OSAID) 1.0 on Monday. According to OSAID, for an AI model to be considered open source, it must provide enough information for anyone to "substantially" rebuild the model. The model must also disclose any important details related to its training data, including the source of the data, how it was processed, and how it was obtained or licensed. However, Meta does not endorse this definition and believes that there is no single definition of open source AI.

Open Source AI finally has an "official" definition.

OSAID also outlines the rights that developers should enjoy when using open source AI, such as the ability to use and modify the model for any purpose without needing permission from others.

Stefano Maffulli, the Executive Director of OSI, stated that the main purpose of establishing an official definition for open source AI is to align policymakers and AI developers.

"Regulators are already paying attention to this area. We have explicitly promoted it to stakeholders and the community, and have even tried to reach out to organizations that regularly engage with regulators to get early feedback."

"Open source AI is an AI model that allows you to fully understand how it is built, meaning you can access all components, such as the complete code used for training and data filtering. Most importantly, you should be able to build on top of it."

OSI cannot force developers to comply with the OSAID definition, but it intends to flag models that are described as "open source" yet fall short of the definition. "We hope that when someone tries to abuse this term, the AI community will say, 'We do not recognize this as open source' and correct it," Maffulli said.

Meta: My Objection

Currently, many startups and large tech companies, especially Meta, describe their AI model release strategies as "open source," but few actually meet the standards of OSAID. Researchers have found that many "open source" models are only nominally open source, with the actual data required to train the models being kept confidential, and the computational power required to run these models exceeding the capabilities of many developers.

For example, Meta requires platforms with over 700 million monthly active users to obtain special permission to use its Llama model. Maffulli openly criticized Meta for calling its model "open source". Google and Microsoft, after discussions with OSI, have agreed not to label partially open models as "open source," but Meta has not done so.

In addition, Stability AI, which has long promoted its models as "open source," requires companies with annual revenue exceeding $1 million to obtain an enterprise license, while the licensing terms of the French AI startup Mistral prohibit the use of certain models and outputs for commercial purposes.

Naturally, Meta disagrees with this assessment. Although the company participated in the drafting process of the definition, it has objections to the wording of OSAID. A Meta spokesperson stated that the licensing terms of Llama and the accompanying acceptable use policy provide protection against harmful applications. Meta also stated that as AI-related regulations in California are evolving, the company's approach to sharing model details is "cautious."

"We agree with our partner OSI on many things, but we and other companies in the industry do not agree with their new definition. We believe there is no single open-source AI definition because past open-source definitions cannot cover the complexity of today's rapidly evolving AI models. We make Llama freely available for public use and ensure security through licensing and usage policies. Regardless of the technical definition, we will continue to collaborate with OSI and other industry groups to make AI more freely accessible."

Analysts suggest that Meta's reluctance to disclose training data may stem from the way it, like most AI developers, builds its models.

AI companies collect large amounts of images, audio, video, and other data from social media and websites, and train their models on this "publicly available data." In today's competitive market, the methods for collecting and refining datasets are seen as a competitive advantage, and companies often refuse to disclose them for this reason.

However, disclosing the details of training data may also expose developers to legal risks. Authors and publishers claim that Meta used copyrighted books for training. Artists have also sued Stability AI, accusing it of using their works without credit and likening the company's actions to theft.

Therefore, OSAID's open-source AI definition may pose problems for companies trying to resolve lawsuits quietly, especially if plaintiffs and judges consider the definition reasonable and cite it in court.