AI training costs slashed? Another open-source reasoning model comparable to o1 has arrived, built on data generated with Alibaba's model and trained for under $450!
The dramatic cost reduction comes largely from synthetic training data: the NovaSky team generated the initial data with Alibaba's QwQ-32B-Preview model, curated the data mixture, and then used OpenAI's GPT-4o-mini to restructure it into a usable training set.
Has the era of low-cost training for artificial intelligence reasoning models arrived?
Recently, NovaSky, a research team from the Sky Computing Lab at the University of California, Berkeley, released an open-source AI reasoning model named Sky-T1-32B-Preview. The model performs on par with an early version of OpenAI's o1 on several key benchmarks, and, even more impressively, it cost only $450 to develop!
Compared with the multimillion-dollar development costs that were the norm until recently, Sky-T1-32B-Preview represents a significant advance. The NovaSky team stated in a blog post:
"The training cost of Sky-T1-32B-Preview is less than $450, proving that it is feasible to replicate advanced reasoning capabilities at low cost and high efficiency."
So, why was the NovaSky team able to significantly reduce training costs?
According to the NovaSky team's report, the sharp reduction in development cost comes mainly from the use of synthetic training data: the team used Alibaba's QwQ-32B-Preview model to generate the initial training data for Sky-T1-32B-Preview, curated the data mixture, and then used OpenAI's GPT-4o-mini to restructure the data into a more usable format, producing the final training set. Training the 32-billion-parameter Sky-T1-32B-Preview took about 19 hours on a rack of 8 Nvidia H100 GPUs.
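To make this two-stage pipeline concrete, here is a minimal, hypothetical Python sketch: sample raw reasoning traces from QwQ-32B-Preview, then ask GPT-4o-mini to rewrite each trace into a consistent format before saving it as training data. The model IDs are real, but the prompts, output schema, and file names are illustrative assumptions, not the NovaSky team's actual code.

```python
# Hypothetical sketch of the two-stage data pipeline described above:
# 1) sample long-form reasoning traces from QwQ-32B-Preview (the "teacher"),
# 2) have GPT-4o-mini rewrite each trace into a clean, structured training example.
# Prompts, schema, and file names are illustrative guesses, not the team's real code.

import json
from openai import OpenAI                                      # pip install openai
from transformers import AutoModelForCausalLM, AutoTokenizer   # pip install transformers accelerate

TEACHER_ID = "Qwen/QwQ-32B-Preview"  # teacher model used to generate raw reasoning traces

tokenizer = AutoTokenizer.from_pretrained(TEACHER_ID)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER_ID, device_map="auto", torch_dtype="auto")

def generate_trace(problem: str) -> str:
    """Sample a raw (often verbose) reasoning trace from the teacher model."""
    messages = [{"role": "user", "content": problem}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(teacher.device)
    output = teacher.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
    # Strip the prompt tokens and return only the newly generated reasoning text.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def reformat_trace(problem: str, raw_trace: str) -> str:
    """Use GPT-4o-mini to rewrite a raw trace into a consistent, easy-to-train-on format."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rewrite the reasoning below as clearly numbered steps, ending with a "
                        "single line 'Final answer: ...'. Stay faithful to the original reasoning."},
            {"role": "user", "content": f"Problem:\n{problem}\n\nReasoning:\n{raw_trace}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    problems = ["If 3x + 7 = 22, what is x?"]  # placeholder; the real mixture spans math and coding tasks
    with open("sky_t1_style_training_data.jsonl", "w") as f:
        for p in problems:
            raw = generate_trace(p)
            row = {"instruction": p, "response": reformat_trace(p, raw)}
            f.write(json.dumps(row) + "\n")
```

The resulting JSONL of instruction/response pairs would then serve as the fine-tuning set for the 32B student model; the exact curation and formatting rules used by the team may differ from this sketch.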
The report also notes that Sky-T1-32B-Preview outperformed the early preview version of o1 on a number of challenging problems from MATH500 (a set of competition-level math problems) and LiveCodeBench (a coding evaluation set), while on GPQA-Diamond (which contains PhD-level physics, biology, and chemistry questions) it fell slightly short of the o1 preview.
However, it is worth noting that the officially released version of o1 performs better than the preview, and OpenAI is expected to launch its more advanced o3 model in the coming weeks.