Microsoft AI chief discusses AI trends: large and small models will "advance together," and scaling laws are far from their limits

Wallstreetcn
2024.11.03 05:47

At least for the next two to three years, scaling laws will keep delivering better-than-expected performance without slowing down

In a recent interview, Microsoft AI chief Mustafa Suleyman delved into the latest trends in artificial intelligence. He believes that in the coming years, large and small models will "advance together."

On one hand, the race to scale up large models will continue, incorporating more data modalities, such as video and images. On the other hand, techniques for training small models with large models (such as distillation) are on the rise, and efficient small models will play a significant role in specific scenarios. Suleyman added that in the future, knowledge will be condensed into smaller, cheaper models embedded in all kinds of devices, bringing about a true ambient-awareness revolution.

For entrepreneurs, Suleyman believes that understanding and utilizing prompt engineering is crucial. By providing high-quality instruction sets, entrepreneurs can guide pre-trained models to align with their brand values and create unique products. Additionally, small models hold great opportunities, allowing entrepreneurs to leverage their low cost and efficiency to develop applications for specific use cases.

During the interview, Suleyman also emphasized the importance of data integration. Synthetic data will become key to training models, but how to acquire and integrate this data still requires in-depth exploration.

Moreover, Suleyman discussed incorporating new modalities, such as video and images, as well as understanding and collecting data on action trajectories across complex digital interfaces. He believes this will lead to many impressive results. For entrepreneurs, leveraging these new trends and technologies for innovation will be key to future success.

Here is the full content, enjoy~ ✌️ (For readability, we have made brief edits to the original text)

Q: In the evolving landscape of models in the coming years, what should we pay attention to?

A: Models are getting both larger and smaller at the same time, and this trend is almost certain to continue.

A technique called distillation has become popular over the past year. It uses a large, expensive model to supervise the training of a small one. This supervision works quite well, and there is ample evidence supporting it.

Therefore, scale remains a key factor in this competition, and there is still significant room for development, with data volume continuing to grow.

At least for the next two to three years, scaling laws will keep delivering better-than-expected performance without slowing down.
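
Suleyman does not spell out a recipe, but distillation is usually implemented by training the small "student" model to match the large "teacher" model's softened output distribution. Below is a minimal PyTorch sketch of that loss; the shapes and random logits are placeholders, not anything from the interview.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # The student is trained to match the teacher's softened output distribution.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 as in the classic distillation formulation.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

# Tiny smoke test with random logits standing in for real model outputs.
teacher_logits = torch.randn(4, 1000)   # (batch, vocab) from the large "teacher"
student_logits = torch.randn(4, 1000)   # from the small "student"
loss = distillation_loss(student_logits, teacher_logits)
print(loss.item())
```

In practice this soft-label term is typically combined with the usual cross-entropy on ground-truth labels, but the core idea is the teacher-as-supervisor signal described above.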

Q: What new modalities can be added?

A: People are also considering incorporating new modalities into models, such as video, images, and action trajectories across complex digital interfaces.

But what we are really interested in is the action trajectories across complex digital interfaces: jumping from a browser to the desktop, then moving to a mobile phone, switching between different ecosystems, whether inside a walled garden or on the open web. We are trying to understand these trajectories, collect large amounts of data, and apply methods such as supervised learning and fine-tuning. I believe this will lead to many impressive results.
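
The interview does not define a data format for these trajectories. Purely as an illustration, a collected trajectory might be recorded as something like the following; the schema and field names are invented for the sketch.

```python
# One recorded "action trajectory" spanning several surfaces (illustrative only).
trajectory = {
    "goal": "book a table for two on Friday",
    "steps": [
        {"surface": "browser", "action": "open",    "target": "restaurant site"},
        {"surface": "browser", "action": "click",   "target": "Reservations"},
        {"surface": "phone",   "action": "confirm", "target": "SMS code"},
    ],
    "outcome": "success",
}

# Large collections of such records could then feed the supervised learning
# and fine-tuning mentioned above.
print(len(trajectory["steps"]), "steps recorded")
```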

Q: In terms of data, what aspects do people not think enough about?

A: There are many angles from which to discuss data; a classic question is which data can be used and how good it is. I think that has already been discussed a lot online.

However, people have not spent enough time thinking about the sources of new data and how to integrate this data.

For example, synthetic data is an interesting area: with it, we can train better small models and large models. But how to obtain this data and ensure it is integrated has not been discussed enough.
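
As one illustration of that idea, a small pipeline can use a large model to generate question-answer pairs that later serve as fine-tuning data for a smaller model. The helper call_large_model below is a hypothetical stand-in; the interview does not prescribe any particular API or provider.

```python
import json

def call_large_model(prompt: str) -> str:
    # Hypothetical stand-in for whichever large-model API or local model you use.
    return f"[large-model output for: {prompt}]"

seed_topics = ["refund policy", "shipping times", "password reset"]

with open("synthetic_pairs.jsonl", "w", encoding="utf-8") as f:
    for topic in seed_topics:
        question = call_large_model(f"Write a realistic customer question about {topic}.")
        answer = call_large_model(f"Answer this customer question concisely:\n{question}")
        # Each line becomes one supervised example for fine-tuning a smaller model.
        f.write(json.dumps({"prompt": question, "completion": answer}) + "\n")
```

Quality filtering and deduplication of the generated pairs would be the harder part in practice, which is exactly the integration question Suleyman raises.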

Q: What is the difference between prompts and questions when dealing with models?

A: A prompt is not just the question you ask the chatbot. When you ask a chatbot a question, that is a question; when you write a three-page style guide and attach examples to imitate, that is a prompt.

A prompt is your high-quality set of instructions that guides the pre-trained model to behave in a specific way. Surprisingly, models can behave very differently with just a few pages of instructions.

To make the model exhibit subtle, precise, and brand-aligned behaviors, you need to showcase thousands of examples of good behavior and fine-tune these examples into the model. This is a continuation of the pre-training process, using high-quality and accurate data.

The good news is that for many niche or specific verticals, thousands of examples are very easy to obtain. This is an advantage, and startups have significant room to do high-quality fine-tuning of pre-trained models.
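
To make Suleyman's distinction concrete, here is a sketch of how a "question" and a "prompt" differ when assembled into the message structure many chat models accept. The brand, style guide, and examples are invented placeholders, not anything from the interview.

```python
# A bare question, as a user might type it into a chatbot.
question = "What are your store hours?"

# A prompt: a high-quality instruction set that shapes how the model behaves.
style_guide = """You are the voice of Acme Outfitters (hypothetical brand).
Tone: warm, concise, never more than three sentences.
Always offer one concrete next step. Never mention competitors."""

few_shot_examples = [
    {"role": "user", "content": "Do you ship to Canada?"},
    {"role": "assistant", "content": "We do! Standard shipping takes 5-7 days. "
                                     "Want me to check the cost for your postal code?"},
]

# The prompt is the whole instruction set plus examples, not just the question.
prompt_messages = (
    [{"role": "system", "content": style_guide}]
    + few_shot_examples
    + [{"role": "user", "content": question}]
)
print(len(prompt_messages), "messages in the assembled prompt")
```

Fine-tuning, as described above, goes one step further: instead of supplying a few examples at inference time, thousands of such exchanges are baked into the model's weights.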

Q: What opportunities do small models bring? How can entrepreneurs leverage them to do something interesting and unique?

A: Small models undoubtedly represent the future.

Large models activate billions of irrelevant neural representations when they process a query; they can efficiently search and reference millions of nodes, but that is not always necessary.

We will condense knowledge into smaller, cheaper models that can reside on various devices, such as earbuds, wearables, earrings, plants, or sensors.

This ambient-awareness revolution has long been anticipated, and it will bring functional devices such as a refrigerator magnet, the smallest digital device I can think of. It can greet you in the morning, tell you the weather, tell you what may or may not be in the fridge, and remind you to check your calendar.

Such models may have only tens of millions of parameters. Although no one has really pushed this yet, any two-person team can explore this area.
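
To make "tens of millions of parameters" concrete, here is a rough back-of-envelope count for a small transformer, using the common approximation of about 12·d² parameters per layer plus the token embeddings. The sizes are illustrative, not anything Suleyman specifies.

```python
# Back-of-envelope parameter count for a tiny on-device transformer (illustrative sizes).
d_model, n_layers, vocab = 512, 8, 32_000

embedding = vocab * d_model            # token embedding table
per_layer = 12 * d_model ** 2          # ~4*d^2 for attention + ~8*d^2 for the MLP
total = embedding + n_layers * per_layer

print(f"{total / 1e6:.1f}M parameters")  # ~41.5M: small enough for cheap, embedded hardware
```

A model in that range is orders of magnitude smaller than today's frontier models, which is what makes the refrigerator-magnet scenario plausible.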

Q: What questions should people be thinking about in the days ahead?

A: The question is: what do technologists need to do to design a more human-centered future? That includes thinking about how we evolve alongside technology, and how our emotions, passions, and compassion are expressed through our ever-changing relationship with it.

Q: Why is this considered a transformative moment?

A: We have enough evidence to show that the major technological transformations of the past fifty years have reshaped the structure of things.

I believe this is a moment to start companies, expand companies, or even change careers. Even if you are not an entrepreneur, whether you are an activist, organizer, or scholar, now is the moment to pay attention.

By 2050, the train will have left the station and the situation will look very different. We have the opportunity now to shape and influence the future together; nothing is predetermined. We are very fortunate to be alive at this moment, which is both a tremendous responsibility and an exciting opportunity.