OpenAI's o1 research team in conversation with Sequoia Capital US partners: the dimensions of the o1 series that have not yet been fully explored have a much higher ceiling than many people imagine

Wallstreetcn
2024.10.04 06:28

The OpenAI o1 research team had a discussion with partners from Sequoia Capital US, emphasizing how extending inference time enhances the o1 model's ability to solve complex problems. Team members pointed out that o1 not only opens up a new dimension of AI capability but also points the way for future models (such as o2 and o3). Despite o1's outstanding performance on reasoning tasks, there is still room for improvement in certain non-STEM fields.

Recently, three core technical members of the OpenAI o1 research team, Noam Brown (OpenAI research scientist focusing on AI reasoning and reinforcement learning), Hunter Lightman (OpenAI senior engineer researching AI reasoning on complex problems), and Ilge Akkaya (OpenAI researcher focusing in particular on AI applications in mathematics and logical reasoning), sat down with Sequoia Capital US partners Sonya Huang and Pat Grady.

Noam Brown emphasized that a key breakthrough of the o1 model lies in the capability gains that come from extending inference time. Given more time to think, the model spontaneously exhibits backtracking and self-correction when solving complex problems, making it particularly strong on tasks like Sudoku and complex logic. Through this "delayed reasoning," o1 can handle high-difficulty tasks more effectively.

Hunter Lightman believes that o1 not only expands AI's capabilities through inference time but also opens up new directions for the future development of AI. Compared with earlier models that relied on scaling data and compute, o1's extended inference time represents a new dimension, one that is expected to further expand AI capabilities in future model versions (such as o2 and o3).

Noam Brown and Hunter Lightman acknowledged that although o1 performs well in reasoning tasks, it is not superior to other AI models in all tasks, especially in non-STEM fields, where there is still significant room for improvement.

Here are the main contents of the conversation. Enjoy.

Sonya Huang:

o1 is OpenAI's first major attempt at inference-time compute. We look forward to discussing reasoning, chain of thought, inference-time scaling laws, and related topics with the team.

Ilge, Hunter, Noam, thank you for coming and congratulations on bringing o1 to the public. I would like to ask, were you confident from the beginning that this project would be successful?

Noam Brown:

I think we believed in the potential of this direction from the beginning, but the actual path to where we are today was not clear. Look at o1: this is not an overnight achievement. In fact, there have been many years of research behind it, much of which did not yield results. I believe OpenAI's leadership has always been convinced that this direction must yield results and was willing to keep investing even in the face of early setbacks, which has ultimately paid off.

Hunter Lightman:

I didn't initially have the strong confidence that Noam did. I spent a long time researching language models, trying to get them to do mathematical and other reasoning tasks. The research had its ups and downs; sometimes things worked, sometimes they didn't.

But when we started to see progress in this direction, I had an "aha" moment when I read some of the model-generated outputs, which approached problem-solving in different ways. That was the moment my confidence was established.

I think OpenAI as a whole takes a very empirical, data-driven approach, and when the data starts showing trends and becoming meaningful, we follow those clues.

Sonya Huang:

Ilge, you have been working at OpenAI for a long time, five and a half years. How do you see it? Did you believe in this approach from the beginning?

Ilge Akkaya:

No, I made several wrong judgments after joining. I thought robotics was the path to AGI, so I initially joined the robotics team, believing that AGI would emerge through embodied intelligence. However, things did not develop as expected.

During my time working here, the emergence of ChatGPT was undoubtedly a paradigm shift. We were able to show the world a universal interface, and I am glad that we now have a possible new path for advancing this reasoning paradigm. But for me, this path was not clear for a long time.

Pat Grady:

I know you can't disclose too many details for good reasons, but could you give a brief overview of how it works?

Ilge Akkaya:

The o1 model series uses reinforcement learning, capable of reasoning, or you can also call it "thinking." It is fundamentally different from the large language models we have used in the past.

We have seen its strong generalization ability in many different reasoning domains, and we have recently demonstrated this. So we are very excited about the paradigm shift that this new model family brings.

Pat Grady:

For those who are not very familiar with current language model technology, what is reasoning? Could you briefly define reasoning and explain why it is important?

Noam Brown:

A simple understanding is that reasoning is the ability to think longer about problems that benefit from extended thought. You know, humans have classic System 1 and System 2 thinking.

System 1 is automated, intuitive responses, while System 2 is slower, more deliberate, process-driven responses. For some tasks, longer thinking time does not bring more benefit.

For example, if I ask you, "What is the capital of Bhutan?" You could spend two years thinking about it, but that won't increase your accuracy. By the way, what is the capital of Bhutan? Actually, I don't know either. However, there are indeed some questions where taking more time to think can lead to higher accuracy.

A classic example is Sudoku, where you can theoretically try various solutions, and the correct solution is very easy to identify. Therefore, as long as you have enough time, you will eventually find the correct answer.

Many researchers in the field of AI have different definitions of reasoning, and I do not advocate that this is the only definition. Everyone has their own opinions, but I believe that reasoning is about those problems where considering more options and thinking for a longer time can be beneficial.

You can understand it as a gap between generation and verification: generating a correct solution is difficult, but identifying the correct solution is relatively simple.

I believe all problems lie somewhere on this spectrum, from problems like Sudoku, where verification is much easier than generation, to problems where verification and generation are equally difficult, such as knowing the capital of Bhutan.
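
To make the generation-verification gap concrete, here is a minimal sketch (ours, not from the conversation) in Python: checking a completed 9x9 Sudoku grid takes a few simple loops, while producing a valid completion requires trial-and-error search with backtracking, much like the "try options until the right one is recognized" behavior described above.

```python
# Minimal sketch of the generation-verification gap for Sudoku.
# Verification of a filled 9x9 grid is cheap; generation requires search
# (here, naive backtracking over candidate digits for each empty cell).

def is_valid(grid):
    """Check that every row, column, and 3x3 box contains 1-9 exactly once."""
    def ok(cells):
        return sorted(cells) == list(range(1, 10))
    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [[grid[r][c]
              for r in range(br, br + 3)
              for c in range(bc, bc + 3)]
             for br in range(0, 9, 3) for bc in range(0, 9, 3)]
    return all(ok(unit) for unit in rows + cols + boxes)

def solve(grid):
    """Fill zeros by trial and error, backtracking on dead ends."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in range(1, 10):
                    if all(d != grid[r][k] for k in range(9)) \
                       and all(d != grid[k][c] for k in range(9)) \
                       and all(d != grid[3*(r//3)+i][3*(c//3)+j]
                               for i in range(3) for j in range(3)):
                        grid[r][c] = d
                        if solve(grid):
                            return True
                        grid[r][c] = 0
                return False  # no digit fits here: backtrack
    return True  # no empty cells left: grid is complete
```

The checker runs in time proportional to the size of the grid, while the naive solver may explore exponentially many partial fillings; that asymmetry is exactly the spectrum being described.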

Sonya Huang:

I would like to ask about the AlphaGo background and your own, Noam. To what extent does your previous research on poker and other games relate to the work on o1? What are the similarities and differences between them?

Noam Brown:

I think a major highlight of o1 is that it actually performs better as the thinking time increases. Looking back at many AI breakthroughs, AlphaGo is a classic example.

One of its notable features is that it spends a long time thinking before each move, possibly taking 30 seconds to decide the next step. If you let it make decisions instantly, it actually doesn't surpass top human players. Therefore, its performance largely depends on this extra thinking time.

The issue is that this extra thinking time relies on Monte Carlo Tree Search (MCTS), which is a specific reasoning method suitable for Go but not applicable in the early poker games I researched. Thus, although the neural network part (System 1 part) is general, the reasoning methods at that time were still specific to certain domains.

Another major highlight of o1 is that its reasoning method is very general and applicable to many different fields. We have seen users using it in various ways, which also validates this point.

Hunter Lightman:

What has always attracted me to language models is that their interfaces are very general and can adapt to various problems. This time, what excites us is that we believe we have a way to apply reinforcement learning on this general interface and look forward to seeing the possibilities in the future.

Pat Grady:

You mentioned the gap between generation and verification, which varies on different problems. So, in the process of reasoning, is the approach to handling this gap consistent, or are there different methods in different situations?

Hunter Lightman:


One of the exciting aspects of this release is that o1 is now in the hands of so many people, allowing us to see where it performs well and where it falls short. This is one of OpenAI's core strategies, where we observe how the world interacts with it through iterative technical deployments and continuously improve our research.

Pat Grady:

On Twitter, have there been any ways users have used o1 that surprised you?

Ilge Akkaya:

One thing that really excites me is seeing many doctors and researchers treating this model as a brainstorming partner. They have been working in cancer research for many years and are discussing ideas about gene discovery and gene therapy with the model.

While the model cannot conduct research on its own, it can be a great collaborative partner for humans, helping to advance scientific research.

Sonya Huang:

Noam, I remember you tweeted that deep reinforcement learning (deep RL) has emerged from the "trough of disillusionment." Could you explain what you mean in more detail?

Noam Brown:

I think it all goes back to Atari games, where DeepMind's Deep Reinforcement Learning (DRL) results on Atari were once very popular. I was pursuing my Ph.D. around 2015 to 2019, and DRL was undoubtedly the hottest research area at that time.

In some ways, a lot of research progress was indeed made, but some issues were also overlooked. One of the overlooked aspects was the power of training with massive amounts of data, like the training method of GPT. In a way, this was very surprising.

Look at AlphaGo, it is undoubtedly one of the major achievements in the field of Deep Reinforcement Learning. Although it involves RL (Reinforcement Learning) steps, what's more important is that AlphaGo also learned based on human data before that, which is the real reason why AlphaGo took off.

Then, there gradually emerged a view in the research community that learning without relying on human data from scratch is the "pure" direction.

This also led to the emergence of AlphaZero, which performed better than AlphaGo, but this shift in the process overlooked the potential of large-scale data training like GPT, and apart from OpenAI, few paid attention to this direction.

OpenAI saw some initial results in this direction early on and was determined to double down. So, DRL did experience a peak period, and then with the success of large models like GPT-3, the heat of DRL decreased, and many lost confidence in it. However, with the emergence of o1, we see that DRL still has strong potential when combined with other elements.

Sonya Huang:

I believe many achievements in DRL have come in relatively well-defined settings. Setting games aside, is o1 the first case where DRL has been used in a broader, unbounded environment? Is that understanding correct?

Noam Brown:

Yes, I think that's a very good point. Many of the highlights of DRL are indeed very cool, but their applicability is also very narrow. While we have seen some quite useful and general DRL results, nothing can compare to the impact of GPT-4. Therefore, I believe that in the new paradigm, DRL will reach a similar level of influence in the future.

Sonya Huang:

I still remember the results of the AlphaGo matches, especially the famous move 37, which shocked everyone.

Have you seen similar moments in the research of o1, where the model gave an unexpected answer that turned out to be correct, even better than human thoughts? Have you had such moments, or do you think it might have to wait until o2 or o3?

Hunter Lightman:

I remember an example from when we were preparing for the IOI (International Olympiad in Informatics) and put the model through the problem-solving process. There was a problem that o1 insisted on solving in a strange way. I'm not clear on the specific details, but my colleagues, who are better at competitive programming, tried to figure out why the model did that.

I think it wasn't a "stroke of genius" moment; rather, the model didn't know the standard solution, so it kept trying until it found another one. It did solve the problem, just in a seemingly strange way. I remember this as an interesting example of the model thinking about a competitive programming problem differently from how humans would.

Ilge Akkaya:

I saw the model solve some geometry problems, and its way of thinking surprised me. For example, if you ask the model to calculate a point on a sphere and then inquire about the probability of an event happening, the model would say: "Let's first imagine this scenario, place these points, and then think from this perspective."

This way of visualizing through language really surprised me, just like what I would do as a human, and seeing o1 being able to do this too really caught me off guard.

Sonya Huang:

Very interesting. This is not only understandable by humans, but also expands our way of thinking about problems, not just some kind of incomprehensible machine language. This is really fascinating.

Hunter Lightman:

Yes, I do think the coolest thing about the results of o1 is that its chain of thought can be interpreted by humans, which allows us to understand the model's thinking process.

Pat Grady:

Have there been any "aha" moments during the research? Hunter, you mentioned that at first you weren't sure whether this direction would succeed. Was there a moment when you suddenly realized, "Oh my, this direction really works!"?

Hunter Lightman:

I have been working at OpenAI for about two and a half years, most of the time working hard to make models better at solving mathematical problems. We have done a lot of work on this, building various custom systems.

During the research process on o1, we trained a new model with some fixes and modifications applied, and it scored higher in mathematical evaluations than all our previous attempts, even surpassing the custom systems we designed.

We looked at the changes in the thought chains and found that they exhibited different characteristics. Especially when the model made a mistake, it would say, "Wait, this is not right, I need to step back and find the correct direction again." We call this behavior "backtracking."

I have been waiting for a long time to see examples of models being able to backtrack, and when I saw this score and thought chain, I realized that there is real potential, and I need to update my perspective. This is the moment my confidence was established.

Noam Brown:

I feel like it's a similar story for me as well. When I joined, my view was that models like ChatGPT don't really "think" before responding; their reactions are very quick.

In the AI field of games, spending more time thinking can lead to better results. So I have been thinking about how to incorporate this into language models.

It sounds simple, but actually implementing it is a challenge. We discussed a lot about how to make models have reflective abilities, how to backtrack when making mistakes, or try different approaches.

In the end, we decided to try a basic approach, which is to let AI think for a longer time. And we found that once AI has more thinking time, it almost spontaneously develops these abilities, including backtracking and self-correction.

These are all things we wanted the model to achieve, and now it has been achieved through such a simple and scalable way.

This was a key moment for me, when I realized that we can further push in this direction, and the direction is very clear.

Hunter Lightman:

I have come to understand how strong Noam's conviction about test-time compute is. I remember when he first joined, many of our one-on-one conversations revolved around the power of computation at test time.

At various stages of the project, Noam would say, "Why not let the model think for a longer time?" And then we did it, and the model performed better. The look on his face when he watched us was a bit funny, as if to say, "Why didn't we do this before?"

Sonya Huang:

We noticed in your evals that o1 performed exceptionally well in STEM fields, significantly better than your previous models. Is there a rough explanation for this? Why is this the case?

Noam Brown:

As I mentioned before, for some tasks, such as reasoning tasks, it is easier to verify an answer than to generate one. Problems in STEM fields often fall into this hard-to-generate, easy-to-verify category. So this is one of the important reasons we see o1 performing better in STEM disciplines.

Sonya Huang:

Understood. I would like to add a question. We saw in your published research paper that o1 passed your research engineer interview and with a fairly high pass rate. How do you see this? Does this mean that in the future, OpenAI will hire o1 to replace human engineers?

Hunter Lightman:

I don't think we have reached that level yet. I think there is still more work to be done.

Sonya Huang:

But it's very difficult to reach 100%, right?

Hunter Lightman:

Perhaps we need better interview methods. But at least in my opinion, o1 already resembles more of a programming partner than previous models. I believe it has already submitted code changes to our codebase several times.

In a sense, it does resemble a software engineer because software engineering also benefits from long-term reasoning in the STEM field.

I think the current model only thinks for a few minutes when reasoning, but if we continue this trend and let o1 think for longer periods, it may be able to complete more similar tasks.

Noam Brown:

You'll know we've achieved AGI on the day we take down all our job postings, and at that point the company is either doing very well or very poorly.

Sonya Huang:

What do you think needs to be done to make o1 excel in the humanities? Do you think the advantages of reasoning, logic, and STEM fields will naturally extend to the humanities as reasoning time expands? Or are there other factors at play?

Noam Brown:

As you said, we released the model and are curious about what it excels at, what it doesn't excel at, and how users will use it. I believe there is still a gap between the model's raw intelligence and its usefulness in practical tasks.

It is very useful in some aspects, but it can be more useful in many more aspects. I think we have a lot of room for iteration to unlock this broader generality.

Pat Grady:

I am curious, is there a philosophy within OpenAI about the gap between model capabilities and practical application needs? Do you have a clear thought process to determine which tasks should be done by the model and which tasks should be left to the ecosystem around the API to solve?

Noam Brown:

Before I joined, I had heard that OpenAI was very focused on AGI, and I was somewhat skeptical at the time. Basically, on my first day of work, the company held an all-hands meeting, and Sam stood in front of everyone and stated clearly that AGI is our top priority. So the clearest answer is that AGI is our ultimate goal, and no single application is the top priority, apart from the ability to make use of AGI itself.

Pat Grady:

Do you have a clear definition of AGI?

Noam Brown:

Everyone has their own definition, right? That's why this question is so interesting.

Hunter Lightman:

I'm not sure if I have a clear definition. I just think it may be related to the proportion of economically valuable work that AI systems can accomplish.

I believe this proportion will increase rapidly in the coming years. I'm not sure how exactly it will develop, but it might be one of those situations where "you'll know it when you feel it."

We may continuously adjust the standards until one day we work alongside these AI colleagues, and they are doing many of the tasks we do now, while we are doing different work. The entire work ecosystem will change.

Pat Grady:

One of your colleagues expressed well the importance of reasoning in the process towards AGI. His main point was: any task may encounter obstacles, and what helps you overcome these obstacles is your reasoning ability.

I think this is a good connection, explaining why reasoning is important and its relationship with the AGI goal. Do you think this is the best way to understand why reasoning is important, or are there other frameworks that can help us understand reasoning?

Hunter Lightman:

I think this is a question that needs further confirmation. Because in the process of developing these AI systems and models, we have seen their various performances and shortcomings.

We have learned a lot of new things in developing and evaluating these systems, and we are trying to understand their capabilities. For example, some things that come to mind are strategic planning, brainstorming, etc.

Pat Grady:

If we want AI to be as good as excellent product managers, it needs a lot of creativity and insights into user needs. Is this considered reasoning? Or is it a different kind of creativity from reasoning that needs to be handled differently?

When you start translating these plans into action, you also need strategic planning, considering how to drive the organization towards its goals. Is this considered reasoning?

Hunter Lightman:

Perhaps it's partly reasoning, but maybe it's something else as well. In the end, we may feel that these are all reasoning, or we may invent a new term to describe the new steps that need to be taken.

Ilge Akkaya:

I'm not sure how far we can push this reasoning issue. Whenever I think about this broad reasoning question, examples from the field of mathematics are always helpful.

We have spent a lot of time reading the models' thinking process when solving mathematical problems. You can see that when it encounters obstacles, it will backtrack and try another approach. This makes me feel that perhaps the same behavior can extend to areas beyond mathematics, which gives me some hope. I don't know what the final answer is, but I hope so.

Hunter Lightman:

What confuses me is that o1 is already better at math than me, but not as good in software engineering. So there is some kind of mismatch here.

Pat Grady:

Looks like there's still a lot of work to be done.

Hunter Lightman:

Yes, there are still some things to do. If my whole job were just solving AIME problems and competing in high school math competitions, I might have been unemployed long ago. But for now I still have work to do.

Pat Grady:

You mentioned "chains of thought" and observing the reasoning process behind them. I have a question; maybe you can't answer it, but let's just consider it an interesting discussion.

In the blog post announcing o1, you explained why you chose to hide the chain of thought and mentioned that part of the reason was competitive. I'm curious, was this a controversial decision? I can imagine that this decision makes sense, but I can also imagine you might have chosen to make it public. Can you talk about whether it was controversial?

Noam Brown:

I don't think it's controversial. Just like not sharing the weights of cutting-edge models, sharing the thought process of models also carries many risks. I think it's a similar decision.

Sonya Huang:

Can you explain to laymen what a "thought chain" is? Can you give an example?

Ilge Akkaya:

For example, if someone asks you to solve an integral problem, most people would need a piece of paper and a pen, and then deduce step by step from complex equations to the final answer.

This process may yield an answer, such as 1, but how did you arrive at this answer? That's the "thought chain" in the field of mathematics.
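
As an illustration (ours, not from the conversation), the written-out steps for a simple definite integral might look like the following; each line is the kind of intermediate step that makes up the chain of thought, with only the last line being the final answer.

```latex
% Hypothetical chain of thought for a simple integral: the intermediate
% lines are the reasoning steps; the last line is the final answer.
\begin{aligned}
\int_0^1 2x\, e^{x^2}\, dx
  &= \int_0^1 e^{u}\, du            && \text{substitute } u = x^2,\ du = 2x\, dx \\
  &= \left[ e^{u} \right]_0^1       && \text{take the antiderivative of } e^{u} \\
  &= e - 1                          && \text{evaluate at the bounds}
\end{aligned}
```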

Sonya Huang:

Let's talk about the future path and the inference-time scaling law. In the research you published, this is the most important chart in my opinion. It seems to be a result of profound significance, similar to the scaling laws for pre-training. Do you agree with this view? What impact will it have on the field?

Noam Brown:

I do think it is of profound significance. When we were preparing to release o1, I kept thinking about whether people would recognize its importance. Although we mentioned this, it's a rather subtle point.

I was really surprised and grateful to see so many people understand the significance of this point. There have been many concerns about AI possibly encountering bottlenecks or stagnation, especially as pre-training becomes more expensive, and there are questions about whether there is enough data.

o1, especially o1 Preview, conveys not so much its current capabilities as its significance for the future. We have been able to find a dimension of scaling that has not yet been fully exploited, which I think is a major breakthrough, meaning the ceiling is much higher than many people imagine.

Sonya Huang:

What would happen if the model were allowed to think for hours, months, or even years?

Hunter Lightman:

We haven't let o1 run for that long yet, so we don't know.

Pat Grady:

Is there a background task running now? Maybe it's contemplating how to solve world peace.

Hunter Lightman:

There's a similar story called "The Last Question," which tells of a massive AI computer being asked how to reverse entropy, to which it responds: "I need more time to think."

The story goes on to say that, 10 years later, it's still thinking, 100 years later, 1000 years later, even ten thousand years later, it's still thinking.

Ilge Akkaya:

"There is not enough information at present to provide a meaningful answer." Something along those lines.

Sonya Huang:

Do you have any speculations about the future? Do you think as the model's reasoning time increases, will its intellectual limit reach a certain level? The reports I've seen so far indicate that its IQ is around 120, will it keep increasing indefinitely?

Hunter Lightman:

An important point is that an IQ of 120 is just a score in a particular test, it doesn't mean it has a reasoning ability of 120 in all areas.

In fact, we have also discussed how it performs worse than GPT-4o in some respects, such as creative writing. So speculating about this model's abilities is quite complex.

Noam Brown:

This is an important point. When we talk about these benchmarks, we emphasize GPQA, which is a set of questions usually posed to and answered by doctoral students, and AI now surpasses many doctoral students on this benchmark.

This doesn't mean it's smarter than doctoral students in all aspects. Doctoral students and humans can do many things that AI cannot. So when we look at these test results, we should understand that it only measures certain specific abilities, usually proxies for human intelligence, but the meaning is different for AI.

Hunter Lightman:

Perhaps it can be said that what I hope to see is that when we let the model think longer in the areas it's already good at, it will become better.

One of my "Twitter moments" was seeing my former math professor tweet about being impressed by o1 because he gave it a proof that had never been solved by AI before, and it actually completed it.

This makes me feel like we are at an interesting turning point, where the model could become a useful mathematical research tool. If it can help complete some small lemmas and proofs, that would be a real breakthrough. I hope that by letting it think longer, we can make greater progress in this regard.

It's difficult for me to predict how it will perform in areas it currently isn't good at. How do we make it better in these areas? How will the future unfold?

Pat Grady:

Regarding the bottleneck issue of expansion. For pre-training, it is obvious that you need a large amount of computing power, a large amount of data, all of which require a large amount of funding. So it is easy to understand the bottleneck of expansion in pre-training. So, what limitations will there be in expanding reasoning time?

Noam Brown:

When GPT-2 and GPT-3 were released, it was obvious that as long as more data and GPUs were invested, their performance would significantly improve.

But even so, it took several years to go from GPT-2 to GPT-3 and then to GPT-4. It is not just a simple idea, there is a lot of work to be done to scale it up significantly.

I think we face similar challenges here, although the idea is simple, it takes a lot of work to truly expand it. So I think that is the challenge.

Hunter Lightman:

Yes, I think for researchers with a strong academic background, one surprising thing they may find after joining OpenAI is that many problems are not ultimately research problems, but engineering problems.

Building large-scale systems, training large-scale systems, and running algorithms that have already been invented, or unprecedented systems, are all very difficult. It takes a lot of difficult engineering work to scale these things up.

Ilge Akkaya:

In addition, we also need to know on what standards to test the model. We do have standard evaluation benchmarks, but there may be some areas we have not yet tested. So we are also looking for these areas, where we can invest more computing resources to get better test results.

Sonya Huang:

One thing I have always found difficult to understand is what happens when you give the model close to unlimited computing resources. As a human, even a genius like Terence Tao is subject to physiological limitations.

And you can increase the computing resources for reasoning time infinitely. Does this mean that all mathematical theorems can eventually be solved using this method? Or do you think there will be some kind of limit?

Hunter Lightman:

Infinite computing resources just means an enormous amount of computing power.

Sonya Huang:

Approaching infinity.

Hunter Lightman:

This reminds me of Asimov's story: if you let it think for ten thousand years, maybe it can solve some problems. But to be honest, we don't yet know what this kind of scaling means for solving truly difficult mathematical theorems. It may really take a thousand years of thinking to solve some unsolved core mathematical problems.

Noam Brown:

Yes, what I mean is, if you let it think for long enough, theoretically you could formalize everything in something like Lean, traverse all possible proofs, and eventually find a proof of the theorem.
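
As a small illustration of what "formalizing" looks like (our example, not one discussed in the conversation), here is a trivial statement written in Lean 4; once statements and proofs are formal objects like this, a search procedure can in principle enumerate candidate proofs and check each one mechanically.

```lean
-- A trivial formalized statement with a machine-checkable proof in Lean 4:
-- commutativity of addition on natural numbers, proved by reusing the
-- library lemma Nat.add_comm.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- The checker accepts the proof term; an exhaustive search over proof
-- terms would eventually stumble on one like it, given unbounded time.
#check my_add_comm
```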

Hunter Lightman:

We already have algorithms that can solve any mathematical problem, maybe that's what you're trying to say.

Noam Brown:

Yes, as long as you have infinite time, you can do a lot of things. Of course, with the passage of time, the return will gradually diminish, but you can indeed make some progress.

Sonya Huang:

Very fair. What do you think is the biggest misunderstanding about o1?

Noam Brown:

I think a big misunderstanding is that when the project name "Strawberry" leaked, people thought it was named after the question circulating on the internet about how many r's there are in the word "strawberry." But that's not the case.

When we saw this question, we were also worried about whether there was an internal information leak. But as far as we know, it was just a coincidence that our project happened to be called "Strawberry," and that question happened to become popular.

Hunter Lightman:

As far as I know, the reason it's called "Strawberry" is simply because someone needed to come up with a code name at the time, and there happened to be someone in the room eating a box of strawberries, so it stuck.

Pat Grady:

In contrast, this name is more evocative than "Houston."

Noam Brown:

I'm impressed by how well it has been understood. We were really unsure how people would react when we released it. There was a lot of internal debate: would people be disappointed because it doesn't excel in all aspects? Or would they be impressed by its remarkable mathematical performance?

What we really want to convey is not the current capabilities of this model, but its future development direction. I'm not sure if everyone can understand this, but it seems that many people do, so I am very satisfied with that.

Sonya Huang:

Regarding o1, do you think any criticisms are valid?

Hunter Lightman:

Without a doubt, it cannot excel in all aspects. It's a bit of a quirky model, and many people have found different ways to better use it on the internet.

There are still many strange edge cases, and I look forward to seeing how the ecosystem will develop more intelligent products and applications based on our platform.

I think we are still in a very early stage. It's a bit like a year ago, when people really started to figure out how to use GPT-4 and build language model programs around it, making software engineering tools smarter. I hope we will see similar progress, with people innovating on top of o1.

Pat Grady:

Speaking of which, there's one thing we haven't discussed yet, which is o1-mini. I've heard a lot of people are very excited about o1-mini because there is general interest in small models.

If you can retain the reasoning ability while stripping out some of the world knowledge, that is a very good thing. I'm curious, how excited are you about o1-mini and the direction it represents?

Ilge Akkaya:

This model is very exciting. For us researchers, if the model runs fast, its applications are more extensive. So we also like it very much. They have different uses.

We are happy to have a cheaper, faster version, as well as a heavier, slower version. They are very useful in different scenarios. So, we are very excited about achieving this balance.

Hunter Lightman:

I like this framing, which emphasizes the importance of progress. o1-mini allows us to iterate faster, and hopefully it helps the broader user ecosystem iterate faster as well. So at least in that regard it is a very useful and exciting product.

Sonya Huang:

For founders in the AI field, how should they consider when to use GPT-4 and when to use o1? Do they need to be engaged in STEM, programming, or math-related work to use o1? How should they think about this issue?

Hunter Lightman:

I hope they can help us find the answer.

Noam Brown:

One of the motivations for releasing the o1 Preview is to see what people will ultimately do with it, how they will use it. In fact, we have also discussed whether it is worth releasing the o1 Preview.

But one of the reasons for the final release is to let everyone get early access to it, see where it is most useful, where it is less suitable, and how to improve it to meet user needs.

Sonya Huang:

What do you think people are currently most likely to underestimate about o1?

Hunter Lightman:

I think this proves that our ability to name models has improved, at least we didn't call it "GPT-4.5 Thinking Mode".

Sonya Huang:

But I think the name "Strawberry" is quite cute.

Pat Grady:

I think "Thinking Mode" is also interesting. What are you most excited about o2 or o3?

Ilge Akkaya:

We haven't reached a point where we have no ideas, so I look forward to the next developments. We will continue to research, and what we most look forward to is feedback. As researchers, we obviously have some bias toward our own fields of expertise, but through the use of the product we will receive feedback from many different fields. Perhaps we will discover areas worth exploring further that are beyond our imagination.

Article source: 新 Newin, "OpenAI o1 Model Research Team Talks to Sequoia U.S. Partner: Dimensions of the o1 Series Yet to Be Fully Explored, The Ceiling Is Much Higher Than Many Imagine"