Breaking News: OpenAI o1

OpenAI’s o1 Model Launched: A New Milestone in AGI!

OpenAI’s “Strawberry,” the o1 series model, has launched suddenly and far ahead of expectations. It reaches new heights in complex reasoning, mathematics, and coding, shattering preconceptions about what LLMs can do. The work, pioneered by Ilya Sutskever, introduces a new scaling law.

OpenAI o1

On September 12, 2024, OpenAI’s most powerful model series to date, o1, went live. Without any warning, OpenAI dropped this bombshell.

The much-anticipated “Strawberry” model, rumored to launch in two weeks, arrived in just two days!

OpenAI o1-preview

Starting today, o1-preview is available to all Plus and Team users in ChatGPT and to Tier 5 developers via the API.
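
For developers with API access, calling o1-preview looks like an ordinary Chat Completions request that simply names the new model. The sketch below is illustrative only and assumes a recent version of the official openai Python SDK; the restrictions noted in the comments (no system message, no temperature, max_completion_tokens instead of max_tokens) reflect launch-time reports and may change.

    # Minimal sketch: calling o1-preview via the OpenAI Python SDK (a recent v1.x is assumed).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="o1-preview",          # or "o1-mini" for the cheaper STEM-focused variant
        messages=[
            # At launch, o1 models reportedly accepted only user/assistant turns
            # (no system role) and ignored sampling knobs such as temperature.
            {"role": "user", "content": "Prove that the square root of 2 is irrational."}
        ],
        max_completion_tokens=2000,  # o1 reportedly uses max_completion_tokens, not max_tokens
    )

    print(response.choices[0].message.content)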

Additionally, OpenAI released o1-mini, a cost-effective reasoning model that excels at STEM, particularly mathematics and coding.

The new o1 series reaches a new level of performance in complex reasoning, demonstrating genuinely general reasoning ability.

In a series of benchmark tests, o1 showed significant improvements over GPT-4o, reaching gold-medal-level ability in math competitions and exceeding human PhD-level accuracy on physics, biology, and chemistry problems!

OpenAI researcher Jason Wei called o1-mini the most surprising research outcome he’s seen in the past year. This small model managed to score above 60% in the AIME math competition.

However, according to the appendices in OpenAI’s article, the preview and mini versions released this time seem to be “trimmed-down” versions of the full o1.

A New Paradigm of Inference-Time Scaling Begins

NVIDIA Senior Scientist Jim Fan provided further insight into the principles behind the o1 model.

He noted that a new paradigm of inference-time scaling is being widely adopted and deployed in production. As Sutton pointed out in “The Bitter Lesson,” only two techniques scale indefinitely with computation: learning and search.

Now, it’s time to focus on the latter.
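
OpenAI has not disclosed how o1 searches at inference time, so the sketch below is only a generic illustration of the idea: spend more compute per query by sampling several candidate answers and keeping the consensus (self-consistency / best-of-N). The generate_answer stub and its fake answers are hypothetical placeholders, not a real model call.

    # Generic illustration of inference-time scaling via majority voting (self-consistency).
    # This is NOT OpenAI's o1 method; it only shows how extra inference compute
    # (more samples) can buy accuracy. generate_answer is a hypothetical stub.
    import random
    from collections import Counter

    def generate_answer(question: str) -> str:
        """Hypothetical stand-in for one stochastic sample from a reasoning model."""
        # A real implementation would call an LLM; here we fake a noisy solver.
        return random.choice(["42", "42", "42", "41", "43"])

    def answer_with_search(question: str, n_samples: int = 16) -> str:
        """Spend more inference compute: sample n candidates, return the majority answer."""
        candidates = [generate_answer(question) for _ in range(n_samples)]
        return Counter(candidates).most_common(1)[0][0]

    # More samples generally means higher accuracy: the "search" half of the
    # learning-and-search picture from The Bitter Lesson.
    print(answer_with_search("What is 6 * 7?", n_samples=16))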

More takeaways on inference:
  1. Performing reasoning does not require massive models.
  2. Significant computation is shifting from pre-training/post-training to inference services.
  3. OpenAI likely discovered the inference scaling law early on, while academia has only recently begun to catch on.
  4. Deploying o1 in real-world applications is far harder than scoring well on academic benchmarks.
  5. “Strawberry” could easily become a data flywheel.

According to the five-level AGI classification OpenAI published earlier, o1 has already reached Level 2, the “Reasoners” stage.

Testing showed that o1 successfully wrote a poem under very difficult constraints, a task that demands complex planning and sustained thinking, and its use of inference-time compute proved impressively efficient.

However, AI expert Andrej Karpathy quipped after testing o1-mini: “It still refuses to solve the Riemann hypothesis for me. The model’s laziness remains a major issue, which is truly disappointing.”

NYU Assistant Professor Saining Xie also tested the classic question “Which is greater, 9.11 or 9.8?” and found that o1-preview still got it wrong.

For the classic tricky question “How many ‘r’s are in ‘strawberry’?”, o1 naturally had no problem.

Tech commentator Matthew Sabia noted that the scariest part is that GPT-5 is said to be 69 times more powerful than the o1 model. Most people simply do not grasp the reasoning and logical capabilities of such advanced systems.

Are humans truly ready?

OpenAI o1-mini

To give developers a more efficient option, OpenAI has also released o1-mini, a faster and cheaper reasoning model. As the smaller model, o1-mini is 80% cheaper than o1-preview.
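
To make the 80% figure concrete, here is a toy cost calculation; the per-million-token prices are hypothetical placeholders chosen only for illustration, not OpenAI’s published price list.

    # Toy cost comparison illustrating "80% cheaper than o1-preview".
    # The prices below are HYPOTHETICAL placeholders, not OpenAI's actual pricing.
    PREVIEW_PRICE_PER_1M_TOKENS = 10.00                                   # assumed for illustration
    MINI_PRICE_PER_1M_TOKENS = PREVIEW_PRICE_PER_1M_TOKENS * (1 - 0.80)   # 80% cheaper

    tokens_used = 5_000_000  # example workload: 5M tokens

    preview_cost = tokens_used / 1_000_000 * PREVIEW_PRICE_PER_1M_TOKENS
    mini_cost = tokens_used / 1_000_000 * MINI_PRICE_PER_1M_TOKENS

    print(f"o1-preview: ${preview_cost:.2f} vs o1-mini: ${mini_cost:.2f}")  # $50.00 vs $10.00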

For applications that require reasoning but not broad world knowledge, it is a powerful and cost-effective option. However, the o1 series is still in its early stages: features such as web browsing, file uploads, and image input have not yet been integrated. In the short term, GPT-4o remains the stronger all-round choice.