From bigger to more thoughtful

NASA astrophysicist Dr Kyle Kabasares was shocked.

Kabasares had spent a year working on almost 1000 lines of Python code to research the size of black holes.

Now, after less than two minutes of quick prompting, the new o1 model from OpenAI was able to generate the same fully functioning code.

It feels like we’ve stepped into a new era in AI again.

A new reasoning model

Last week, OpenAI released its newest model - the o1. The model is significantly better at reasoning its way through complex problems.

OpenAI considered the release such a step-change in models that it re-started its still clunky naming convention.

In this article, I’ll quickly break down what o1 is, why it matters, and what the potential implications are for those of us working in design and innovation.

What’s the big deal with the new o1 model?

Firstly, the results.

Still in its preview mode, the o1 has already achieved jaw-dropping feats like coding a fully functioning Tetris game in under 2 minutes (with one mistake) or writing the Python code from a PhD paper for Dr Kabasares.

Matthew Berman asked o1 to code a fully working version of Tetris in Python. It took under 2 minutes and one correction.

Beyond anecdotes, the o1 is blowing away GPT-4o in key benchmarks measuring areas like math, code or PhD-level science questions. It also achieved a considerable bump in performance in areas like professional law and global facts (figures from OpenAI below).

The new o1 model clearly outperforms GPT-4o in several key benchmarks - most notably in STEM areas like math, physics and coding. However, it hasn’t improved over GPT-4o in English language tasks.

To summarize, o1 is performing on a completely new level in domains that require logical reasoning and have a clear right or wrong answer - think coding, math, or physics.

The second noteworthy thing about o1 is how it’s achieving its results.

From scaling size to scaling thinking

Until now, most Large Language Model improvements have stemmed from sheer size - larger training sets and larger parameter counts (the number of weights a model uses to make predictions).

With o1, the team at OpenAI worked on improving how the model thinks.

The model uses Chain of Thought steps to break down larger tasks into smaller logical steps. It’s able to “reason” through a more complex sequence of steps and self-correct along the way.

Asking an LLM to think “step by step” is a well-researched technique for improving output quality with previous models. Now, OpenAI has baked this approach into the model itself. The model shows the user an abstracted and censored version of its thinking process.
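To make the “step by step” technique concrete, here is a minimal Python sketch of how it is typically applied by hand with earlier models. The function names, the prompt wording, and the sample reply are my own illustrative assumptions, not anything published by OpenAI, and no model is actually called.

```python
from typing import Optional


def build_step_by_step_prompt(question: str) -> str:
    """Wrap a question in a prompt that asks for explicit intermediate steps."""
    return (
        f"Question: {question}\n\n"
        "Think step by step. Number each step, then give the result on a "
        "final line that starts with 'Answer:'."
    )


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if there is none."""
    for line in reversed(response.splitlines()):
        stripped = line.strip()
        if stripped.startswith("Answer:"):
            return stripped[len("Answer:"):].strip()
    return None


prompt = build_step_by_step_prompt(
    "A train travels 120 km in 2 hours. What is its average speed?"
)

# Hypothetical model reply, written by hand for illustration only.
example_reply = """1. The train covers 120 km in 2 hours.
2. Speed = distance / time = 120 / 2 = 60 km/h.
Answer: 60 km/h"""

final_answer = extract_final_answer(example_reply)
print(prompt)
print(final_answer)
```

With o1, this kind of scaffolding should matter less, since the model works through the intermediate steps on its own before answering.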

The bottom line?

Generative AI models now have a second dimension to improve in - not just by getting bigger but by being smarter at how they think about problems. (Figure on the right below)

The performance of o1 improves both with how long it’s trained (on the left) and how long it’s able to think (on the right). Source: OpenAI

The o1 model is still based on the training of the previous GPT-4o model. Adding advanced reasoning to the upcoming GPT-5 (code-named Orion) could take LLM capabilities to a whole new level.

What does this mean for us?

What does this all mean for us using generative AI in design and innovation work?

I think there are three main takeaways:

1. Work across multiple AI models 

OpenAI has two different frontier models out now - GPT-4o and o1 Preview. Paid users can access both of them.

For most daily LLM-supported tasks, GPT-4o will be a solid option. It’s fast and efficient. Interestingly, it’s also better at creative writing tasks than o1.

For tasks requiring complex logical thinking and accuracy, o1 will be your best bet. Tasks involving data analysis, coding, math, or multi-step reasoning are better handed over to o1.
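If you route tasks between the two models in a script, the rule of thumb above can be sketched in a few lines of Python. The task categories and the model name strings below are illustrative assumptions based on the names used in this article, so check your provider’s current identifiers before relying on them.

```python
# Task types that, per the rule of thumb above, benefit from a reasoning model.
# The category names are assumptions made for this sketch.
REASONING_TASKS = {"data_analysis", "coding", "math", "multi_step_reasoning"}

# Placeholder model names taken from the article's wording.
REASONING_MODEL = "o1-preview"
GENERAL_MODEL = "gpt-4o"


def pick_model(task_type: str) -> str:
    """Pick the reasoning model for logic-heavy tasks, the general model otherwise."""
    normalized = task_type.strip().lower().replace(" ", "_").replace("-", "_")
    return REASONING_MODEL if normalized in REASONING_TASKS else GENERAL_MODEL


for task in ["coding", "creative writing", "Multi-step reasoning"]:
    print(f"{task}: {pick_model(task)}")
```

A lookup like this is deliberately simple. In practice you would adjust the categories as you learn where each model actually performs better for your own work.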

2. Explore rapid prototyping

Coding with LLMs just got much faster.

For anyone designing digital services, this means we’ll be able to prototype new things on a whole new timescale.

People already familiar with development, in particular, can turn ideas into testable code in a fraction of the time it took before.

3. Re-examine assumptions about working with AI 

In my discussions and training sessions on working with AI, we’ve often discussed how current-gen AI models are fairly hit-and-miss at two things:

1. Getting answers to exact factual questions right
2. Working on complex assignments with multiple steps

Now, we’ll have to revisit these assumptions. The o1 model can reason through complex questions in multiple steps to arrive at correct answers.

Rumors also indicate that the reasoning provided by o1 is being used to train the upcoming GPT-5 model from OpenAI.

The release of o1 is likely to trigger an arms race among other AI leaders to launch their own models with improved reasoning.

Again, it’s up to us to update our mental models of what AI can do and how we can add unique value to the mix as humans. 

Matias Vaara

I help teams tap into the power of generative AI for design and innovation.

My weekly newsletter, Amplified, shares practical insights on generative AI for design and innovation.
