This is incredible.

o3 and o4-mini are massive leaps towards a versatile general-purpose AI.
This is insane: the model worked out exactly what the person wrote in the image, and it even figured out the writing was upside down and rotated it first.

They are taking things to a whole new level with complex multimodal reasoning.
This one is even more insane: it solved a complicated maze and accurately drew the path it took from start to finish.

And it wrote accurate code to draw that path.
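
To give a feel for what "solving a maze in code" involves, here is a minimal sketch. It is not the model's actual output. It assumes the maze has already been turned into a text grid where `#` is a wall, and it uses a plain breadth-first search; the names `solve_maze`, `draw_path`, and `MAZE` are made up for illustration.

```python
from collections import deque

# Hypothetical maze: '#' is a wall, anything else is walkable.
MAZE = [
    "S.#.....",
    ".##.###.",
    "....#...",
    "###.#.##",
    "....#..G",
]


def solve_maze(grid, start, goal):
    """Breadth-first search; returns the shortest path as (row, col) cells, or None."""
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}  # cell -> the cell we reached it from
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the back-pointers from goal to start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # no route from start to goal


def draw_path(grid, path):
    """Return the grid as text rows with open path cells marked '*'."""
    cells = set(path)
    return [
        "".join("*" if (r, c) in cells and ch == "." else ch
                for c, ch in enumerate(row))
        for r, row in enumerate(grid)
    ]


if __name__ == "__main__":
    route = solve_maze(MAZE, (0, 0), (4, 7))
    print("\n".join(draw_path(MAZE, route)))
```

The hard part in the demo is the step this sketch skips: reading the maze out of a picture in the first place. Once it is a grid, the search itself is textbook stuff.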

Multimodal reasoning is a major step towards an AI that can understand and interact with the virtual and physical world, not just with text.
Imagine how much more powerful it will be when these models start thinking with audio and video too.
It’s a major step towards a general-purpose AI that can work with any kind of data in any situation.
o3: Powerful multimodal reasoning model for deeper analysis, problem-solving, and decision-making.
o4-mini: A smaller, more efficient model that is still pretty impressive.
The possibilities are endless:
- Solve complex visual puzzles, like we saw with the maze
- Navigate charts, graphs, and infographics
- Perform spatial and logical reasoning grounded in visuals
- Blend information from images and text to make better decisions
Multimodal reasoning AI isn’t just gonna write code or help you decide where to go on your next holiday.
It’ll be able to work directly with:
- Blueprints and maps
- Body language
- Scientific diagrams
This will be huge for AIs that interact with the physical world.
Imagine a personal AI assistant that could infer what you want without you having to tell it anything.
Right now we mostly talk to assistants in plain text…
But multimodal AIs could use input from so many things beyond the words you’re actually saying:
- The tone of your voice (audio)
- Your facial expression (visual)
- Your body language (visual)
And of course, they would still use context from your previous messages and conversations.
It could understand you at such a deep level and give ultra-personalized suggestions for whatever you ask.