OpenAI’s New AI Models Can Now “Think” With Images

Image by Emiliano Vittoriosi, from Unsplash

Reading time: 3 min

OpenAI has launched o3 and o4-mini, advanced AI models that combine image manipulation with text-based reasoning to solve complex problems.

In a rush? Here are the quick facts:

  • These models manipulate, crop, and transform images to solve complex tasks.
  • o3 and o4-mini outperform earlier models in STEM questions, visual search, and chart reading.
  • The models combine text and image processing, using tools like web search and code analysis.

OpenAI has announced two new AI models, o3 and o4-mini, that can reason with images—marking a major leap in how artificial intelligence understands and processes visual information.

“These systems can manipulate, crop and transform images in service of the task you want to do,” said Mark Chen, OpenAI’s head of research, during a livestream event on Wednesday, as reported by The New York Times.

Unlike previous models, which could only view images as static inputs, o3 and o4-mini can analyze images as part of their internal reasoning process.

Users can upload photos of math problems, technical diagrams, handwritten notes, and posters, including blurry or rotated images. The system breaks the content down into step-by-step explanations, even when a single image contains multiple questions or visual elements.

The system can zoom in on unclear parts of an image or rotate it for a better view, combining visual understanding with text-based reasoning to deliver precise answers. It can interpret a science graph to explain its meaning, for example, or spot a coding error in a screenshot and generate a fix.
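
As a rough illustration, here is a minimal sketch of how a developer might send a photo of a math problem to o4-mini through OpenAI's Python SDK. The article does not describe this exact workflow; the snippet follows the SDK's standard vision-input pattern, and the image URL and prompt are placeholders.

```python
# A minimal sketch, assuming the OpenAI Python SDK's standard
# vision-input format; the image URL below is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {
            "role": "user",
            "content": [
                # Plain-text instruction sent alongside the image
                {"type": "text",
                 "text": "Solve the problem in this photo step by step."},
                # The image the model will reason over
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/math-problem.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)  # step-by-step explanation
```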

The models can also invoke other tools, such as web search, Python code execution, and image generation, in real time, which lets them tackle far more complex tasks than before. OpenAI says these capabilities are built in, with no need for extra specialized models.
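
To give a sense of what a tool-augmented request looks like, the sketch below asks o3 to use the built-in web-search tool through OpenAI's Responses API. The tool type name follows the API's documented launch behavior and is an assumption rather than a detail from this article; the prompt is a placeholder.

```python
# A hedged sketch, not confirmed by this article: combining o3's own
# reasoning with the built-in web-search tool via the Responses API.
# "web_search_preview" matches the API's launch documentation; treat it
# as an assumption if your SDK version differs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="o3",
    tools=[{"type": "web_search_preview"}],  # allow live web lookups
    input="Summarize recent reporting on the V* visual search benchmark.",
)

print(response.output_text)  # concatenated text of the model's answer
```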

Tests show that o3 and o4-mini outperform previous models on every visual task they were given; on the visual search benchmark known as V*, o3 reaches 95.7% accuracy. The models still have flaws, however: OpenAI notes they can make overthinking mistakes and basic perception errors.

OpenAI introduced this update as part of its push to build AI systems that reason more like humans. Because the models work through long chains of thought, they need extra time to handle complex questions. They also draw on tools such as image generation, web search, and Python code analysis to give more precise and creative answers.

However, there are limits. The models sometimes process excessive amounts of information, make perception errors, and shift their reasoning approaches between attempts. The company is working to improve the models’ reliability and consistency.

Both o3 and o4-mini are now available to ChatGPT Plus ($20/month) and Pro ($200/month) users. OpenAI also released Codex CLI, a new open-source tool to help developers run these AI models alongside their own code.

While OpenAI faces legal challenges over content use, its visual reasoning tech shows how AI is getting closer to solving real-world problems in more human-like ways.
