OpenAI’s New AI Models Can Now “Think” With Images

Image by Emiliano Vittoriosi, from Unsplash

Reading time: 3 min

OpenAI has launched o3 and o4-mini, advanced AI models that combine image manipulation with text-based reasoning to solve complex problems.

In a rush? Here are the quick facts:

  • These models manipulate, crop, and transform images to solve complex tasks.
  • o3 and o4-mini outperform earlier models in STEM questions, visual search, and chart reading.
  • The models combine text and image processing, using tools like web search and code analysis.

OpenAI has announced two new AI models, o3 and o4-mini, that can reason with images—marking a major leap in how artificial intelligence understands and processes visual information.

“These systems can manipulate, crop and transform images in service of the task you want to do,” said Mark Chen, OpenAI’s head of research, during a livestream event on Wednesday, as reported by The New York Times.

Unlike previous models, which could only view images as static inputs, o3 and o4-mini can analyze images as part of their internal reasoning process.

Users can upload photos of math problems, technical diagrams, handwritten notes, and posters, including blurry or rotated images. The system breaks the content down into step-by-step explanations, even when a single image contains multiple questions or visual elements.

The system can zoom in on unclear parts of an image or rotate it for a better view, combining visual understanding with text-based reasoning to deliver precise answers. It can interpret a science graph to explain its meaning, for example, or spot a coding error in a screenshot and generate a fix.
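
As a rough illustration, here is a minimal sketch of how a developer might send a photo of a math problem to o4-mini through OpenAI's Python SDK. The article does not describe this exact workflow; the snippet follows the SDK's standard vision-input pattern, and the image URL and prompt are placeholders.

```python
# A minimal sketch, assuming the OpenAI Python SDK's standard
# vision-input format; the image URL below is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {
            "role": "user",
            "content": [
                # Plain-text instruction sent alongside the image
                {"type": "text",
                 "text": "Solve the problem in this photo step by step."},
                # The image the model will reason over
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/math-problem.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)  # step-by-step explanation
```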

The models can also invoke other tools, such as web search, Python code execution, and image generation, in real time, which lets them tackle far more complex tasks than before. OpenAI says these capabilities are built in, with no need for extra specialized models.
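
To give a sense of what a tool-augmented request looks like, the sketch below asks o3 to use the built-in web-search tool through OpenAI's Responses API. The tool type name follows the API's documented launch behavior and is an assumption rather than a detail from this article; the prompt is a placeholder.

```python
# A hedged sketch, not confirmed by this article: combining o3's own
# reasoning with the built-in web-search tool via the Responses API.
# "web_search_preview" matches the API's launch documentation; treat it
# as an assumption if your SDK version differs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="o3",
    tools=[{"type": "web_search_preview"}],  # allow live web lookups
    input="Summarize recent reporting on the V* visual search benchmark.",
)

print(response.output_text)  # concatenated text of the model's answer
```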

Tests show that o3 and o4-mini outperform previous models on every visual task they were given; on the visual search benchmark known as V*, o3 reaches 95.7% accuracy. The models still have flaws, however: OpenAI notes they can make overthinking mistakes and basic perception errors.

OpenAI introduced this update as part of its push to build AI systems that reason more like humans. Because the models work through long chains of thought, they need extra time to handle complex questions. They also draw on tools such as image generation, web search, and Python code analysis to give more precise and creative answers.

However, there are limits. The models sometimes process excessive amounts of information, make perception errors, and shift their reasoning approaches between attempts. The company is working to improve the models’ reliability and consistency.

Both o3 and o4-mini are now available to ChatGPT Plus ($20/month) and Pro ($200/month) users. OpenAI also released Codex CLI, a new open-source tool to help developers run these AI models alongside their own code.

While OpenAI faces legal challenges over content use, its visual reasoning tech shows how AI is getting closer to solving real-world problems in more human-like ways.
