
Image by Emiliano Vittoriosi, from Unsplash
OpenAI’s New AI Models Can Now “Think” With Images
OpenAI has launched o3 and o4-mini, advanced AI models that combine image manipulation with text-based reasoning to solve complex problems.
In a rush? Here are the quick facts:
- These models manipulate, crop, and transform images to solve complex tasks.
- o3 and o4-mini outperform earlier models in STEM questions, visual search, and chart reading.
- The models combine text and image processing, using tools like web search and code analysis.
OpenAI has announced two new AI models, o3 and o4-mini, that can reason with images—marking a major leap in how artificial intelligence understands and processes visual information.
“These systems can manipulate, crop and transform images in service of the task you want to do,” said Mark Chen, OpenAI’s head of research, during a livestream event on Wednesday, as reported by the New York Times.
Whereas previous models could only view images, o3 and o4-mini can analyze them as part of their internal reasoning process.
Users can upload photos of math problems, technical diagrams, handwritten notes, posters, and even blurry or rotated images. The system breaks the content down into step-by-step explanations, even when a single image contains multiple questions or visual elements.
The system can also zoom in on unclear parts of an image or rotate it for a better view, combining visual understanding with text-based reasoning to deliver precise answers. It can interpret scientific graphs and explain what they show, or spot coding errors in screenshots and generate fixes.
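For developers, the same image-understanding capability is exposed through OpenAI’s API. Below is a minimal sketch using the official openai Python SDK; the model name, file name, and prompt are illustrative assumptions, not taken from the announcement:

```python
# Minimal sketch: send a photo of a math problem to o4-mini for step-by-step reasoning.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
import base64

from openai import OpenAI

client = OpenAI()

# Encode a local photo (hypothetical file name) as a base64 data URL.
with open("math_problem.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Solve this problem step by step."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```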
The models can also call tools such as web search, Python code execution, and image generation in real time, which lets them tackle far more complex tasks than before. OpenAI says these capabilities are built in, with no extra specialized models required.
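As a sketch of what tool use looks like in practice, the snippet below calls OpenAI’s Responses API with the built-in web search tool; the tool type name and prompt come from OpenAI’s public API documentation, not from this announcement, so treat them as assumptions:

```python
from openai import OpenAI

client = OpenAI()

# Ask the model to combine its own reasoning with a live web search.
# "web_search_preview" is the built-in tool name in OpenAI's API docs (assumption here).
response = client.responses.create(
    model="o3",
    tools=[{"type": "web_search_preview"}],
    input="Summarize this week's news about visual reasoning in AI models.",
)
print(response.output_text)
```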
In testing, o3 and o4-mini outperformed previous models on every visual task they were given; on the V* visual search benchmark, o3 reached 95.7% accuracy.
OpenAI introduced the update as part of its push to build AI systems that reason more like humans. The models work through long chains of thought before answering, which means they take extra time on complex questions.
There are limits, however. The models can overthink, processing more information than a task requires, make basic perception errors, and shift their reasoning approach between attempts. OpenAI says it is working to improve their reliability and consistency.
Both o3 and o4-mini are now available to ChatGPT Plus ($20/month) and Pro ($200/month) users. OpenAI also released Codex CLI, a new open-source tool that lets developers run these models from the terminal, working directly with their own code.
While OpenAI faces legal challenges over content use, its visual reasoning tech shows how AI is getting closer to solving real-world problems in more human-like ways.