OpenAI has launched o3-pro, an AI model that the company claims is its most capable yet.
O3-pro is a version of OpenAI’s o3, a reasoning model that the startup launched earlier this year. Unlike conventional AI models, reasoning models work through problems step by step, which helps them perform more reliably in domains like physics, math, and coding.
O3-pro is available to ChatGPT Pro and Team users starting Tuesday, replacing the o1-pro model. Enterprise and Edu users will get access the week after, OpenAI says. O3-pro is also live in OpenAI’s developer API as of this afternoon.
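For developers, a request to the new model looks much like any other call through OpenAI’s Python SDK. The sketch below assumes the model is exposed under the identifier o3-pro and served through the Responses API; check OpenAI’s API reference for the exact details.

```python
# Minimal sketch of calling o3-pro via OpenAI's Python SDK.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable;
# the "o3-pro" model identifier is taken from OpenAI's announcement.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="o3-pro",
    input="Explain, step by step, why the sum of two odd numbers is even.",
)

print(response.output_text)
```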
O3-pro is priced at $20 per million input tokens and $80 per million output tokens in the API. Input tokens are tokens fed into the model, while output tokens are tokens the model generates in response.
A million input tokens is equivalent to about 750,000 words, a bit longer than “War and Peace.”
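To put those rates in perspective, here’s a quick back-of-the-envelope calculation (the token counts are purely illustrative):

```python
# Back-of-the-envelope cost of an o3-pro API request at the listed rates.
INPUT_RATE = 20.0 / 1_000_000   # $20 per million input tokens
OUTPUT_RATE = 80.0 / 1_000_000  # $80 per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request given its token counts."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g., a 10,000-token prompt that yields a 2,000-token reply:
print(f"${request_cost(10_000, 2_000):.2f}")  # $0.36
```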
“In expert evaluations, reviewers consistently prefer o3-pro over o3 in every tested category and especially in key domains like science, education, programming, business, and writing help,” OpenAI writes in a changelog. “Reviewers also rated o3-pro consistently higher for clarity, comprehensiveness, instruction-following, and accuracy.”
O3-pro has access to tools, according to OpenAI, allowing it to search the web, analyze data, reason about visual inputs, use Python, personalize its responses by leveraging memory, and more. The trade-off is that the model’s responses typically take longer than o1-pro’s to complete, according to OpenAI.
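In the API, those tools are opted into per request. The sketch below is an assumption based on OpenAI’s built-in tools for the Responses API; the “web_search_preview” tool type in particular may vary by API version.

```python
# Sketch: letting o3-pro search the web during a request.
# The tool type name is an assumption; consult OpenAI's tool docs.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="o3-pro",
    tools=[{"type": "web_search_preview"}],  # built-in web search tool
    input="What did reviewers say about o3-pro at launch? Cite sources.",
)

print(response.output_text)
```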
O3-pro has other limitations. Temporary chats with the model in ChatGPT are disabled for now while OpenAI resolves a “technical issue.” O3-pro can’t generate images. And Canvas, OpenAI’s AI-powered workspace feature, isn’t supported by o3-pro.
On the plus side, o3-pro achieves impressive scores on popular AI benchmarks, according to OpenAI’s internal testing. On AIME 2024, which evaluates a model’s math skills, o3-pro scores better than Google’s top-performing AI model, Gemini 2.5 Pro. O3-pro also beats Anthropic’s recently released Claude 4 Opus on GPQA Diamond, a test of PhD-level science knowledge.