On Tuesday, OpenAI launched new tools designed to help developers and enterprises build AI agents – automated systems that can independently accomplish tasks – using the company’s own AI models and frameworks.
The tools are part of OpenAI’s new Responses API, which lets businesses develop custom AI agents that can perform web searches, scan through company files, and navigate websites, much like OpenAI’s Operator product. The Responses API effectively replaces OpenAI’s Assistants API, which the company plans to sunset in the first half of 2026.
The hype around AI agents has grown dramatically in recent years, even though the tech industry has struggled to show people, or even define, what “AI agents” really are. In the most recent example of agent hype running ahead of utility, Chinese startup Butterfly Effect went viral earlier this week for a new AI agent platform called Manus that users quickly discovered didn’t deliver on many of the company’s promises.
In other words, the stakes are high for OpenAI to get agents right.
“It’s pretty easy to demo your agent,” Olivier Godement, OpenAI’s API product head, told TechCrunch in an interview. “To scale an agent is pretty hard, and to get people to use it often is very hard.”
Earlier this year, OpenAI launched two AI agents in ChatGPT: Operator, which navigates websites on your behalf, and deep research, which compiles research reports for you. Both tools offered a glimpse of what agentic technology can achieve, but left quite a bit to be desired in the “autonomy” department.
Now, with the Responses API, OpenAI wants to sell access to the components that power AI agents, allowing developers to build their own Operator- and deep research-style agentic applications. OpenAI hopes that developers can use its agent technology to create applications that feel more autonomous than what’s available today.
Using the Responses API, developers can tap the same AI models (in preview) under the hood of OpenAI’s ChatGPT Search web search tool: GPT-4o search and GPT-4o mini search. The models can browse the web for answers to questions, citing sources as they generate replies.
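For readers who want a sense of what this looks like in practice, here is a minimal sketch of a Responses API request with web search switched on. It assumes the official `openai` Python package and an `OPENAI_API_KEY` environment variable; the tool type `web_search_preview` and the model name `gpt-4o` reflect the preview naming at launch and may have changed since, and the helper names `build_search_request` and `ask_with_search` are made up for illustration.

```python
def build_search_request(question: str, model: str = "gpt-4o") -> dict:
    """Build keyword arguments for a Responses API call with web search enabled.

    Assumption: "web_search_preview" is the preview-era tool type; check the
    current API reference before relying on it.
    """
    return {
        "model": model,
        "tools": [{"type": "web_search_preview"}],
        "input": question,
    }


def ask_with_search(question: str) -> str:
    """Send the request and return the model's text reply (requires network access)."""
    # Imported lazily so the request builder above works without the package installed.
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    response = client.responses.create(**build_search_request(question))
    return response.output_text
```

Calling `ask_with_search("What did OpenAI announce this week?")` would return the reply text; source citations, when present, come back as annotations on the response's output items rather than in that plain string.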
OpenAI claims that GPT-4o search and GPT-4o mini search are highly factually accurate. On the company’s SimpleQA benchmark, which measures the ability of models to answer short, fact-seeking questions, GPT-4o search scores 90% while GPT-4o mini search scores 88% (higher is better). For comparison, GPT-4.5 – OpenAI’s much larger, recently released model – scores just 63%.
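Read the other way around, those scores are error rates. The short sketch below simply subtracts each reported SimpleQA score from 100; the figures are the ones quoted above, and the variable and function names are illustrative.

```python
# SimpleQA accuracy figures as reported by OpenAI (percent, higher is better).
simpleqa_scores = {
    "GPT-4o search": 90,
    "GPT-4o mini search": 88,
    "GPT-4.5": 63,
}


def error_rate(accuracy_pct: int) -> int:
    """Share of benchmark questions answered incorrectly, in percent."""
    return 100 - accuracy_pct


error_rates = {name: error_rate(score) for name, score in simpleqa_scores.items()}

for name, rate in error_rates.items():
    print(f"{name}: {rate}% of answers wrong")
```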
The fact that AI-powered search tools are more accurate than traditional AI models isn’t necessarily surprising – in theory, GPT-4o search can simply look up the right answer. Web search does not, however, make hallucinations a solved problem.
The Responses API also includes a file search utility that can quickly scan across files in a company’s databases to retrieve information. (OpenAI claims that it won’t train models on these files.) In addition, developers using the Responses API can tap OpenAI’s Computer-Using Agent (CUA) model, which powers Operator. The model generates mouse and keyboard actions, allowing developers to automate computer use tasks like data entry and app workflows.
Enterprises can optionally run the CUA model, which is being released in research preview, locally on their own systems, OpenAI said. The consumer version of the CUA available in Operator can only take actions on the web.
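As a rough illustration, file search and computer use are both requested as entries in the same `tools` list of a Responses API call. The sketch below only builds the request payloads and sends nothing. The tool types (`file_search`, `computer_use_preview`), the model name `computer-use-preview`, and the `truncation` setting follow OpenAI's preview documentation as best understood here and should be treated as assumptions; the helper names and the vector store ID `vs_example123` are hypothetical placeholders.

```python
def file_search_tool(vector_store_ids: list[str], max_results: int = 5) -> dict:
    """Tool entry that lets the model search files previously uploaded to vector stores."""
    return {
        "type": "file_search",
        "vector_store_ids": list(vector_store_ids),
        "max_num_results": max_results,
    }


def computer_use_tool(width: int = 1024, height: int = 768,
                      environment: str = "browser") -> dict:
    """Tool entry describing the screen the CUA model will be acting on."""
    return {
        "type": "computer_use_preview",
        "display_width": width,
        "display_height": height,
        "environment": environment,
    }


def build_computer_use_request(task: str) -> dict:
    """Request payload asking the CUA model to propose mouse and keyboard actions."""
    return {
        "model": "computer-use-preview",  # assumption: preview model name
        "tools": [computer_use_tool()],
        "input": task,
        "truncation": "auto",  # assumption: required for this model in preview
    }


# Hypothetical vector store ID, for illustration only.
file_search_request = {
    "model": "gpt-4o",
    "tools": [file_search_tool(["vs_example123"])],
    "input": "Summarize our travel expense policy.",
}
```

In the computer use case, the model does not click anything itself: it returns suggested actions (a click at given coordinates, some typed text), and the developer's own code is expected to carry them out and send back a screenshot for the next step.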
To be clear, the Responses API won’t solve all the technical problems plaguing AI agents today.
While AI-powered search tools are more accurate than traditional AI models – a fact that’s unsurprising given they can simply look up the right answer – web search doesn’t render AI hallucinations a solved problem. GPT-4o search still gets 10% of factual questions wrong. Beyond their accuracy, AI search tools also tend to struggle with short, navigational queries (such as “Lakers score today”), and recent reports suggest that ChatGPT’s citations aren’t always reliable.
In a blog post provided to TechCrunch, OpenAI said that the CUA model is “not yet highly reliable for automating tasks on operating systems,” and that it’s prone to making “inadvertent” mistakes.
However, OpenAI said these are early iterations of its agent tools, and that it’s constantly working to improve them.
Alongside the Responses API, OpenAI is releasing an open-source toolkit called the Agents SDK, which offers developers free tools to integrate models with their internal systems, put safeguards in place, and monitor AI agent activities for debugging and optimization purposes. The Agents SDK is a follow-up of sorts to OpenAI’s Swarm, a framework for multi-agent orchestration that the company released late last year.
Godement said he hopes OpenAI can bridge the gap between AI agent demos and products this year, and that, in his opinion, “agents are the most impactful application of AI that will happen.” That echoes a proclamation OpenAI CEO Sam Altman made in January: that 2025 is the year AI agents enter the workforce.
Whether or not 2025 truly becomes the “year of the AI agent,” OpenAI’s latest releases show the company wants to shift from flashy agent demos to impactful tools.