OpenAI, under growing competitive pressure from Google and Anthropic, has debuted a new AI model, GPT-5.2, that it says beats all existing models by a substantial margin across a wide range of tasks.
The new model, which is being released less than a month after OpenAI debuted its predecessor, GPT-5.1, performed particularly well on a benchmark of complicated professional tasks across a range of “knowledge work” (from law to accounting to finance), as well as on evaluations involving coding and mathematical reasoning, according to data OpenAI released.
Fidji Simo, the former Instacart CEO who now serves as OpenAI’s CEO of applications, told reporters that the model should not be seen as a direct response to Google’s Gemini 3 Pro AI model, which was released last month. That release prompted OpenAI CEO Sam Altman to issue a “code red,” delaying the rollout of several initiatives in order to focus more staff and computing resources on improving its core product, ChatGPT.
“I would say that [the Code Red] helps with the release of this model, but that’s not the reason it’s coming out this week in particular, it has been in the works for a while,” she said.
She said the company had been building GPT-5.2 “for many months.” “We don’t turn around these models in just a week. It’s the result of a lot of work,” she said. The model had been known internally by the code name “Garlic,” according to a story in The Information. The day before the model’s release, Altman teased its imminent rollout by posting to social media a video clip of himself cooking a dish with a large amount of garlic.
OpenAI executives said that the model had been in the hands of “Alpha customers,” who help test its performance, for “several weeks,” a time period that would mean the model was completed prior to Altman’s “code red” declaration.
These testers included legal AI startup Harvey, note-taking app Notion, and file-management software company Box, as well as Shopify and Zoom.
OpenAI said these customers found that GPT-5.2 demonstrated a “cutting-edge” ability to use other software tools to complete tasks, and that it excelled at writing and debugging code.
Coding has become one of the most competitive use cases for AI model deployment within companies. Although OpenAI had an early lead in the space, Anthropic’s Claude model has proved especially popular among enterprises, exceeding OpenAI’s market share according to some figures. OpenAI is no doubt hoping GPT-5.2 will convince customers to turn back to its models for coding.
Simo said the “Code Red” was helping OpenAI focus on improving ChatGPT. “Code Red can be a signal to the company that we want to marshal resources in one particular area, and that’s a way to really define priorities and define things that can be deprioritized,” she said. “So we have had an increase in resources focused on ChatGPT in general.”
The company also said its new model is better than its earlier ones at providing “safe completions,” which it defines as giving users helpful answers while not saying things that might contribute to or worsen mental health crises.
“On the safety side, as you saw through the benchmarks, we’re improving on pretty much every dimension of safety, whether that’s self harm, whether that’s different types of mental health, whether that’s emotional reliance,” Simo said. “We’re very proud of the work that we’re doing here. It’s a top priority for us, and we only release models when we’re confident that the safety protocols have been followed, and we feel proud of our work.”
The release of the new model came on the same day a new lawsuit was filed against the company alleging that ChatGPT’s interactions with a psychologically troubled user had contributed to a murder-suicide in Connecticut. The company also faces several other lawsuits alleging ChatGPT contributed to people’s suicides. The company called the Connecticut murder-suicide “incredibly heartbreaking” and said it is continuing to improve “ChatGPT’s training to recognize and respond to signs of mental or emotional distress, de-escalate conversations and guide people toward real-world support.”
GPT-5.2 showed a large jump in performance across several benchmark tests of interest to enterprise customers. It met or exceeded human expert performance on a wide range of difficult professional tasks, as measured by OpenAI’s GDPval benchmark, 70.9% of the time. That compares to just 38.8% of the time for GPT-5, a model that OpenAI released in August; 59.6% for Anthropic’s Claude Opus 4.5; and 53.3% for Google’s Gemini 3 Pro.
On the software development benchmark SWE-Bench Pro, GPT-5.2 scored 55.6%, which was almost 5 percentage points better than its predecessor, GPT-5.1, and more than 12% better than Gemini 3 Pro.
OpenAI’s Aidan Clark, vice president of research (training), declined to answer questions about exactly what training methods had been used to upgrade GPT-5.2’s performance, although he said that the company had made improvements across the board, including in pretraining, the initial step in creating an AI model.
When Google released its Gemini 3 Pro model last month, its researchers also said the company had made improvements in pretraining as well as post-training. This surprised some in the field who believed that AI companies had largely exhausted the ability to wring substantial improvements out of the pretraining stage of model building, and it was speculated that OpenAI may have been caught off guard by Google’s progress in this area.
