On Tuesday, Meta is hosting its first-ever LlamaCon AI developer conference at its Menlo Park headquarters, where the company will try to pitch developers on building applications with its open Llama AI models. Just a year ago, that wasn't a hard sell.
However, in recent months, Meta has struggled to keep up with both "open" AI labs like DeepSeek and closed commercial rivals such as OpenAI in the rapidly evolving AI race. LlamaCon comes at a critical moment for Meta in its quest to build a sprawling Llama ecosystem.
Winning developers over may be as simple as shipping better open models. But that may be harder to achieve than it sounds.
A promising early start
Meta's launch of Llama 4 earlier this month underwhelmed developers, with a number of benchmark scores coming in below those of models like DeepSeek's R1 and V3. It was a far cry from what Llama once was: a boundary-pushing model lineup.
When Meta released its Llama 3.1 405B model last summer, CEO Mark Zuckerberg touted it as a big win. In a blog post, Meta called Llama 3.1 405B the "most capable openly available foundation model," with performance rivaling OpenAI's best model at the time, GPT-4o.
It was an impressive model, to be sure, and so were the other models in Meta's Llama 3 family. Jeremy Nixon, who has hosted hackathons at San Francisco's AGI House for the last several years, called the Llama 3 launches "historic moments."
Llama 3 arguably made Meta a darling among AI developers, delivering cutting-edge performance with the freedom to host the models wherever they chose. Today, Meta's Llama 3.3 model is downloaded more often than Llama 4, said Hugging Face's head of product and growth, Jeff Boudier, in an interview.
Contrast that with the reception to Meta's Llama 4 family, and the difference is stark. Llama 4 was controversial from the start.
Benchmarking shenanigans
Meta optimized a version of one of its Llama 4 models, Llama 4 Maverick, for "conversationality," which helped it nab a top spot on the crowdsourced benchmark LM Arena. Meta never released this model, however; the version of Maverick that rolled out broadly ended up performing much worse on LM Arena.
The team behind LM Arena said that Meta should have been "clearer" about the discrepancy. Ion Stoica, an LM Arena co-founder and UC Berkeley professor who has also co-founded companies including Anyscale and Databricks, told TechCrunch that the incident harmed the developer community's trust in Meta.
"[Meta] should have been more explicit that the Maverick model that was on [LM Arena] was different from the model that was released," Stoica told TechCrunch in an interview. "When this happens, it's a little bit of a loss of trust with the community. Of course, they can recover that by releasing better models."
No reasoning
A glaring omission from the Llama 4 family was an AI reasoning model. Reasoning models can work carefully through questions before answering them. In the last year, much of the AI industry has released reasoning models, which tend to perform better on particular benchmarks.
Meta is teasing a Llama 4 reasoning model, but the company hasn't indicated when to expect it.
Nathan Lambert, a researcher with Ai2, says the fact that Meta didn't release a reasoning model with Llama 4 suggests the company may have rushed the launch.
"Everyone's releasing a reasoning model, and it makes their models look so good," Lambert said. "Why couldn't [Meta] wait to do that? I don't have the answer to that question. It seems like normal company weirdness."
Lambert noted that rival open models are closer to the frontier than ever before, and that they now come in more shapes and sizes, greatly increasing the pressure on Meta. For example, on Monday, Alibaba released a collection of models, Qwen 3, which allegedly outperform some of OpenAI's and Google's best coding models on Codeforces, a programming benchmark.
To regain the open model lead, Meta simply needs to ship superior models, according to Ravid Shwartz-Ziv, an AI researcher at NYU's Center for Data Science. That may involve taking bigger risks, like employing new techniques, he told TechCrunch.
Whether Meta is in a position to take big risks right now is unclear. Current and former employees previously told Fortune that Meta's AI research lab is "dying a slow death." The company's VP of AI Research, Joelle Pineau, announced this month that she is leaving.
LlamaCon is Meta's chance to show what it's been cooking up to beat upcoming releases from AI labs like OpenAI, Google, xAI, and others. If it fails to deliver, the company could fall even further behind in the ultra-competitive field.