Japan's Sakana AI says Fugu Ultra matches Mythos on certain benchmarks
What are Fugu and Fugu Ultra, the AI systems through which Japan's Sakana AI is betting on 'One Model To Command Them All' and claiming Mythos-level performance on some benchmarks?
)
Sakana AI unveils Fugu Ultra, says it matches Mythos on tests (Image source: https://sakana.ai/)
Listen to This Article
Tokyo-based artificial intelligence (AI) startup Sakana AI on Tuesday launched Fugu and Fugu Ultra, a new generation of AI systems that rely on orchestrating multiple AI models rather than depending on a single large model. The company claims the approach allows its flagship Fugu Ultra system to achieve performance comparable to Anthropic’s Fable and Mythos models on certain benchmarks.
In a blog post announcing the release, Sakana AI described Fugu as “a full multi-agent orchestration system accessible via a single model API”. The company said the models “achieve superior performance by dynamically coordinating and orchestrating a diverse pool of powerful models” and learn how to assemble and manage AI agents for different tasks rather than following predefined workflows.
According to Sakana AI, Fugu Ultra delivered results comparable to leading models on benchmarks including GPQA-D, LCBv6 and SWE-Pro. The company said its orchestration-based approach enables the system to coordinate specialised models and combine their outputs into a single response.
The company was founded in 2023 by Llion Jones, a co-author of Google's "Attention Is All You Need" paper, and former Stability AI research head David Ha.
How did Fugu reach Mythos-level performance?
Unlike GPT-5.5, Gemini 3.1 Pro or Anthropic's Mythos, Fugu, which itself is a language model trained to coordinate other AI models, is not designed to handle every task on its own. Instead, it acts as an orchestrator that determines which model is best suited for a particular task and, in some cases, coordinates multiple models to work together.
Also Read
Fugu Ultra takes the idea further by breaking a problem into smaller tasks, assigning them to different models, and then combining the results into a final response.
Sakana argues that frontier models excel in different areas. One may be stronger at coding, another at scientific reasoning, and another at mathematics. Rather than relying on a single model, Fugu attempts to combine those strengths.
READ | Recursive self-improvement explained: Is AI building AI the path to AGI?
Sakana compares the approach to collective intelligence. In its technical report, the company says the orchestrator learns when to involve particular models, how they should communicate, and how their outputs should be synthesised. The goal is to produce a result that is stronger than what any one model could achieve on its own.
According to Sakana, this strategy helped Fugu Ultra achieve scores comparable to Anthropic's Mythos Preview and Fable 5 on several benchmarks, including GPQA-Diamond, CharXiv Reasoning and Terminal Bench. While it did not outperform those models on every test, the company says the results suggest that coordinating existing frontier models can sometimes rival the performance of a single cutting-edge AI system.
Sakana’s big claim sparks debate online
The launch has sparked discussion within the AI community because it shifts the focus from building larger standalone models to coordinating existing systems.
Aaron Levie, co-founder and chief executive of Box, wrote on X, “Another new idea to push the state of AI architectures forward. Sakana released a model that effectively uses a mixture of models to get work done. You get a single API but then the work gets farmed out the model that best performs the task.”
Levie added, “This is generally how applied AI products are building their agent harnesses at this point, but the idea of making this an LLM that any developer can interact with is also a great idea. As we get more innovation with both frontier closed and OSS models, there’s going to be a ton of value produced for the layer that can route the best.”
Not all early users were convinced by the performance claims. Ethan Mollick, an associate professor at the Wharton School and AI researcher, said on X that he had tested the system and found it slow in practice.
READ | Call screening to in-call agent: Can voice be AI's next growth frontier?
“I have been trying Sakana Fugu Ultra-high and, first, it is incredibly slow: my typical coding tests (shaders, interactive scenes) take 30 minutes to run. And the results are... fine. It does not match Fable in real use,” Mollick wrote.
Crypto investor and internet personality Miles Deutscher described the launch as a significant development in AI architecture.
“This is insane. We just got ANOTHER Mythos-level intelligence LLM. This model operates like no other AI we've ever seen, and it's actually mind-blowing. Fugu isn't just one model. It's a model trained to orchestrate OTHER models,” he said on X.
Explaining the system, Deutscher added that Fugu determines when to delegate tasks, how AI agents should communicate and how their outputs should be combined before producing a final answer. He also noted that Fugu Ultra scored 73.7 on SWE-Bench Pro and performs at a level comparable to Fable 5.
Bengaluru-based software engineer and AI commentator Rohan Paul talked about the system’s performance in a coding test at length while also pointing to its cost.
“Sakana Fugu Ultra just beat the other models on visual polish in a live trading-desk coding test, got close to GLM 5.2, but at 17x the cost,” Paul wrote on X.
He added that Fugu Ultra functions as “an orchestration layer that assembles and routes subtasks across a pool of models through one OpenAI-compatible endpoint” and that the system decides whether to answer directly or distribute parts of a task to other models before compiling a final response.
More From This Section
Don't miss the most important news and views of the day. Get them on our Telegram channel
First Published: Jun 23 2026 | 2:32 PM IST
