Sakana AI

Models

Fugu

Fugu balances strong performance with low latency, making it the ideal default for everyday work. Fugu, will route to the best model based on the task at hand. You can also opt specific agents out of its pool to meet data, privacy, and compliance constraints.

Fugu Ultra

Fugu Ultra coordinates a deeper pool of expert agents to maximize answer quality on hard, high-stakes problems. Fugu Ultra focuses on maximizing performance, for a higher cost. It routes between one to three agents, depending on the problem.

Benchmark comparison

Reported benchmark scores across Fugu and other frontier models.

Benchmark comparison chart for Fugu models
BenchmarkFugu-UltraFuguOpus 4.8Gemini 3.1GPT-5.5
SWE Bench Pro73.759.069.254.258.6
Terminal Bench 2.182.180.274.670.378.2
LiveCodeBench93.292.987.888.585.3
LiveCodeBench Pro90.887.884.882.988.4
Humanity’s Last Exam50.047.249.844.441.4
CharXiv Reasoning86.685.184.283.384.1
GPQA Diamond95.595.592.094.393.6
SciCode58.760.153.558.956.1
τ3 Banking20.621.720.68.420.6
Long Context Reasoning73.374.767.772.774.3
MRCRv293.686.687.984.994.8
CTI-REALM69.467.569.656.067.3

Example Usage

These examples use the OpenAI-compatible Responses API. Set your environment variables once, then copy and run the Python or cURL example you need.

# pip install openai
import os
from openai import OpenAI

base_url = os.environ["FUGU_BASE_URL"].rstrip("/")
if not base_url.endswith("/v1"):
    base_url = f"{base_url}/v1"

client = OpenAI(
    api_key=os.environ["FUGU_API_KEY"],
    base_url=base_url,
)

with client.responses.stream(
    model="fugu-ultra",
    input="Explain why streaming is useful in three short bullets.",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)

    # Full response object assembled from the stream.
    response = stream.get_final_response()

print()
print(response)

Supported endpoints

Fugu currently supports the OpenAI-compatible Chat Completions, Responses, and Models APIs. For generation requests, we strongly recommend using the Responses API, for better performance.

Responses API

Use /v1/responses when you want the Responses API shape, especially for tool use, multimodal input, and proper reasoning or function calls management.

For more details, see the OpenAI Responses API reference.

Supported request fields

FieldTypeDescription
modelstringRequired. Model ID to use, such as fugu orfugu-ultra.
inputstring | arrayRequired. A string or an array of Responses-style input items. A string input is treated as a user message.
instructionsstringSystem / developer message passed to the model.
metadataobjectArbitrary key-value pairs attached to the request.
streambooleanStream the response.
max_output_tokensnumberMaximum number of tokens to generate. Note for fugu-ultra, this is applied only to the final model response, orchestrator model still uses maximum token limit.
reasoningobjectReasoning controls with an effort value ofhigh, xhigh, or max. xhigh and max are aliases of the same reasoning effort. Default is xhigh.
toolsarrayTool definitions the model may call.
tool_choicestring | objectControls whether and which tool is called.
text.formatobjectStructured output with text,json_object, or json_schema.
temperaturenumberAccepted but ignored.
parallel_tool_callsbooleanAccepted but ignored. Set to True on the server side for models that support it.
previous_response_idstringNot accepted. Send the full conversation history directly in input instead.

Chat Completions

Use /v1/chat/completions when you want the OpenAI Chat Completions API shape.

For more details, see the OpenAI Chat Completions API reference.

Supported request fields

FieldTypeDescription
modelstringRequired. Model ID to use, such as fugu orfugu-ultra.
messagesarrayRequired. An array of chat messages with a role (such as system, developer, user, assistant, or tool) and content. User message content can be a string or an array of content parts, depending on the selected model.
metadataobjectArbitrary key-value pairs attached to the request.
streambooleanStream the response.
stream_optionsobjectStreaming options such as include_usage. Only set this when stream is true.
max_completion_tokensnumberMaximum number of tokens to generate. Note for fugu-ultra, this is applied only to the final model response; the orchestrator model still uses its maximum token limit.
max_tokensnumberLegacy token limit field. Prefer max_completion_tokens for new integrations.
reasoning_effort / reasoningstring | objectReasoning controls. Use reasoning_effort or a reasoning object with an effort value of high, xhigh, or max. xhigh and max are aliases of the same reasoning effort. Default is high.
toolsarrayTool definitions the model may call.
tool_choicestring | objectControls whether and which tool is called.
response_formatobjectStructured output with text,json_object, or json_schema.
top_pnumberAccepted but ignored.
stopstring | arrayAccepted but ignored.
seednumberAccepted but ignored.
frequency_penaltynumberAccepted but ignored.
presence_penaltynumberAccepted but ignored.
temperaturenumberAccepted but ignored.
parallel_tool_callsbooleanAccepted but ignored. Always applied to models that support it.

Models API

Use /v1/models to list the models available through the API. This OpenAI-compatible endpoint returns the supported model IDs.

For more details, see the OpenAI Models API reference.

Supported models

Model IDDescription
fuguThe default Fugu model.
fugu-ultraThe Fugu Ultra model.
fugu-ultra-20260615A dated alias of fugu-ultra that pins a specific model version.

Built-in tools

Fugu models support OpenAI-compatible built-in tools in the Responses API. To let the model search the web, add the web_search tool to your request's tools array, just as you would with OpenAI models.

When web search is used, the response can include web search call output. Advance options for the tool are not supported. For more details see the OpenAI web search guide.

# pip install openai
import os
from openai import OpenAI

base_url = os.environ["FUGU_BASE_URL"].rstrip("/")
if not base_url.endswith("/v1"):
    base_url = f"{base_url}/v1"

client = OpenAI(
    api_key=os.environ["FUGU_API_KEY"],
    base_url=base_url,
)

response = client.responses.create(
    model="fugu",
    tools=[{"type": "web_search"}],
    input="Search the web for today's top AI news and summarize it with citations.",
)

print(response.output_text)

Usage fields for Fugu Ultra

Outside of standard Responses API output, Fugu Ultra returns usage fields that separate user-visible model work from orchestration work. Note that different to OpenAI, even though the orchestration tokens are stored in token_details fields. They represent real token usage outside of the input, output tokens, and will be counted in the final price of the request. The price will be the same as standard input and output tokens.

{
  "usage": {
    "input_tokens": 120,
    "output_tokens": 80,
    "total_tokens": 200,
    "input_tokens_details": {
      "cached_tokens": 0,
      "orchestration_input_tokens": 0,
      "orchestration_input_cached_tokens": 0
    },
    "output_tokens_details": {
      "orchestration_output_tokens": 0
    }
  }
}
FieldDescription
input_tokensTokens from the user input sent to the first model.
input_tokens_details.cached_tokensCached input tokens for the user input.
input_tokens_details.orchestration_input_tokensSum of all input tokens used for orchestration.
input_tokens_details.orchestration_input_cached_tokensCached input tokens from orchestration.
output_tokensTokens in the final output.
output_tokens_details.orchestration_output_tokensOutput tokens from the orchestration.
total_tokensTotal token count returned for the request (including orchestration).