YeonGyu-Kim d39891fcab docs: update hephaestus default model references from gpt-5.3-codex to gpt-5.4

Updated across README (all locales), docs/guide/, docs/reference/,
docs/examples/, AGENTS.md files, and test expectations/snapshots.

The deep category and multimodal-looker still use gpt-5.3-codex as
those are separate from the hephaestus agent.

2026-03-26 19:25:26 +09:00

Agent-Model Matching Guide

For agents and users: Why each agent needs a specific model — and how to customize without breaking things.

The Core Insight: Models Are Developers

Think of AI models as developers on a team. Each has a different brain, different personality, different strengths. A model isn't just "smarter" or "dumber." It thinks differently. Give the same instruction to Claude and GPT, and they'll interpret it in fundamentally different ways.

This isn't a bug. It's the foundation of the entire system.

Oh My OpenAgent assigns each agent a model that matches its working style — like building a team where each person is in the role that fits their personality.

Sisyphus: The Sociable Lead

Sisyphus is the developer who knows everyone, goes everywhere, and gets things done through communication and coordination. Talks to other agents, understands context across the whole codebase, delegates work intelligently, and codes well too. But deep, purely technical problems? He'll struggle a bit.

This is why Sisyphus uses Claude / Kimi / GLM. These models excel at:

Following complex, multi-step instructions (Sisyphus's prompt is ~1,100 lines)
Maintaining conversation flow across many tool calls
Understanding nuanced delegation and orchestration patterns
Producing well-structured, communicative output

Using Sisyphus with older GPT models would be like taking your best project manager — the one who coordinates everyone, runs standups, and keeps the whole team aligned — and sticking them in a room alone to debug a race condition. Wrong fit. GPT-5.4 now has a dedicated Sisyphus prompt path, but GPT is still not the default recommendation for the orchestrator.

Hephaestus: The Deep Specialist

Hephaestus is the developer who stays in their room coding all day. Doesn't talk much. Might seem socially awkward. But give them a hard technical problem and they'll emerge three hours later with a solution nobody else could have found.

This is why Hephaestus uses GPT-5.4. GPT-5.4 is built for exactly this:

Deep, autonomous exploration without hand-holding
Multi-file reasoning across complex codebases
Principle-driven execution (give a goal, not a recipe)
Working independently for extended periods

Using Hephaestus with GLM or Kimi would be like assigning your most communicative, sociable developer to sit alone and do nothing but deep technical work. They'd get it done eventually, but they wouldn't shine — you'd be wasting exactly the skills that make them valuable.

The Takeaway

Every agent's prompt is tuned to match its model's personality. When you change the model, you change the brain — and the same instructions get understood completely differently. Model matching isn't about "better" or "worse." It's about fit.

How Claude and GPT Think Differently

This matters for understanding why some agents support both model families while others don't.

Claude responds to mechanics-driven prompts — detailed checklists, templates, step-by-step procedures. More rules = more compliance. You can write a 1,100-line prompt with nested workflows and Claude will follow every step.

GPT (especially 5.2+) responds to principle-driven prompts — concise principles, XML structure, explicit decision criteria. More rules = more contradiction surface = more drift. GPT works best when you state the goal and let it figure out the mechanics.

Real example: Prometheus's Claude prompt is ~1,100 lines across 7 files. The GPT prompt achieves the same behavior with 3 principles in ~121 lines. Same outcome, completely different approach.

Agents that support both families (Prometheus, Atlas) auto-detect your model at runtime and switch prompts via `isGptModel()`. You don't have to think about it.
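The runtime switch described above can be pictured roughly as follows. This is a minimal sketch, not the project's code: `looksLikeGptModel` is a hypothetical stand-in for `isGptModel()` (whose real signature and matching rules are not shown in this guide), and the prompt strings are placeholders.

```typescript
// Hypothetical stand-in for the project's isGptModel(); the real
// implementation and signature may differ.
type PromptFamily = "claude" | "gpt";

function looksLikeGptModel(modelId: string): boolean {
  // Drop an optional "provider/" prefix, e.g. "openai/gpt-5.4" -> "gpt-5.4".
  const name = modelId.slice(modelId.lastIndexOf("/") + 1).toLowerCase();
  return name.indexOf("gpt-") === 0;
}

// Placeholder prompt bodies; the actual prompts are far longer.
const promptsByFamily: Record<PromptFamily, string> = {
  claude: "<claude prompt: detailed, mechanics-driven>",
  gpt: "<gpt prompt: concise, principle-driven>",
};

// Pick the prompt variant that matches the model family.
function selectPrompt(modelId: string): string {
  const family: PromptFamily = looksLikeGptModel(modelId) ? "gpt" : "claude";
  return promptsByFamily[family];
}
```

In this sketch every non-GPT model falls back to the Claude-style prompt, which mirrors the guide's grouping of Kimi and GLM with the Claude family; whether the real detection does the same is an assumption.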

Agent Profiles

Communicators → Claude / Kimi / GLM

These agents have Claude-optimized prompts — long, detailed, mechanics-driven. They need models that reliably follow complex, multi-layered instructions.

Agent	Role	Fallback Chain	Notes
Sisyphus	Main orchestrator	Claude Opus → opencode-go/kimi-k2.5 → K2P5 → Kimi K2.5 → GPT-5.4 → GLM-5 → Big Pickle	Claude-family first. GPT-5.4 has dedicated prompt support. Kimi available through multiple providers.
Metis	Plan gap analyzer	Claude Opus → GPT-5.4 → opencode-go/glm-5 → K2P5	Claude preferred. GPT-5.4 as secondary before GLM-5 fallback.

Dual-Prompt Agents → Claude preferred, GPT supported

These agents ship separate prompts for Claude and GPT families. They auto-detect your model and switch at runtime.

Agent	Role	Fallback Chain	Notes
Prometheus	Strategic planner	Claude Opus → GPT-5.4 → opencode-go/glm-5 → Gemini 3.1 Pro	Interview-mode planning. GPT prompt is compact and principle-driven.
Atlas	Todo orchestrator	Claude Sonnet → opencode-go/kimi-k2.5 → GPT-5.4	Claude first, opencode-go as intermediate, GPT-5.4 as last resort.

Deep Specialists → GPT

These agents are built for GPT's principle-driven style. Their prompts assume autonomous, goal-oriented execution. Don't override to Claude.

Agent	Role	Fallback Chain	Notes
Hephaestus	Autonomous deep worker	GPT-5.4	Requires GPT access. The craftsman.
Oracle	Architecture consultant	GPT-5.4 → Gemini 3.1 Pro → Claude Opus → opencode-go/glm-5	Read-only high-IQ consultation.
Momus	Ruthless reviewer	GPT-5.4 → Claude Opus → Gemini 3.1 Pro → opencode-go/glm-5	Verification and plan review. GPT-5.4 uses xhigh variant.

Utility Runners → Speed over Intelligence

These agents do grep, search, and retrieval. They intentionally use the fastest, cheapest models available. Don't "upgrade" them to Opus — that's hiring a senior engineer to file paperwork.

Agent	Role	Fallback Chain	Notes
Explore	Fast codebase grep	Grok Code Fast → opencode-go/minimax-m2.7-highspeed → MiniMax M2.7 → Haiku → GPT-5-Nano	Speed is everything. Fire 10 in parallel.
Librarian	Docs/code search	opencode-go/minimax-m2.7 → MiniMax M2.7-highspeed → Haiku → GPT-5-Nano	Doc retrieval doesn't need deep reasoning.
Multimodal Looker	Vision/screenshots	GPT-5.4 → opencode-go/kimi-k2.5 → GLM-4.6v → GPT-5-Nano	Uses the first available multimodal-capable fallback.
Sisyphus-Junior	Category executor	Claude Sonnet → opencode-go/kimi-k2.5 → GPT-5.4 → MiniMax M2.7 → Big Pickle	Handles delegated category tasks. Sonnet-tier default.

Model Families

Claude Family

Communicative, instruction-following, structured output. Best for agents that need to follow complex multi-step prompts.

Model	Strengths
Claude Opus 4.6	Best overall. Highest compliance with complex prompts. Default for Sisyphus.
Claude Sonnet 4.6	Faster, cheaper. Good balance for everyday tasks.
Claude Haiku 4.5	Fast and cheap. Good for quick tasks and utility work.
Kimi K2.5	Behaves very similarly to Claude. Great all-rounder at lower cost.
GLM 5	Claude-like behavior. Solid for orchestration tasks.

GPT Family

Principle-driven, explicit reasoning, deep technical capability. Best for agents that work autonomously on complex problems.

Model	Strengths
GPT-5.3 Codex	Deep coding powerhouse. Autonomous exploration. Still available for deep category and explicit overrides.
GPT-5.4	High intelligence, strategic reasoning. Default for Hephaestus, Oracle, and Momus, and a key fallback for Prometheus / Atlas. Uses xhigh variant for Momus.
GPT-5.4 Mini	Fast + strong reasoning. Good for lightweight autonomous tasks. Default for quick category.
GPT-5-Nano	Ultra-cheap, fast. Good for simple utility tasks.

Other Models

Model	Strengths
Gemini 3.1 Pro	Excels at visual/frontend tasks. Different reasoning style. Default for `visual-engineering` and `artistry`.
Gemini 3 Flash	Fast. Good for doc search and light tasks.
Grok Code Fast 1	Blazing fast code grep. Default for Explore agent.
MiniMax M2.7	Fast and smart. Good for utility tasks and search/retrieval. Upgraded from M2.5 with better reasoning.
MiniMax M2.7 Highspeed	Ultra-fast variant. Optimized for latency-sensitive tasks like codebase grep.

OpenCode Go

A premium subscription tier ($10/month) that provides reliable access to Chinese frontier models through OpenCode's infrastructure.

Available Models:

Model	Use Case
opencode-go/kimi-k2.5	Vision-capable, Claude-like reasoning. Used by Sisyphus, Atlas, Sisyphus-Junior, Multimodal Looker.
opencode-go/glm-5	Text-only orchestration model. Used by Oracle, Prometheus, Metis, Momus.
opencode-go/minimax-m2.7	Ultra-cheap, fast responses. Used by Librarian, Explore, Atlas, Sisyphus-Junior for utility work.

When It Gets Used:

OpenCode Go models appear in fallback chains as intermediate options. They bridge the gap between premium Claude access and free-tier alternatives. The system tries OpenCode Go models before falling back to free tiers (MiniMax M2.7-highspeed, Big Pickle) or GPT alternatives.

Go-Only Scenarios:

Some model identifiers, such as `k2p5` (paid Kimi K2.5) and `glm-5`, may be available only through an OpenCode Go subscription in certain regions. When you configure these short identifiers, the system resolves them through the opencode-go provider first.

About Free-Tier Fallbacks

You may see model names like kimi-k2.5-free, minimax-m2.7-highspeed, or big-pickle (GLM 4.6) in the source code or logs. These are free-tier or speed-optimized versions of the same model families. They exist as lower-priority entries in fallback chains.

You don't need to configure them. The system includes them so it degrades gracefully when you don't have every paid subscription. If you have the paid version, the paid version is always preferred.

Task Categories

When agents delegate work, they don't pick a model name — they pick a category. The category maps to the right model automatically.

Category	When Used	Fallback Chain
`visual-engineering`	Frontend, UI, CSS, design	Gemini 3.1 Pro → GLM 5 → Claude Opus → opencode-go/glm-5 → K2P5
`ultrabrain`	Maximum reasoning needed	GPT-5.4 → Gemini 3.1 Pro → Claude Opus → opencode-go/glm-5
`deep`	Deep coding, complex logic	GPT-5.3 Codex → Claude Opus → Gemini 3.1 Pro
`artistry`	Creative, novel approaches	Gemini 3.1 Pro → Claude Opus → GPT-5.4
`quick`	Simple, fast tasks	GPT-5.4 Mini → Claude Haiku → Gemini Flash → opencode-go/minimax-m2.7 → GPT-5-Nano
`unspecified-high`	General complex work	Claude Opus → GPT-5.4 → GLM 5 → K2P5 → opencode-go/glm-5 → Kimi K2.5
`unspecified-low`	General standard work	Claude Sonnet → GPT-5.3 Codex → opencode-go/kimi-k2.5 → Gemini Flash
`writing`	Text, docs, prose	Gemini Flash → opencode-go/kimi-k2.5 → Claude Sonnet

See the Orchestration System Guide for how agents dispatch tasks to categories.
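To make the category indirection concrete, here is a minimal sketch of a category lookup. The chains are copied from three rows of the table above; the `categoryChains` map and `chainForCategory` helper are hypothetical names used for illustration, not the project's API.

```typescript
// Fallback chains copied from the category table (three rows shown).
const categoryChains: Record<string, string[]> = {
  deep: ["GPT-5.3 Codex", "Claude Opus", "Gemini 3.1 Pro"],
  artistry: ["Gemini 3.1 Pro", "Claude Opus", "GPT-5.4"],
  writing: ["Gemini Flash", "opencode-go/kimi-k2.5", "Claude Sonnet"],
};

// Hypothetical helper: an agent names a category, never a model.
function chainForCategory(category: string): string[] {
  const chain = categoryChains[category];
  if (chain === undefined) {
    throw new Error(`Unknown category: ${category}`);
  }
  return chain;
}

// The delegating agent only says "deep"; the first entry is the preferred model.
const preferredForDeep = chainForCategory("deep")[0];
```

The point of the indirection is that changing the model behind `deep` means editing one map entry rather than every agent that delegates deep work.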

Customization

Example Configuration

{
  "$schema": "https://raw.githubusercontent.com/code-yeongyu/oh-my-openagent/dev/assets/oh-my-openagent.schema.json",

  "agents": {
    // Main orchestrator: Claude Opus or Kimi K2.5 work best
    "sisyphus": {
      "model": "kimi-for-coding/k2p5",
      "ultrawork": { "model": "anthropic/claude-opus-4-6", "variant": "max" },
    },

    // Research agents: cheaper models are fine
    "librarian": { "model": "google/gemini-3-flash" },
    "explore": { "model": "github-copilot/grok-code-fast-1" },

    // Architecture consultation: GPT or Claude Opus
    "oracle": { "model": "openai/gpt-5.4", "variant": "high" },

    // Prometheus inherits sisyphus model; just add prompt guidance
    "prometheus": {
      "prompt_append": "Leverage deep & quick agents heavily, always in parallel.",
    },
  },

  "categories": {
    "quick": { "model": "opencode/gpt-5-nano" },
    "unspecified-low": { "model": "anthropic/claude-sonnet-4-6" },
    "unspecified-high": { "model": "anthropic/claude-opus-4-6", "variant": "max" },
    "visual-engineering": {
      "model": "google/gemini-3.1-pro",
      "variant": "high",
    },
    "writing": { "model": "google/gemini-3-flash" },
  },

  // Limit expensive providers; let cheap ones run freely
  "background_task": {
    "providerConcurrency": {
      "anthropic": 3,
      "openai": 3,
      "opencode": 10,
      "zai-coding-plan": 10,
    },
    "modelConcurrency": {
      "anthropic/claude-opus-4-6": 2,
      "opencode/gpt-5-nano": 20,
    },
  },
}

Run `opencode models` to see available models, and `opencode auth login` to authenticate providers.

Safe vs Dangerous Overrides

Safe — same personality type:

Sisyphus: Opus → Sonnet, Kimi K2.5, GLM 5 (all communicative models)
Prometheus: Opus → GPT-5.4 (auto-switches to the GPT prompt)
Atlas: Claude Sonnet 4.6 → GPT-5.4 (auto-switches to the GPT prompt)

Dangerous — personality mismatch:

Sisyphus → older GPT models: Still a bad fit. GPT-5.4 is the only GPT model with a dedicated Sisyphus prompt path.
Hephaestus → Claude: Built for GPT's autonomous, principle-driven style. Claude can't replicate this.
Explore → Opus: Massive cost waste. Explore needs speed, not intelligence.
Librarian → Opus: Same. Doc search doesn't need Opus-level reasoning.
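The lists above could be encoded as a small lookup that flags an override before it is applied. This is only an illustrative sketch: the rule table, the substring matching, and `checkOverride` are all assumptions, and the "older GPT models" rule for Sisyphus is left out because the guide does not say which model versions count as older.

```typescript
type Verdict = "safe" | "dangerous" | "unlisted";

interface OverrideRule {
  agent: string;
  modelIncludes: string; // lowercase substring of the model id (assumed matching scheme)
  verdict: "safe" | "dangerous";
}

// Rules transcribed from the safe/dangerous lists above.
const overrideRules: OverrideRule[] = [
  { agent: "sisyphus", modelIncludes: "claude-sonnet", verdict: "safe" },
  { agent: "sisyphus", modelIncludes: "kimi", verdict: "safe" },
  { agent: "sisyphus", modelIncludes: "glm-5", verdict: "safe" },
  { agent: "prometheus", modelIncludes: "gpt-5.4", verdict: "safe" },
  { agent: "atlas", modelIncludes: "gpt-5.4", verdict: "safe" },
  { agent: "hephaestus", modelIncludes: "claude", verdict: "dangerous" },
  { agent: "explore", modelIncludes: "opus", verdict: "dangerous" },
  { agent: "librarian", modelIncludes: "opus", verdict: "dangerous" },
];

// Returns the first matching verdict, or "unlisted" when the guide is silent.
function checkOverride(agent: string, modelId: string): Verdict {
  const id = modelId.toLowerCase();
  for (const rule of overrideRules) {
    if (rule.agent === agent && id.indexOf(rule.modelIncludes) !== -1) {
      return rule.verdict;
    }
  }
  return "unlisted";
}
```

An "unlisted" result means the guide gives no verdict for that pairing, not that the override is safe.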

How Model Resolution Works

Each agent has a fallback chain. The system tries models in priority order until it finds one available through your connected providers. You don't need to configure providers per model: authenticate with `opencode auth login`, and the system works out which models are available and through which provider.

Agent Request → User Override (if configured) → Fallback Chain → System Default
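The resolution order can be sketched as a short function. The names here (`ResolutionRequest`, `resolveModel`, `isAvailable`) are hypothetical, and one behavior is assumed rather than confirmed by this guide: an override that is not available falls through to the fallback chain instead of raising an error.

```typescript
interface ResolutionRequest {
  userOverride?: string;                   // model set in user config, if any
  fallbackChain: string[];                 // priority-ordered, highest first
  systemDefault: string;                   // last resort
  isAvailable: (model: string) => boolean; // reachable via an authenticated provider?
}

function resolveModel(req: ResolutionRequest): string {
  // 1. User override wins when it is configured and reachable (assumed behavior).
  if (req.userOverride !== undefined && req.isAvailable(req.userOverride)) {
    return req.userOverride;
  }
  // 2. Otherwise take the first available entry in the fallback chain.
  for (const model of req.fallbackChain) {
    if (req.isAvailable(model)) {
      return model;
    }
  }
  // 3. Nothing in the chain is reachable: use the system default.
  return req.systemDefault;
}

// Example: Atlas's chain from the table above, with only GPT models reachable.
const atlasChain = ["Claude Sonnet", "opencode-go/kimi-k2.5", "GPT-5.4"];
const onlyGptAvailable = (model: string): boolean => model.indexOf("GPT-") === 0;
const picked = resolveModel({
  fallbackChain: atlasChain,
  systemDefault: "placeholder-default", // hypothetical value
  isAvailable: onlyGptAvailable,
});
console.log(picked); // GPT-5.4
```

Passing availability in as a predicate keeps the sketch independent of how providers are actually queried.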
