I tried out OpenRouter's MCP Server


2026.06.27


Introduction

Hello, I'm Morishige from Classmethod's Manufacturing Business Technology Division.

On June 25, 2026, OpenRouter released their MCP Server. OpenRouter is well known as a multi-provider gateway for LLMs, but what makes this MCP Server interesting in its design is that rather than simply wrapping the OpenAI-compatible API in the MCP protocol, it's positioned as a separate layer — a "development assistant for coding agents."

https://openrouter.ai/blog/announcements/openrouter-mcp-server/

Since I had just been writing about routing-related topics one after another, including NVIDIA LLM Router v3 and Sakana Fugu, I wanted to get hands-on with OpenRouter MCP Server while it was still hot off the press.

This is my experience from day two after release, so there were still things that didn't work and behaviors not documented officially. I'll walk through the behavior of all 13 tools, which models actually get selected via the Auto router, whether you can connect from Hermes Agent, and more — complete with real numbers.

What Is OpenRouter MCP Server?

OpenRouter MCP Server is a remotely hosted HTTP MCP server. There's no need to install anything locally — simply connect to the single endpoint https://mcp.openrouter.ai/mcp and you gain access to OpenRouter's live data and chat functionality as MCP tools.

The official intended use case is "pulling the latest OpenRouter information from a coding agent to select a suitable model and do quick test runs," and the official blog stated the following:

The MCP server is a development assistant for your coding agent... Your app should still call the OpenRouter API directly.

In other words, this is not designed to replace the routing layer of production applications; it is promoted strictly as a development-time assistant. LLM Router v3 and CCR handle "production routing paths," whereas OpenRouter MCP Server sits in a different layer, as a "supplementary layer on the development editor side." Seen that way, its positioning is easy to understand.

| Item | Value |
| --- | --- |
| GA Date | 2026-06-25 |
| Endpoint | https://mcp.openrouter.ai/mcp |
| Connection Type | Remote HTTP MCP |
| Authentication | OAuth PKCE (bearer auth also worked) |
| Dedicated Key Spec | Valid for 7 days, default $10 spend cap (editable on approval screen) |
| Supported Clients | Claude Code / Claude Desktop / Cursor / Codex CLI / OpenCode |

When you run the OAuth flow, a dedicated key labeled OpenRouter MCP: <client name> is issued, valid for 7 days. The spend cap defaults to $10, making it a reassuring design for use cases where you hand a key to a development agent.
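For readers who have not looked inside a PKCE flow, here is a minimal sketch of the verifier/challenge pair that an MCP client generates before opening the consent screen. It follows RFC 7636 with the S256 method; that OpenRouter's flow uses S256 specifically is my assumption, since the article only establishes "OAuth PKCE". The function names are illustrative, not part of any client.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Return a random URL-safe verifier (43 characters for 32 random bytes)."""
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    verifier = make_code_verifier()
    # The client keeps the verifier private and sends only the challenge
    # in the authorization request; the verifier is revealed at token exchange.
    print("verifier :", verifier)
    print("challenge:", make_code_challenge(verifier))
```

In practice the MCP client does all of this for you; the sketch is only meant to show why no client secret has to be stored in the editor configuration.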

A Look at the 13 Tools

The official blog introduced 11 tools, but reading through the documentation reveals there are actually 13. Only chat-send incurs charges; the remaining 12 are free, read-only tools.

| Category | Tool Name | Description | Billing |
| --- | --- | --- | --- |
| catalog | models-list | Live model catalog search (rich filtering and sorting) | Free |
| catalog | model-get | Details for a specific model (pricing / context / supported parameters) | Free |
| catalog | model-endpoints | Price / latency / throughput by provider | Free |
| catalog | providers-list | List of providers | Free |
| intelligence | benchmarks | Third-party scores from Artificial Analysis / Design Arena | Free |
| intelligence | rankings-daily | Usage and trends by model | Free |
| intelligence | app-rankings | Usage and trends by app | Free |
| account | credits-get | Remaining credits | Free |
| account | generation-get | Cost / token / provider details for a specific generation | Free |
| docs / skill | docs-search | Full-text search of OpenRouter docs | Free |
| docs / skill | view-skill | Best-practice skills from the OpenRouter knowledge base | Free |
| utility | chat-send | Send a chat to any model | Paid |
| utility | ping | Health check | Free |

As of day two after release, view-skill returns no usable content. Calling view-skill name="overview" as a test returns the following:

Unknown skill "overview". Available skills:

Since there's nothing after Available skills:, it appears no skills have been registered yet. It's safe to treat this tool as a placeholder at the time of writing. All 12 other tools each return different information, making for a solid "ask anything about OpenRouter" window for development agents.

Connecting from Claude Code via OAuth

The setup is remarkably simple — just add the following single block to .mcp.json or ~/.claude/mcp.json. The MCP client handles OAuth client information and redirect URIs behind the scenes.

{
  "mcpServers": {
    "openrouter": {
      "type": "http",
      "url": "https://mcp.openrouter.ai/mcp"
    }
  }
}

When you restart Claude Code, the OAuth flow runs on first connection. The browser opens a consent screen showing the label OpenRouter MCP: Claude Code, a $10 spend cap, and a 7-day expiry. After clicking Approve, the key is issued, and back in Claude Code, the /mcp command shows openrouter in a connected state with all 13 tool names listed.

Once connected, a good first test is hitting mcp__openrouter__ping, which returns pong — that completes the tool listing and connectivity check.
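Under the hood, a tool invocation like this is a JSON-RPC 2.0 message with the MCP method tools/call. The sketch below builds that message for ping. I am assuming the wire-level tool name is the bare ping from the tool table (mcp__openrouter__ping being the client-side prefixed name) and that ping takes no arguments; build_tool_call is a hypothetical helper.

```python
import json
from typing import Any, Optional


def build_tool_call(
    name: str,
    arguments: Optional[dict[str, Any]] = None,
    request_id: int = 1,
) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Assumption: "ping" accepts an empty arguments object.
    print(build_tool_call("ping"))
```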

What Gets Selected via the Auto Router?

OpenRouter has an Auto router that automatically selects a model based on the difficulty and nature of the prompt. You can use it via MCP by specifying model: "openrouter/auto" in chat-send. I tried three different types of questions.

| Pattern | Specified Model | Question |
| --- | --- | --- |
| Light question | openrouter/auto | "Reply with just one word: hello" |
| Reasoning question | openrouter/auto | "Solve: x^2 + 5x + 6 = 0" |
| Reasoning question (cheapest route forced) | openrouter/auto:floor | Same as above |

Every chat-send response automatically appends a footer like this:

hello

(model: openrouter/auto, generation id: gen-1782522452-f9zeD9t0SdmiT5Q5iD3U, input tokens: 221, output tokens: 5)

The model: openrouter/auto shown here is the routing meta-model (the slug you submitted), not the model that actually served the request. To see which model was selected, pass the generation id from this footer to generation-get.
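If you want to automate that hand-off, the footer is regular enough to parse. The following is a minimal sketch that assumes the footer always has exactly the shape of the sample above; the format is not documented, so treat the regular expression as a best guess that may need adjusting. parse_footer is a hypothetical helper name.

```python
import re
from typing import Optional, Union

# Pattern modelled on the single footer sample shown above (undocumented format).
FOOTER_RE = re.compile(
    r"\(model: (?P<model>[^,]+), "
    r"generation id: (?P<generation_id>gen-[\w-]+), "
    r"input tokens: (?P<input_tokens>\d+), "
    r"output tokens: (?P<output_tokens>\d+)\)\s*$"
)


def parse_footer(text: str) -> Optional[dict[str, Union[str, int]]]:
    """Extract model slug, generation id, and token counts from a chat-send reply."""
    match = FOOTER_RE.search(text)
    if match is None:
        return None
    return {
        "model": match["model"],
        "generation_id": match["generation_id"],
        "input_tokens": int(match["input_tokens"]),
        "output_tokens": int(match["output_tokens"]),
    }


SAMPLE_REPLY = (
    "hello\n\n"
    "(model: openrouter/auto, generation id: gen-1782522452-f9zeD9t0SdmiT5Q5iD3U, "
    "input tokens: 221, output tokens: 5)"
)

if __name__ == "__main__":
    print(parse_footer(SAMPLE_REPLY))
```

The extracted generation_id is the value to feed into generation-get.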

Running all three generation ids through generation-get yielded the following results:

| Question | Actually Selected Model | Provider | total_cost | latency | native_tokens_reasoning |
| --- | --- | --- | --- | --- | --- |
| Light "hello" | openai/gpt-5.5-20260423 | OpenAI direct | $0.001255 | 945ms | 0 |
| x²+5x+6=0 (auto) | google/gemini-3.5-flash-20260519 | Google direct | $0.002759 | 1574ms | 262 |
| x²+5x+6=0 (floor) | google/gemini-3.5-flash-20260519 | Google direct | $0.002570 | 1439ms | 241 |

What surprised me was that GPT-5.5 was selected even for the light "hello" question, and that for the reasoning question, routing went through Gemini 3.5 Flash with its reasoning mode enabled. This deviated from the assumption of "big model for reasoning, cheap model for simple questions."

Another interesting finding was the effect of :floor (a suffix that forces the cheapest route). Even when I added :floor to the same reasoning question, the router still selected Gemini 3.5 Flash. When the Auto router is already returning something close to the optimal cheapest solution, adding :floor doesn't change the selection. However, the number of reasoning tokens dropped from 262 to 241 (a reduction of 21), and cost also fell from $0.002759 to $0.002570 (about 7% cheaper). You can see the router internally fine-tuning reasoning effort even when the same model is selected.
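The percentages are easy to reproduce from the generation-get numbers. This small sketch recomputes the relative drop in cost and reasoning tokens between the auto and :floor runs; the dictionary keys are my own labels, not field names from the API.

```python
def percent_drop(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Values copied from the results table; the key names are illustrative.
AUTO = {"total_cost": 0.002759, "reasoning_tokens": 262}
FLOOR = {"total_cost": 0.002570, "reasoning_tokens": 241}

if __name__ == "__main__":
    cost = percent_drop(AUTO["total_cost"], FLOOR["total_cost"])
    tokens = percent_drop(AUTO["reasoning_tokens"], FLOOR["reasoning_tokens"])
    print(f"cost drop: {cost:.1f}%")              # cost drop: 6.9%
    print(f"reasoning token drop: {tokens:.1f}%")  # reasoning token drop: 8.0%
```

With only one run per variant, a difference of this size could also be ordinary run-to-run variation, so it is worth repeating the comparison before drawing conclusions.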

Note: The OpenRouter API has parameters like cost_quality_tradeoff to finely control Auto router behavior, but these are not exposed in MCP's chat-send. Controlling the Auto router via MCP is limited to combinations of model slug suffixes (:floor / :nitro / :free / :online) and provider.sort.

The JSON returned by generation-get includes fields like data.model / data.provider_name / data.provider_responses[].model_permaslug / data.native_tokens_reasoning, so while you can't see the router's decision criteria, you can trace the results in full. Being able to retroactively follow routing from within the development agent — without relying on external observability tools like Langfuse — is quite convenient.
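As a sketch of how those fields can be condensed into a one-line routing trace, the function below picks out the four paths mentioned above. The sample payload is hypothetical: it is reconstructed from the results table, the real response contains many more fields, and the exact provider_name string is a guess.

```python
from typing import Any


def summarize_generation(response: dict[str, Any]) -> dict[str, Any]:
    """Reduce a generation-get response to the fields that explain routing."""
    data = response.get("data", {})
    permaslugs = [
        item["model_permaslug"]
        for item in data.get("provider_responses", [])
        if item.get("model_permaslug")
    ]
    return {
        "model": data.get("model"),
        "provider": data.get("provider_name"),
        "permaslugs": permaslugs,
        "reasoning_tokens": data.get("native_tokens_reasoning", 0),
    }


# Hypothetical payload shaped after the field paths named in the text.
SAMPLE_RESPONSE = {
    "data": {
        "model": "google/gemini-3.5-flash-20260519",
        "provider_name": "Google",  # assumed value
        "provider_responses": [
            {"model_permaslug": "google/gemini-3.5-flash-20260519"}
        ],
        "native_tokens_reasoning": 262,
    }
}

if __name__ == "__main__":
    print(summarize_generation(SAMPLE_RESPONSE))
```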

benchmarks and model-endpoints Are Fascinating

benchmarks pulls third-party benchmarks from both Artificial Analysis and Design Arena. Omitting the source argument returns results from both mixed together, and you can narrow by task_type to coding / intelligence / agentic.

Pulling the coding top 5 produced the following table. The as_of timestamp is 2026-06-27T00:00:22.821Z.

| Model | Intelligence | Coding | Agentic | Input $/1M | Output $/1M |
| --- | --- | --- | --- | --- | --- |
| Claude Fable 5 | 59.9 | 76.5 | 52.8 | $10 | $50 |
| GPT-5.5 (xhigh) | 54.8 | 74.9 | 44.9 | $5 | $30 |
| Claude Opus 4.8 | 55.7 | 74.3 | 47.2 | $5 | $25 |
| Claude Opus 4.7 | 53.5 | 73.6 | 44.4 | $5 | $25 |
| GPT-5.4 (xhigh) | 51.4 | 71.1 | 41.1 | $2.5 | $15 |

Looking at them side by side, Fable 5 costs twice as much as Claude Opus 4.8 on both input and output, while its coding score is only 2.2 points higher (76.5 vs. 74.3). Even though I'd heard "Fable 5 is the best on benchmarks," seeing Opus 4.8 at 74.3 points for half the price suggests that for continuous-use cases like coding, Opus 4.8 often offers the better trade-off. Being able to pull this kind of decision-making material from within the editor is exactly what benchmarks is there for.
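The same comparison can be written down as a tiny calculation, which is handy if an agent is supposed to apply the rule of thumb to other model pairs. This is only a sketch over the table values; BenchmarkRow and compare are names I made up, not anything returned by the benchmarks tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRow:
    name: str
    coding: float            # coding score from the table
    input_usd_per_m: float   # USD per 1M input tokens
    output_usd_per_m: float  # USD per 1M output tokens


def compare(a: BenchmarkRow, b: BenchmarkRow) -> dict[str, float]:
    """Score gain of `a` over `b`, and how many times more `a` costs."""
    return {
        "coding_delta": round(a.coding - b.coding, 1),
        "input_price_ratio": a.input_usd_per_m / b.input_usd_per_m,
        "output_price_ratio": a.output_usd_per_m / b.output_usd_per_m,
    }


FABLE_5 = BenchmarkRow("Claude Fable 5", 76.5, 10.0, 50.0)
OPUS_4_8 = BenchmarkRow("Claude Opus 4.8", 74.3, 5.0, 25.0)

if __name__ == "__main__":
    print(compare(FABLE_5, OPUS_4_8))
```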

When you want to go a level deeper, model-endpoints is the tool to use. For example, querying the providers and pricing / latency / throughput for Claude Opus 4.8 yields the following table:

| Provider | Tag | p50 Latency | Uptime (30m) | Throughput p50 (tok/s) |
| --- | --- | --- | --- | --- |
| Anthropic direct (1) | anthropic | 1,271 ms | 99.91% | 49 |
| Anthropic direct (2) | anthropic/2 | 1,883 ms | 99.95% | 54 |
| Amazon Bedrock (us) | amazon-bedrock/us | 1,499 ms | 99.97% | 69 |
| Amazon Bedrock (eu) | amazon-bedrock/eu-west-1 | 5,047 ms | 100% | 56 |
| Google Vertex (global) | google-vertex/global | 3,345 ms | 100% | 52 |
| Google Vertex (europe) | google-vertex/europe | - | - | - |

Anthropic direct is the fastest at p50 1.27 seconds, with Bedrock us close behind at 1.50 seconds. Meanwhile, Bedrock eu-west-1 maintains 100% uptime but comes in last at p50 5 seconds, and Vertex global sits at 3.3 seconds. For throughput, Bedrock us stands out at 69 tok/s. Being able to see live data on whether to compromise on latency by pinning to a region or prioritize speed with Anthropic direct is a powerful source of information.
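To illustrate how this data could drive a provider choice, here is a minimal sketch that picks the lowest-latency and highest-throughput endpoints from the table while skipping rows without measurements (such as google-vertex/europe). The Endpoint type and the hard-coded rows are my own framing of the table, not the tool's response schema.

```python
from typing import NamedTuple, Optional


class Endpoint(NamedTuple):
    tag: str
    p50_latency_ms: Optional[int]
    throughput_tps: Optional[int]


# Rows copied from the table above; None marks a "-" cell.
ENDPOINTS = [
    Endpoint("anthropic", 1271, 49),
    Endpoint("anthropic/2", 1883, 54),
    Endpoint("amazon-bedrock/us", 1499, 69),
    Endpoint("amazon-bedrock/eu-west-1", 5047, 56),
    Endpoint("google-vertex/global", 3345, 52),
    Endpoint("google-vertex/europe", None, None),
]


def lowest_latency(endpoints: list[Endpoint]) -> Endpoint:
    """Endpoint with the smallest p50 latency among those with data."""
    measured = [e for e in endpoints if e.p50_latency_ms is not None]
    return min(measured, key=lambda e: e.p50_latency_ms)


def highest_throughput(endpoints: list[Endpoint]) -> Endpoint:
    """Endpoint with the largest p50 throughput among those with data."""
    measured = [e for e in endpoints if e.throughput_tps is not None]
    return max(measured, key=lambda e: e.throughput_tps)


if __name__ == "__main__":
    print("lowest latency    :", lowest_latency(ENDPOINTS).tag)
    print("highest throughput:", highest_throughput(ENDPOINTS).tag)
```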

Registering with Hermes Agent as an HTTP MCP

So far I've been accessing OpenRouter MCP from Claude Code, but I was also curious about connecting from Hermes Agent. Hermes Agent is my main working agent — I use it to run news distribution on a cron schedule and keep it running as a daemon.

Since the official OpenRouter MCP documentation only describes OAuth PKCE, accessing it from a daemon requires either implementing an OAuth client or finding an alternative authentication method.

I tried hitting https://mcp.openrouter.ai/mcp with a regular OpenRouter API key (OPENROUTER_API_KEY), sent as a bearer token in the Authorization header.

curl -sS -X POST https://mcp.openrouter.ai/mcp \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'

The result was HTTP 200 with result.tools[] returned. While bearer authentication is not explicitly mentioned in the official documentation, it appears that a regular OpenRouter API key with Authorization: Bearer can also connect to the MCP server.
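For daemons that would rather not shell out to curl, the same request can be made with the Python standard library. This is a sketch under a few assumptions: that the server answers either with plain JSON or with a server-sent-events body (the Accept header advertises both), and that an SSE body carries the JSON-RPC response in a single-line data: event. The helper names are mine.

```python
import json
import urllib.request
from typing import Any

MCP_URL = "https://mcp.openrouter.ai/mcp"


def build_request(api_key: str, method: str = "tools/list", request_id: int = 1) -> urllib.request.Request:
    """Build the same POST as the curl command, without sending it."""
    body = json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": {}}
    ).encode("utf-8")
    return urllib.request.Request(
        MCP_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json, text/event-stream",
            "Content-Type": "application/json",
        },
    )


def parse_body(content_type: str, text: str) -> dict[str, Any]:
    """Decode a JSON body, or the last single-line `data:` event of an SSE body."""
    if content_type.startswith("text/event-stream"):
        # Simplification: multi-line data events are not reassembled.
        data_lines = [line[5:].strip() for line in text.splitlines() if line.startswith("data:")]
        return json.loads(data_lines[-1])
    return json.loads(text)


def list_tool_names(api_key: str) -> list[str]:
    """Call tools/list over the network and return the tool names."""
    with urllib.request.urlopen(build_request(api_key), timeout=60) as resp:
        payload = parse_body(resp.headers.get("Content-Type", ""), resp.read().decode("utf-8"))
    return [tool["name"] for tool in payload["result"]["tools"]]
```

Calling list_tool_names(os.environ["OPENROUTER_API_KEY"]) (after importing os) should return the same names as the curl run, provided bearer authentication keeps working; it is undocumented, so it may change.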

This opens up a registration path from the daemon side. I added the following openrouter entry to the mcp_servers: section of Hermes's ~/.hermes/config.yaml (alongside existing entries like aws-knowledge-mcp-server and deepwiki):

mcp_servers:
  # ... existing
  openrouter:
    url: https://mcp.openrouter.ai/mcp
    headers:
      Authorization: 'Bearer ${OPENROUTER_API_KEY}'
      Accept: application/json, text/event-stream
    timeout: 180
    connect_timeout: 60

${OPENROUTER_API_KEY} is automatically expanded from ~/.hermes/.env. Without restarting Hermes, running hermes mcp list shows a new entry:

  MCP Servers:

  Name                      Transport                      Tools  Status
  ────────────────────────  ────────────────────────────── ────── ──────────
  openrouter                https://mcp.openrouter.ai...   all    ✓ enabled

Registered with ✓ enabled. This means Hermes Agent sessions can now call tools like mcp_openrouter_credits-get and mcp_openrouter_chat-send. Not having to worry about the OAuth 7-day expiry and being able to use it from news cron jobs or delegation paths is a significant advantage. (Though there are security concerns to consider as well...)

Positioning Among the Three Routing Siblings

Let me summarize NVIDIA LLM Router v3, CCR (Claude Code Router), and OpenRouter Auto + MCP side by side.

Axis NVIDIA LLM Router v3 Claude Code Router (CCR) OpenRouter Auto + MCP
Connection Layer OpenAI-compatible HTTP Anthropic ↔ OpenAI translation proxy MCP protocol (HTTP)
Routing Decision Lightweight MLP (retrainable) Rule-based 5 patterns NotDiamond AI (curated, blackbox)
Explainability High (checkpoint + confidence column) Medium (traceable via rule config) Medium (results traceable via generation-get)
On-premise Model Mixing - (SaaS; BYOK is a separate topic)
IDE / Agent Native MCP - - (HTTP proxy)
Live Data Access - - ◎ (13 tools)
Intended Use Case Production routing layer Claude Code development-time routing Development-time assistant (production uses API directly)
Main Pitch Up to 99% cost reduction + custom retraining 5 patterns + 30-line JS patch Live data + 13 tools + MCP native

Structurally, this can be read as two layers: a production routing layer and a development-time assistant layer.

OpenRouter MCP isn't meant to replace production routing — think of it as placing one "OpenRouter insider" inside the development editor. Being able to trace routing results end-to-end via generation-get can be read as OpenRouter MCP's "after-the-fact explanation" answer to LLM Router v3's checkpoint + confidence column.

What Apps Are Using OpenRouter?

Querying app-rankings for the top 10 in the coding category over the past 30 days (2026-05-28 to 2026-06-26) returned the following:

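| # | App | Tokens | Requests |
| --- | --- | --- | --- |
| 1 | Hermes Agent | 23.9T | 335M |
| 2 | Kilo Code | 6.5T | 101M |
| 3 | OpenClaw | 5.1T | 99M |
| 4 | Claude Code | 3.4T | 40M |
| 5 | pi | 1.8T | 30M |
| 6 | Cline | 1.1T | 12M |
| 7 | Lemonade | 1.0T | 30M |
| 8 | GitLawb | 615B | 6M |
| 9 | Codex | 364B | 5M |
| 10 | OpenHands | 256B | 6M |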

Hermes Agent came in first, with about 3.7x the token volume of second-place Kilo Code and about 7x that of fourth-place Claude Code.
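Those multiples come from the Tokens column. As a small sketch, the helper below converts the abbreviated counts (M, B, T) into numbers so the ratios can be recomputed; parse_count is a hypothetical name, and it assumes the suffixes mean million, billion, and trillion.

```python
# Assumed meaning of the suffixes used in the rankings table.
UNITS = {"M": 1e6, "B": 1e9, "T": 1e12}


def parse_count(text: str) -> float:
    """Convert an abbreviated count such as '23.9T' or '615B' to a float."""
    return float(text[:-1]) * UNITS[text[-1]]


TOKENS = {"Hermes Agent": "23.9T", "Kilo Code": "6.5T", "Claude Code": "3.4T"}

if __name__ == "__main__":
    hermes = parse_count(TOKENS["Hermes Agent"])
    for app in ("Kilo Code", "Claude Code"):
        ratio = hermes / parse_count(TOKENS[app])
        print(f"Hermes Agent vs {app}: {ratio:.1f}x")
```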

Pricing, Data Policy, and Points of Concern

OpenRouter itself is pay-as-you-go only with no subscription plans. A 5.5% platform fee (minimum $0.80) is added at the time of credit purchase. ZDR (zero data retention) is ON by default, and you opt in only when you want logging. EU region pinning requires an Enterprise contract — the same as the regular OpenRouter API.
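To get a feel for what the fee means for small top-ups, here is a minimal sketch. It assumes the fee is simply the larger of 5.5% of the purchase and $0.80, which is my reading of "5.5% (minimum $0.80)"; the actual rounding rules are not stated and may differ.

```python
def platform_fee(purchase_usd: float, rate: float = 0.055, minimum_usd: float = 0.80) -> float:
    """Fee under the assumption fee = max(rate * purchase, minimum), rounded to cents."""
    return round(max(purchase_usd * rate, minimum_usd), 2)


if __name__ == "__main__":
    for amount in (10, 20, 100):
        fee = platform_fee(amount)
        print(f"${amount} purchase: fee ${fee:.2f} ({fee / amount:.1%} effective)")
```

Under this assumption the minimum applies to any purchase below roughly $14.55 (0.80 / 0.055), so a $10 top-up carries an effective fee of 8%.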

Even via the MCP Server, only chat-send incurs charges; the remaining 12 read-only tools are free. Use cases like "hit benchmarks 100 times" can be done at zero additional cost within rate limit bounds.

Potential Stumbling Points

Since this feature was just released, I ran into various "huh?" behaviors while exploring. Here's a summary of what I noticed as of the time of writing (2026-06-27).

chat-send has limited control parameters. The OpenRouter API itself has Auto router control parameters like cost_quality_tradeoff / session_id, but these are not exposed in MCP's chat-send. When you want to change routing via MCP, you indirectly control it through combinations of model slug suffixes (:floor / :nitro / :free / :online) and provider.sort / provider.only / provider.order.
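Since the suffix is the main lever available over MCP, a small helper that composes and validates the slug can prevent typos in agent prompts or scripts. This is a sketch; the suffix list is limited to the four named in this article, and with_suffix is a hypothetical function, not part of any OpenRouter SDK.

```python
# Suffixes mentioned in this article; there may be others.
KNOWN_SUFFIXES = ("floor", "nitro", "free", "online")


def with_suffix(model: str, suffix: str) -> str:
    """Return the model slug with `suffix` applied, replacing any existing suffix."""
    if suffix not in KNOWN_SUFFIXES:
        raise ValueError(f"unknown suffix {suffix!r}; expected one of {KNOWN_SUFFIXES}")
    base = model.split(":", 1)[0]
    return f"{base}:{suffix}"


if __name__ == "__main__":
    print(with_suffix("openrouter/auto", "floor"))
```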

I also hit a 404 from generation-get for openrouter/fusion. I tried both Auto and Fusion, but when I passed the generation id from an openrouter/fusion generation to generation-get, it returned Generation gen-... not found. Since openrouter/auto generation IDs pull up fine, it looks like Fusion router data hasn't been indexed yet — a transitional state.

view-skill content is also still empty. The state where Available skills: is followed by nothing is a placeholder, and it's not usable in this early release period. Once skills are populated, new use cases should emerge.

Looking at the tools/list response, each tool's annotation includes an unfamiliar field: execution.taskSupport: "forbidden". This appears to be an extension telling clients that the MCP server "cannot be delegated as a task." Most MCP clients probably don't reference this yet, but if you're thinking about agentic use cases, it's a field worth keeping in mind.

The 7-day expiry of the OAuth-issued dedicated key is also something to consider when putting this into operations. How MCP clients handle re-authorization depends on their implementation, so if you're putting this into production, you'll want to understand that behavior early. In my case, the Hermes side escapes this by using bearer authentication, so OAuth expiry isn't an issue in that configuration.
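If your client does not surface the expiry itself, a simple reminder can be computed from the approval time. This sketch assumes you record when the key was issued (nothing in the article suggests the MCP server reports it) and that the lifetime is exactly seven days from that moment; the one-day margin is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone

KEY_LIFETIME = timedelta(days=7)  # from the consent screen described above


def expires_at(issued_at: datetime) -> datetime:
    """Expiry time of an OAuth-issued key, assuming a fixed 7-day lifetime."""
    return issued_at + KEY_LIFETIME


def needs_reauth(issued_at: datetime, now: datetime, margin: timedelta = timedelta(days=1)) -> bool:
    """True once `now` is within `margin` of the expiry (or past it)."""
    return now >= expires_at(issued_at) - margin


if __name__ == "__main__":
    issued = datetime(2026, 6, 26, 9, 0, tzinfo=timezone.utc)  # example timestamp
    print("expires:", expires_at(issued).isoformat())
    print("re-auth needed now?", needs_reauth(issued, datetime.now(timezone.utc)))
```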

chat-send streaming support is not explicitly documented. In my tests, streamed: true appeared in generation-get responses, suggesting streaming is happening internally, but whether MCP clients can display output incrementally is a separate question — that also remains unverified.

Summary

I got hands-on with OpenRouter MCP Server just two days after its release. The 13 tools are centered on live data, and it's interesting that the following workflow can be completed entirely within the editor: use benchmarks to back up model selection decisions, use model-endpoints to view latency by provider, use chat-send for quick test runs, and use generation-get to retroactively check what was selected.

I also have a feeling that the numbers you can pull from benchmarks and generation-get could serve as material for drift detection in long-running router setups, so I plan to verify that separately. There are still rough edges as a just-released product, but there seem to be plenty of use cases, and I'm looking forward to seeing how it evolves.

