I tried out OpenRouter's MCP Server

I tried the MCP Server that OpenRouter released in June 2026, on the second day after its release. This article introduces the server, a development-assistant layer with 13 tools that lets you back up model selection decisions with live data and trace routing results, along with actual figures from hands-on testing.

森茂洋 / Hiroshi Morishige

2026.06.27


Introduction
Hello, I'm Morishige from Classmethod's Manufacturing Business Technology Department.
On June 25, 2026, OpenRouter released its MCP Server. OpenRouter is well known as a multi-provider gateway for LLMs, but what makes this MCP Server's design interesting is that rather than simply wrapping the OpenAI-compatible API in the MCP protocol, it's positioned as a distinct layer: a "development assistant for coding agents."
https://openrouter.ai/blog/announcements/openrouter-mcp-server/
I had been writing a series of routing-related posts (NVIDIA LLM Router v3 and Sakana Fugu), so I wanted to try the OpenRouter MCP Server while it was still hot off the press.
Because this is a day-2 report, I ran into areas that weren't working yet and behaviors that aren't officially documented. I'll cover the behavior of all 13 tools, which models the Auto router actually selects, whether you can connect from Hermes Agent, and more, all with real numbers from hands-on testing.

What Is the OpenRouter MCP Server?
The OpenRouter MCP Server is a remotely hosted HTTP MCP server. There's no need to install anything locally: simply connecting to the single endpoint https://mcp.openrouter.ai/mcp makes OpenRouter's live data and chat functionality available as MCP tools.
The intended use case described by the official team is "pulling the latest OpenRouter information from a coding agent to select a model that fits the purpose and do some quick test calls," and the official blog states:
The MCP server is a development assistant for your coding agent... Your app should still call the OpenRouter API directly.
In other words, the design is not meant to replace the routing layer of production applications with the MCP Server — it's promoted purely as a development-time assistant. While LLM Router v3 and CCR handle "production routing paths," reading OpenRouter MCP Server as a "supplementary layer on the development editor side" helps clarify its positioning.


Item	Value
GA Date	2026-06-25
Endpoint	https://mcp.openrouter.ai/mcp
Connection Type	Remote HTTP MCP
Authentication	OAuth PKCE (bearer auth also worked)
Dedicated Key Spec	Valid for 7 days, default $10 spend cap (editable on approval screen)
Supported Clients	Claude Code / Claude Desktop / Cursor / Codex CLI / OpenCode

When you run the OAuth flow, a dedicated key labeled OpenRouter MCP: <client name> is issued with a 7-day validity. Since the spend cap defaults to $10, it's a reassuring design for use cases where you hand a key to a development agent.

A Look at the 13 Tools
The official blog introduced 11 tools, but reading through the documentation reveals there are actually 13. Only chat-send incurs charges; the remaining 12 are free, read-only tools.


Category	Tool Name	Description	Billing
catalog	models-list	Live model catalog search (rich filtering and sorting)	Free
catalog	model-get	Details for a specific model (pricing / context / supported parameters)	Free
catalog	model-endpoints	Price / latency / throughput by provider	Free
catalog	providers-list	List of providers	Free
intelligence	benchmarks	Third-party scores from Artificial Analysis / Design Arena	Free
intelligence	rankings-daily	Usage and trends by model	Free
intelligence	app-rankings	Usage and trends by app	Free
account	credits-get	Remaining credits	Free
account	generation-get	Cost / token / provider details for a specific generation	Free
docs / skill	docs-search	Full-text search of OpenRouter docs	Free
docs / skill	view-skill	Best-practice skills from the OpenRouter knowledge base	Free
utility	chat-send	Send a chat to any model	Paid
utility	ping	Health check	Free

As of day 2 after release, view-skill currently returns empty content. Trying view-skill name="overview" returns the following:
Unknown skill "overview". Available skills:
Since nothing follows Available skills:, it appears no skills have been registered yet. It's safe to think of this as a placeholder tool at the time of this article's publication.
The other 12 tools each return different information, making them a well-rounded "one-stop window for anything about OpenRouter" for development agents.

Connecting from Claude Code via OAuth
The setup is very simple: just add the following block to .mcp.json or ~/.claude/mcp.json. The OAuth client information and redirect URI are handled behind the scenes by the MCP client.
{
  "mcpServers": {
    "openrouter": {
      "type": "http",
      "url": "https://mcp.openrouter.ai/mcp"
    }
  }
}
When you restart Claude Code, the OAuth flow runs on first connection. The browser navigates to a consent screen showing a key labeled OpenRouter MCP: Claude Code with a $10 spend cap and 7-day expiry. After approving, the key is issued, and back in Claude Code, the /mcp command shows openrouter in a connected state with all 13 tool names listed.
Once connected, hitting mcp__openrouter__ping as a first greeting returns pong, completing the tool listing and connectivity check.

What Gets Selected via the Auto Router?
OpenRouter has an Auto router that automatically selects a model based on the difficulty and nature of the prompt. You can use it via MCP by specifying model: "openrouter/auto" in chat-send. I tried sending three types of questions.


Pattern	Specified Model	Question
Light question	openrouter/auto	"Reply with just one word: hello"
Reasoning question	openrouter/auto	"Solve: x^2 + 5x + 6 = 0"
Reasoning question (force cheapest route)	openrouter/auto:floor	Same as above

The response from chat-send always has a footer like this appended automatically:
hello

(model: openrouter/auto, generation id: gen-1782522452-f9zeD9t0SdmiT5Q5iD3U, input tokens: 221, output tokens: 5)
The model: openrouter/auto shown here is the router slug I specified in the request, not the model that actually served it. To see which model was actually selected, pass the generation id from this footer to generation-get.
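As a minimal sketch, the footer can be pulled apart with a regular expression so that the generation id is available for the follow-up generation-get call. The pattern below simply mirrors the footer format I observed above; the function and class names are my own, and the server could change the footer wording at any time.

```python
import re
from typing import NamedTuple, Optional


class ChatSendFooter(NamedTuple):
    model: str
    generation_id: str
    input_tokens: int
    output_tokens: int


# Mirrors the footer observed in this article; not a documented format.
_FOOTER_RE = re.compile(
    r"\(model: (?P<model>[^,]+), "
    r"generation id: (?P<gen>gen-[^,\s]+), "
    r"input tokens: (?P<inp>\d+), "
    r"output tokens: (?P<out>\d+)\)\s*$"
)


def parse_footer(text: str) -> Optional[ChatSendFooter]:
    """Return the footer fields, or None when no footer is found."""
    match = _FOOTER_RE.search(text)
    if match is None:
        return None
    return ChatSendFooter(
        model=match["model"],
        generation_id=match["gen"],
        input_tokens=int(match["inp"]),
        output_tokens=int(match["out"]),
    )


sample = (
    "hello\n\n"
    "(model: openrouter/auto, generation id: gen-1782522452-f9zeD9t0SdmiT5Q5iD3U, "
    "input tokens: 221, output tokens: 5)"
)
footer = parse_footer(sample)
print(footer.generation_id)
```

The returned generation_id is the value to hand to generation-get; the model field only echoes the slug that was requested.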
After running the three generation ids through generation-get, here are the results:


Question	Actual Model Selected	Provider	total_cost	latency	native_tokens_reasoning
Light "hello"	openai/gpt-5.5-20260423	OpenAI direct	$0.001255	945ms	0
x²+5x+6=0 (auto)	google/gemini-3.5-flash-20260519	Google direct	$0.002759	1574ms	262
x²+5x+6=0 (floor)	google/gemini-3.5-flash-20260519	Google direct	$0.002570	1439ms	241

What surprised me was that GPT-5.5 was selected even for the light "hello" question, and that for the reasoning question, the router was routing through Gemini 3.5 Flash with reasoning mode enabled. This ran counter to assumptions like "bigger models for reasoning" and "cheaper models for light questions."
Another interesting finding was the effect of :floor (a suffix that forces the cheapest route). Even when I sent the same reasoning question with :floor, the router selected the same Gemini 3.5 Flash. It appears that when the Auto router is already returning "close to the cheapest reasonable answer," adding :floor doesn't change the selection. However, the number of reasoning tokens decreased from 262 to 241 (21 fewer), and the cost dropped from $0.002759 to $0.002570 (about 7% cheaper). With the same model selected, this suggests the router may be adjusting reasoning effort internally, although a single pair of runs cannot rule out ordinary run-to-run variation.
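The percentages above can be checked with a few lines of arithmetic. This is only a worked calculation on the two measurements from my table; the helper name is my own.

```python
from decimal import Decimal


def relative_drop(before: Decimal, after: Decimal) -> Decimal:
    """Percentage decrease going from `before` to `after`."""
    return (before - after) / before * 100


# Figures taken from the generation-get results in the table above.
auto_cost, floor_cost = Decimal("0.002759"), Decimal("0.002570")
auto_reasoning, floor_reasoning = Decimal(262), Decimal(241)

cost_drop = relative_drop(auto_cost, floor_cost)
token_drop = relative_drop(auto_reasoning, floor_reasoning)

print(f"cost drop: {cost_drop:.2f}%")
print(f"reasoning token drop: {token_drop:.2f}%")
```

The cost drop comes out just under 7% and the reasoning-token drop at roughly 8%, so the two move by a similar proportion.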
Note: The OpenRouter API has parameters like cost_quality_tradeoff for fine-grained control of Auto router behavior, but these are not exposed in the MCP's chat-send. Auto router control via MCP is limited to combinations of model slug suffixes (:floor / :nitro / :free / :online) and provider.sort.
The JSON returned by generation-get includes fields like data.model / data.provider_name / data.provider_responses[].model_permaslug / data.native_tokens_reasoning, so while the reasoning behind the router's decision isn't visible, the results can be tracked end-to-end. Being able to trace routing retrospectively from within the development agent's view — without relying on external observability tools like Langfuse — is quite convenient.
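A sketch of how those fields might be collected into a small summary is shown below. The field names are the ones listed above, but the surrounding payload shape and the example values (including the provider_name string) are assumptions for illustration, not a captured response.

```python
from typing import Any, Dict, List


def summarize_generation(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the routing-related fields from a generation-get result."""
    data = payload.get("data", {})
    responses: List[Dict[str, Any]] = data.get("provider_responses") or []
    return {
        "model": data.get("model"),
        "provider": data.get("provider_name"),
        "permaslugs": [r.get("model_permaslug") for r in responses],
        "reasoning_tokens": data.get("native_tokens_reasoning"),
    }


# Hypothetical payload: field names come from the article, values are illustrative.
example = {
    "data": {
        "model": "google/gemini-3.5-flash-20260519",
        "provider_name": "Google",
        "provider_responses": [
            {"model_permaslug": "google/gemini-3.5-flash-20260519"}
        ],
        "native_tokens_reasoning": 262,
    }
}

summary = summarize_generation(example)
print(summary)
```

Using .get throughout keeps the helper from raising when a field is missing, which seems prudent for a server this new.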

benchmarks and model-endpoints Are Fascinating
The benchmarks tool pulls third-party benchmark scores from two sources: Artificial Analysis and Design Arena. Omitting the source argument returns results from both mixed together, and you can narrow down by task_type to coding / intelligence / agentic.
Pulling the top 5 for coding returned the following table. The as_of timestamp is 2026-06-27T00:00:22.821Z.


Model	Intelligence	Coding	Agentic	Input $/1M	Output $/1M
Claude Fable 5	59.9	76.5	52.8	$10	$50
GPT-5.5 (xhigh)	54.8	74.9	44.9	$5	$30
Claude Opus 4.8	55.7	74.3	47.2	$5	$25
Claude Opus 4.7	53.5	73.6	44.4	$5	$25
GPT-5.4 (xhigh)	51.4	71.1	41.1	$2.5	$15

Comparing them, Fable 5 costs twice as much as Claude Opus 4.8 on both input and output, yet the coding score improvement is only +2.2pt. While I'd heard "Fable 5 is the best" benchmark-wise, seeing that Opus 4.8 scores 74.3pt at half the cost suggests that for continuous-use scenarios like coding, Opus 4.8 often offers a better tradeoff. Being able to pull this kind of decision-making material right from the editor is exactly what benchmarks is for.
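To make the tradeoff concrete, here is a rough sketch that ranks the five rows by coding points per blended dollar. The 75% input / 25% output token mix is purely my assumption, and "points per dollar" is a crude metric of my own, not something benchmarks returns.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    model: str
    coding: float
    input_usd: float   # USD per 1M input tokens
    output_usd: float  # USD per 1M output tokens


# Values copied from the benchmarks table above.
ROWS: List[Row] = [
    Row("Claude Fable 5", 76.5, 10.0, 50.0),
    Row("GPT-5.5 (xhigh)", 74.9, 5.0, 30.0),
    Row("Claude Opus 4.8", 74.3, 5.0, 25.0),
    Row("Claude Opus 4.7", 73.6, 5.0, 25.0),
    Row("GPT-5.4 (xhigh)", 71.1, 2.5, 15.0),
]


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """USD per 1M tokens for an assumed input/output mix."""
    return row.input_usd * input_share + row.output_usd * (1 - input_share)


def points_per_dollar(row: Row, input_share: float = 0.75) -> float:
    """Coding score divided by the blended price (a crude value metric)."""
    return row.coding / blended_price(row, input_share)


ranked = sorted(ROWS, key=points_per_dollar, reverse=True)
for row in ranked:
    print(f"{row.model:18s} {points_per_dollar(row):6.2f} coding pts per blended $")
```

Under this assumed mix the cheapest model ranks first and Fable 5 last, which is the same direction as the Opus 4.8 versus Fable 5 comparison above; a different mix or a hard quality floor would change the picture.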
For deeper analysis, model-endpoints is the next step. For example, pulling the providers and their pricing / latency / throughput for Claude Opus 4.8 gives the following table:


Provider	Tag	p50 Latency	Uptime 30m	Throughput p50 (tok/s)
Anthropic direct (1)	anthropic	1,271 ms	99.91%	49
Anthropic direct (2)	anthropic/2	1,883 ms	99.95%	54
Amazon Bedrock (us)	amazon-bedrock/us	1,499 ms	99.97%	69
Amazon Bedrock (eu)	amazon-bedrock/eu-west-1	5,047 ms	100%	56
Google Vertex (global)	google-vertex/global	3,345 ms	100%	52
Google Vertex (europe)	google-vertex/europe	-	-	-

Anthropic direct is the fastest at p50 1.27 seconds, with Bedrock us close behind at 1.50 seconds. Meanwhile, Bedrock eu-west-1 maintains 100% uptime but is the slowest at p50 5 seconds, and Vertex global is at 3.3 seconds. For throughput, Bedrock us leads with 69 tok/s. Having live data to weigh "accept higher latency by pinning to a region vs. prioritize speed with Anthropic direct" is a powerful information source.
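The latency versus throughput tradeoff can be sketched with a simple estimate. The model below assumes that p50 latency approximates time to first token and that generation then proceeds at the p50 throughput; that is my simplification, not something model-endpoints states, and all function names are my own.

```python
from typing import Callable, List, NamedTuple, Optional


class Endpoint(NamedTuple):
    tag: str
    p50_latency_ms: Optional[int]
    uptime_pct: Optional[float]
    throughput_tps: Optional[int]


# Values copied from the model-endpoints table above ("-" rows become None).
ENDPOINTS: List[Endpoint] = [
    Endpoint("anthropic", 1271, 99.91, 49),
    Endpoint("anthropic/2", 1883, 99.95, 54),
    Endpoint("amazon-bedrock/us", 1499, 99.97, 69),
    Endpoint("amazon-bedrock/eu-west-1", 5047, 100.0, 56),
    Endpoint("google-vertex/global", 3345, 100.0, 52),
    Endpoint("google-vertex/europe", None, None, None),
]


def measured(endpoints: List[Endpoint]) -> List[Endpoint]:
    """Drop endpoints that reported no metrics."""
    return [e for e in endpoints if e.p50_latency_ms is not None and e.throughput_tps]


def estimated_seconds(e: Endpoint, output_tokens: int) -> float:
    # Assumption: first-token wait plus steady generation at p50 throughput.
    return e.p50_latency_ms / 1000 + output_tokens / e.throughput_tps


def best_for(
    endpoints: List[Endpoint],
    output_tokens: int,
    allow: Callable[[Endpoint], bool] = lambda e: True,
) -> Endpoint:
    """Endpoint with the lowest estimated completion time among allowed ones."""
    candidates = [e for e in measured(endpoints) if allow(e)]
    return min(candidates, key=lambda e: estimated_seconds(e, output_tokens))


print(best_for(ENDPOINTS, 20).tag)
print(best_for(ENDPOINTS, 1000).tag)
print(best_for(ENDPOINTS, 1000, allow=lambda e: "eu" in e.tag).tag)
```

Under these assumptions Anthropic direct wins for very short replies, while the higher throughput of Bedrock us wins once the output gets long; restricting to EU tags leaves only Bedrock eu-west-1, since Vertex europe reported no metrics.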

Registering with Hermes Agent as an HTTP MCP
So far I've been interacting with OpenRouter MCP from Claude Code, but I was also curious about connecting from Hermes Agent. Hermes Agent is my main working agent: I use it for cron-based news delivery and run it as a daemon.
Since the official OpenRouter MCP documentation only describes OAuth PKCE, using it from a daemon requires either implementing an OAuth client or finding an alternative authentication method.
I tried hitting https://mcp.openrouter.ai/mcp with a regular OpenRouter API key (OPENROUTER_API_KEY, for bearer header use):
curl -sS -X POST https://mcp.openrouter.ai/mcp \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
The result was HTTP 200 with result.tools[] returned. While bearer authentication isn't explicitly mentioned in the official documentation, it appears you can connect to the MCP server using a regular OpenRouter API key with Authorization: Bearer.
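For use from a script rather than the shell, the same request can be sketched with Python's standard library. The headers and JSON-RPC body are the ones from the curl command above; the helper names are my own, and the response parser assumes the body is either plain JSON or a single SSE data: line, which I have not confirmed against the server.

```python
import json
import urllib.request
from typing import Any, Dict, List

MCP_URL = "https://mcp.openrouter.ai/mcp"


def build_tools_list_request(api_key: str, request_id: int = 1) -> urllib.request.Request:
    """Build the same tools/list POST as the curl command."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list", "params": {}}
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json, text/event-stream",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_body(raw: str) -> Dict[str, Any]:
    """Accept plain JSON or an SSE body carrying one 'data:' JSON payload (assumed)."""
    text = raw.strip()
    if text.startswith("{"):
        return json.loads(text)
    for line in text.splitlines():
        if line.startswith("data:"):
            return json.loads(line[len("data:"):].strip())
    raise ValueError("no JSON-RPC payload found in response body")


def list_tool_names(api_key: str) -> List[str]:
    """Perform the network call; not invoked here so the sketch stays offline."""
    request = build_tools_list_request(api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = parse_body(response.read().decode("utf-8"))
    return [tool["name"] for tool in payload["result"]["tools"]]
```

Calling list_tool_names(os.environ["OPENROUTER_API_KEY"]) after importing os would be the equivalent of the curl run, returning the names found under result.tools[].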
This reveals a registration path from the daemon side. I added the following openrouter entry to the mcp_servers: section of Hermes's ~/.hermes/config.yaml (alongside existing entries like aws-knowledge-mcp-server and deepwiki):
mcp_servers:
  # ... existing
  openrouter:
    url: https://mcp.openrouter.ai/mcp
    headers:
      Authorization: 'Bearer ${OPENROUTER_API_KEY}'
      Accept: application/json, text/event-stream
    timeout: 180
    connect_timeout: 60
${OPENROUTER_API_KEY} is automatically expanded from ~/.hermes/.env. Running hermes mcp list without restarting Hermes shows a new entry:
  MCP Servers:

  Name                      Transport                      Tools  Status
  ────────────────────────  ────────────────────────────── ────── ──────────
  openrouter                https://mcp.openrouter.ai...   all    ✓ enabled
It registered with ✓ enabled. This means Hermes Agent sessions can now call tools like mcp_openrouter_credits-get and mcp_openrouter_chat-send. Being able to use it from news cron jobs and delegation paths without worrying about the OAuth 7-day expiry is a major advantage. (Though it does come with security considerations...)

Positioning Among the Routing Trio
Let me organize NVIDIA LLM Router v3, CCR (Claude Code Router), and OpenRouter Auto + MCP side by side.


Axis	NVIDIA LLM Router v3	Claude Code Router (CCR)	OpenRouter Auto + MCP
Connection Layer	OpenAI-compatible HTTP	Anthropic ↔ OpenAI translation proxy	MCP protocol (HTTP)
Routing Decision	Lightweight MLP (retrainable)	Rule-based 5 categories	NotDiamond AI (curated, blackbox)
Explainability	High (checkpoint + confidence column)	Medium (traceable via rule settings)	Medium (results traceable via generation-get)
On-prem Model Mixing	◯	◯	- (SaaS; BYOK is a separate topic)
IDE / Agent Native MCP	-	- (HTTP proxy)	◎
Live Data Reference	-	-	◎ (13 tools)
Intended Use Case	Production routing layer	Claude Code development-time routing	Development-time assistant (production uses API directly)
Key Value Proposition	Up to 99% cost reduction + custom retraining	5 categories + 30-line JS patch	Live data + 13 tools + MCP native

Structurally, this can be read as two layers: a production routing layer and a development-time assistant layer.
OpenRouter MCP is not a replacement for production routing — it's more like having "an OpenRouter insider" living in your development editor. Being able to trace routing results end-to-end with generation-get can be read as OpenRouter MCP's "post-hoc explanation" answer to LLM Router v3's checkpoint + confidence column.

What Apps Are Using OpenRouter?
Pulling the top 10 coding category apps for the past 30 days (2026-05-28 to 2026-06-26) via app-rankings yielded the following:


#	App	Tokens	Requests
1	Hermes Agent	23.9T	335M
2	Kilo Code	6.5T	101M
3	OpenClaw	5.1T	99M
4	Claude Code	3.4T	40M
5	pi	1.8T	30M
6	Cline	1.1T	12M
7	Lemonade	1.0T	30M
8	GitLawb	615B	6M
9	Codex	364B	5M
10	OpenHands	256B	6M

Hermes Agent ranked #1 by token volume, at about 3.7 times the tokens of #2 Kilo Code and 7 times those of #4 Claude Code.
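The ratios can be reproduced from the table with a small helper that expands the M / B / T suffixes. This is just a worked calculation on three of the rows; the function names are my own.

```python
from typing import Dict

_UNITS = {"M": 10**6, "B": 10**9, "T": 10**12}


def parse_count(text: str) -> float:
    """Expand values such as '23.9T' or '335M' into plain numbers."""
    return float(text[:-1]) * _UNITS[text[-1]]


# Rows copied from the app-rankings table above.
TOKENS: Dict[str, str] = {"Hermes Agent": "23.9T", "Kilo Code": "6.5T", "Claude Code": "3.4T"}
REQUESTS: Dict[str, str] = {"Hermes Agent": "335M", "Kilo Code": "101M", "Claude Code": "40M"}


def ratio(table: Dict[str, str], numerator: str, denominator: str) -> float:
    return parse_count(table[numerator]) / parse_count(table[denominator])


print(round(ratio(TOKENS, "Hermes Agent", "Kilo Code"), 1))
print(round(ratio(TOKENS, "Hermes Agent", "Claude Code"), 1))
print(round(ratio(REQUESTS, "Hermes Agent", "Kilo Code"), 1))
```

Measured by requests rather than tokens, the gap to Kilo Code is somewhat smaller (about 3.3 times), so which column you compare matters.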

Pricing, Data Policy, and Points of Concern
OpenRouter itself is pay-as-you-go only, with no subscription plans. A 5.5% platform fee (minimum $0.80) is added at credit purchase time. ZDR (zero data retention) is ON by default, with an opt-in if you want logging. EU region pinning requires an Enterprise contract, the same as with the regular OpenRouter API.
Even via the MCP Server, only chat-send incurs charges; the remaining 12 read-only tools are free. Use cases like "calling benchmarks 100 times" can be done at zero additional cost within rate limits.
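As a worked example of the platform fee quoted above (5.5% with a $0.80 minimum), the sketch below computes the fee for a few purchase amounts. The rounding to whole cents is my assumption; how OpenRouter actually rounds is not something I verified.

```python
from decimal import Decimal, ROUND_HALF_UP

FEE_RATE = Decimal("0.055")  # 5.5%, as quoted in this article
MIN_FEE = Decimal("0.80")    # minimum fee, as quoted in this article


def platform_fee(purchase_usd: Decimal) -> Decimal:
    """Fee added to a credit purchase; cent rounding is an assumption."""
    fee = max(purchase_usd * FEE_RATE, MIN_FEE)
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for amount in (Decimal("10"), Decimal("20"), Decimal("100")):
    print(f"${amount} purchase -> ${platform_fee(amount)} fee")
```

With these figures the minimum applies to any purchase below roughly $14.55 (0.80 / 0.055), so small top-ups pay a proportionally higher fee.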

Potential Pain Points
Since this feature just launched, I encountered a number of "hmm, that's odd" behaviors while trying things out. Here's a summary of what I noticed as of the time of publication (2026-06-27).
There are limits to chat-send's control parameters. The OpenRouter API itself has Auto router control parameters like cost_quality_tradeoff / session_id, but these are not exposed in the MCP's chat-send. When you want to change routing via MCP, you need to control it indirectly using model slug suffixes (:floor / :nitro / :free / :online) or combinations of provider.sort / provider.only / provider.order.
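The indirect control described above can be sketched as a helper that builds an MCP tools/call request for chat-send. The tools/call envelope is standard MCP JSON-RPC, and the model argument and the four suffixes come from this article; the prompt field name ("message") and the nested provider.sort shape are hypothetical placeholders, so check the tool's input schema from tools/list before relying on them.

```python
from typing import Any, Dict, Optional

# Suffixes mentioned in this article.
KNOWN_SUFFIXES = {"floor", "nitro", "free", "online"}


def with_suffix(model: str, suffix: Optional[str]) -> str:
    """Append a routing suffix such as ':floor' to a model slug."""
    if suffix is None:
        return model
    if suffix not in KNOWN_SUFFIXES:
        raise ValueError(f"unknown suffix: {suffix!r}")
    return f"{model}:{suffix}"


def chat_send_call(
    model: str,
    prompt: str,
    suffix: Optional[str] = None,
    provider_sort: Optional[str] = None,
    request_id: int = 1,
    prompt_field: str = "message",  # hypothetical argument name
) -> Dict[str, Any]:
    """Build a tools/call request body for chat-send (argument shape assumed)."""
    arguments: Dict[str, Any] = {
        "model": with_suffix(model, suffix),
        prompt_field: prompt,
    }
    if provider_sort is not None:
        # Assumed to mirror the provider.sort option named in the article.
        arguments["provider"] = {"sort": provider_sort}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "chat-send", "arguments": arguments},
    }


call = chat_send_call("openrouter/auto", "Solve: x^2 + 5x + 6 = 0", suffix="floor")
print(call["params"]["arguments"]["model"])
```

Validating the suffix up front keeps typos from silently turning into a different model slug.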
I also encountered a situation where generation-get returned 404 for openrouter/fusion. I tried both Auto and Fusion, but passing the generation id from an openrouter/fusion generation to generation-get returned Generation gen-... not found. Since Auto router generation IDs can be retrieved without issue, it looks like Fusion router data isn't indexed yet — a transitional state.
The content of view-skill is also still empty. With Available skills: remaining a placeholder, it's not usable in this early release period. Once skills are populated, new use cases should emerge.
Looking at the tools/list response, each tool's annotation has an unfamiliar field: execution.taskSupport: "forbidden". This appears to be an extension where the MCP server tells clients that it "cannot be delegated as a task." Most MCP clients probably don't reference this yet, but it's a field worth keeping in mind if you're thinking about agentic use cases.
The 7-day expiry on dedicated keys issued via OAuth is also something to keep in mind when putting this into production use. How the MCP client handles re-authorization depends on the implementation, so if you're deploying this operationally, it's worth understanding that behavior early. Since the Hermes side uses bearer authentication, OAuth expiry isn't an issue in that setup.
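One way to stay ahead of the expiry is to record when the key was issued and check it against the 7-day lifetime. The sketch below does only that date arithmetic; the one-day safety margin and the example issue time are my own assumptions.

```python
from datetime import datetime, timedelta, timezone

KEY_LIFETIME = timedelta(days=7)  # validity period described in this article


def expires_at(issued_at: datetime) -> datetime:
    """Expiry time of a dedicated key issued via the OAuth flow."""
    return issued_at + KEY_LIFETIME


def needs_reauth(
    issued_at: datetime,
    now: datetime,
    margin: timedelta = timedelta(days=1),  # assumed safety margin
) -> bool:
    """True once we are within `margin` of the expiry, or past it."""
    return now >= expires_at(issued_at) - margin


# Hypothetical issue time, for illustration only.
issued = datetime(2026, 6, 26, 9, 0, tzinfo=timezone.utc)
print(expires_at(issued).isoformat())
print(needs_reauth(issued, datetime(2026, 7, 1, 9, 0, tzinfo=timezone.utc)))
print(needs_reauth(issued, datetime(2026, 7, 2, 9, 0, tzinfo=timezone.utc)))
```

A check like this could run before scheduled jobs so that re-authorization happens while someone is at the keyboard.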
Streaming support for chat-send is not explicitly mentioned in the official documentation. In my testing, streamed: true appeared in generation-get outputs, suggesting streaming is happening internally, but whether the MCP client side can display responses incrementally is a separate question — and also remains unverified.

Summary
I tried out the OpenRouter MCP Server on day 2 after its release. The 13 tools are centered on live data. It's interesting that the full workflow can all be completed within the editor: using benchmarks to back up model selection, model-endpoints to check provider-by-provider latency, chat-send for quick test calls, and generation-get to trace back what was selected.
I also have a hunch that the numbers obtainable from benchmarks and generation-get could serve as useful material for drift detection in long-running router operations, so I'd like to revisit that with proper verification. There are still rough edges given the recency of the release, but there seem to be plenty of use cases, and I'm looking forward to seeing how it evolves.

Reference Links
Introducing the OpenRouter MCP Server (release article)
OpenRouter MCP Server Docs
OpenRouter Auto Router
OpenRouter Pricing
OpenRouter Zero Data Retention
hermes-ciel (GitHub)
Building a Use-Case-Specific LLM Environment with NVIDIA LLM Router (Basics)
Retraining NVIDIA LLM Router to Match My Own Persona (Training Edition)

