
I tried reading Hermes Agent's self-improving mechanism from the code
Hello, I'm Morishige from the Classmethod Manufacturing Business Technology Department.
Many of you may be curious about the term "self-improving agent" that Hermes Agent puts forward.
At the same time, questions like "will the model retrain itself automatically?" or "will its behavior change without my knowledge?" can make you a bit wary. I felt the same way at first.
In this article, I'll use the NemoHermes configuration as a case study and go through its code and settings to sort out what actually grows in Hermes Agent. To get straight to the point: what mainly grows is not the model weights but skills, which let business procedures and output formats be reused.
Note that Nous Research, the original developer, calls this property of "skills growing through use" self-improving, while NVIDIA's NemoClaw / OpenShell materials call it self-evolving. Both refer to the same thing, so in this article I'll mainly use self-improving to match the original, and write self-evolving only when covering NVIDIA's presentation.
Just as I was writing this, Hermes Agent 0.16.0 (v2026.6.5) was released. It included skill-related updates covering "how to fold away skills as they keep growing," "how far to trust them," and "how much to let run autonomously," so I took the opportunity to read through the 0.16.0 code as well.
What This Article Covers
In this article, I'll read the Hermes Agent, NemoClaw, and OpenShell repositories published by Nous Research and NVIDIA. For the overall picture of NemoClaw, see also the NVIDIA NemoClaw Architecture Reference.
I checked out the following commits locally. I read the basic structure from 0.15.1 and verified the additions in 0.16.0 (released on 2026-06-05 as v2026.6.5) against the 0.16.0 code.
| repo | Confirmed reference | Role covered in this article |
|---|---|---|
| hermes-agent | 40420a6 (0.15.1) | Basic structure of skill, memory, and session search |
| hermes-agent | v2026.6.5 / d6b9cfa (0.16.0) | Skill organization, trust, and autonomy layers added in 0.16.0 |
| NemoClaw | 17734b1 | Blueprint and CLI layer for running Hermes on OpenShell |
| OpenShell | b41e0df | Runtime layer responsible for sandbox and policy |
Interestingly, the description of the core tool for creating, editing, and deleting skills (tools/skill_manager_tool.py) has not changed since 0.15.1. What 0.16.0 touched was the operational layer surrounding that core. You could read it this way: the mechanism for growing was left as is, while the mechanisms for governing it were expanded.
How NVIDIA's Official Video Presents self-evolving
The term self-evolving is also explained quite concretely in NVIDIA Developer videos.
The short demo "How We Built Self-Evolving Hermes Agents With NVIDIA NemoClaw" introduces a configuration where Hermes runs inside an NVIDIA OpenShell sandbox and connects with Slack, Outlook, and GitHub. The centerpiece of the demo is the flow where, when you teach Hermes an output format for a GitHub issue digest through conversation, Hermes saves that format as a reusable skill and can return results in the same format when requested from a different channel.
The longer Nemotron Labs stream "Self-Evolving Hermes Agents: Enterprise AI That Gets Better With Use" covered topics including skill creation, session search, memory, policy-gated sandboxing via OpenShell, token masking, countermeasures for skill bloat, and enterprise deployment considerations.
Viewing NemoHermes in Three Layers
First, NemoHermes becomes easier to read when you divide it into three layers: Model, Harness, and Runtime.
Nemotron and vLLM are the Model layer, responsible for inference and generation. Hermes Agent appears as the Harness layer, calling tools and using skills, memory, and session search. OpenShell is positioned as the Runtime layer, controlling which files and networks the agent is allowed to access.
The nemohermes command on the NemoClaw side is also easier to understand when viewed through these three layers. Looking at the implementation, nemohermes is defined as a thin entry point that has NemoClaw select the Hermes agent.
Source: NemoClaw/bin/nemohermes.js L6-L9
// NemoHermes — alias for NemoClaw with the Hermes agent pre-selected.
process.env.NEMOCLAW_AGENT = 'hermes';
process.env.NEMOCLAW_INVOKED_AS = 'nemohermes';
module.exports = require('../dist/nemoclaw');
In other words, it's more natural to read NemoHermes not as a separate large agent implementation, but as a configuration that selects Hermes Agent on top of NemoClaw. On the NemoClaw documentation side as well, nemohermes is described as an alias for nemoclaw with the Hermes agent pre-selected.
The Core of What Grows in Hermes is the Skill
Now we get to the main point.
What appears to be at the center of self-improving on the Hermes Agent side is the skill. The opening comment of skill_manager_tool.py states the role of skills quite clearly.
Source: hermes-agent/tools/skill_manager_tool.py L5-L12
Allows the agent to create, update, and delete skills, turning successful
approaches into reusable procedural knowledge.
In the same file, skills are called the agent's procedural memory, distinguished from general memory.
Skills are the agent's procedural memory: they capture *how to do a specific
type of task* based on proven experience. General memory (MEMORY.md, USER.md) is
broad and declarative. Skills are narrow and actionable.
Reading this comment makes what grows through self-improving a bit more concrete. It reads less like model weights being updated on the fly and more like successful work procedures being kept in a reusable form.
Each Hermes user skill is a directory under ~/.hermes/skills/ containing a SKILL.md and related files.
Source: hermes-agent/tools/skill_manager_tool.py L22-L32
~/.hermes/skills/
├── my-skill/
│   ├── SKILL.md
│   ├── references/
│   ├── templates/
│   ├── scripts/
│   └── assets/
This is different from mere chat history. Since it remains as SKILL.md and supporting files, humans can read it, modify it, and take it to different environments.
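To make the layout concrete, here is a minimal Python sketch that walks a skills root and reports which skill directories contain a SKILL.md and which of the supporting directories from the tree exist. It assumes the flat one-level layout shown above; `discover_skills` is a hypothetical helper, not Hermes' own loader, which may also handle nested category directories.

```python
from pathlib import Path

# Supporting directories taken from the tree above.
SUPPORT_DIRS = ("references", "templates", "scripts", "assets")


def discover_skills(root: Path) -> dict[str, list[str]]:
    """Return {skill_name: [supporting dirs present]} for each ROOT/<name>/SKILL.md.

    Hypothetical helper: assumes a flat, one-level layout. A directory
    without a SKILL.md is not treated as a skill.
    """
    found: dict[str, list[str]] = {}
    if not root.is_dir():
        return found
    for skill_md in sorted(root.glob("*/SKILL.md")):
        skill_dir = skill_md.parent
        found[skill_dir.name] = [d for d in SUPPORT_DIRS if (skill_dir / d).is_dir()]
    return found
```

For example, `discover_skills(Path.home() / ".hermes" / "skills")` would list the user skills on a machine that follows this layout. Because everything is plain files, the same walk works on a copied or version-controlled skills directory.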
For example, if the morning issue digest always uses the same perspective and output format, you can keep it as a skill instead of explaining it with a long prompt every time. Roughly, it looks something like this:
---
name: daily-issue-digest
description: Produce a daily issue digest in the agreed format.
metadata:
  hermes:
    tags: [digest, github]
---
Of course, actual skills also include procedures, caveats, verification methods, and if necessary, templates and helper scripts. I think it's easiest to understand as a mechanism that shifts prompt craftsmanship from one-time conversations to small software assets.
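Because the frontmatter is plain YAML between `---` lines, it is easy to inspect outside Hermes too. The sketch below pulls out the top-level `name` and `description` using only the standard library. `read_frontmatter` is a hypothetical helper; it deliberately ignores nested keys such as `metadata.hermes.tags`, and a real tool would use a proper YAML parser (for example PyYAML).

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Return top-level scalar keys from a SKILL.md frontmatter block.

    Hypothetical and deliberately minimal: it only understands `key: value`
    lines at column 0 and skips nested mappings and keys with empty values.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter at all
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter of the frontmatter
            break
        if not line or line[0].isspace() or ":" not in line:
            continue  # blank line or nested key
        key, _, value = line.partition(":")
        if value.strip():
            fields[key.strip()] = value.strip()
    return fields
```

Applied to the daily-issue-digest example, this yields the `name` and `description` entries and skips the nested `metadata` block.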
In 0.16.0, Skills Shift from "Growing More" to "Selecting and Folding"
As the previous section shows, skills can grow indefinitely. Convenient as that is, an ever-growing set of skills creates a different problem: it becomes unclear which one to use and when. Version 0.16.0 addresses this.
First, a mechanism was introduced to show or hide skills based on the execution environment. By writing environments: in a skill's frontmatter, the skill only appears in the skill list for that environment.
Source: hermes-agent/agent/skill_utils.py L234-L253
"""Return True when the skill is relevant to the current runtime environment.
Skills may declare an ``environments`` list in their YAML frontmatter::
    environments: [kanban]   # only relevant when kanban is active
    environments: [s6]       # only relevant inside the s6 Docker image
    environments: [docker]   # only relevant inside any container
If the field is absent or empty the skill is relevant in **all**
environments (backward-compatible default).
This is an OFFER-time filter: it controls whether a skill shows up in the
skills index / autocomplete / slash-command list.
"""
What I found thoughtful here is that this only filters whether a skill is shown. The comment also states that it does not apply to skill_view or to explicit loading via --skills: explicitly loading a skill is itself an act of consent, so a force-load goes through even when the filter hides the skill. The design trims environment-irrelevant skills from the list while keeping them callable whenever needed.
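The offer-time filter and the explicit-load bypass can be sketched as two small predicates. This is an illustration under assumptions, not the code in skill_utils.py: it assumes a skill is offered when any of its declared environments is active, and it leaves out how Hermes actually detects the active environments (kanban, s6, docker).

```python
from typing import Optional


def skill_is_offered(environments: Optional[list[str]], active: set[str]) -> bool:
    """Offer-time filter: should the skill appear in the index / autocomplete?

    Assumption: "any declared environment is active" counts as relevant.
    """
    if not environments:  # absent or empty -> relevant everywhere
        return True
    return any(env in active for env in environments)


def can_load(environments: Optional[list[str]], active: set[str], explicit: bool) -> bool:
    """Explicit loading (skill_view / --skills) bypasses the offer filter."""
    return explicit or skill_is_offered(environments, active)
```

The asymmetry is the point: the filter only decides what is suggested, never what may be used.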
Next, a curator was introduced to fold away unused skills. It's a mechanism that archives skills that haven't been used for a certain period, and the key point is that built-in (bundled) skills can also be targeted.
Source: hermes-agent/agent/curator.py L187-L195
"""Whether the curator may prune (archive) bundled built-in skills too.
ON by default. When on, built-ins become curation candidates and are
archived after the same inactivity period as agent-created skills, with a
suppression list keeping them archived across `hermes update` re-seeds.
Hub-installed skills are never pruned regardless of this flag.
"""
Making bundled skills prunable by default looks like a bold design choice. The docstring also says that a suppression list keeps them archived even when hermes update re-seeds them, so a skill that has been archived does not quietly come back. Hub-installed skills, on the other hand, are never pruned, so skills you installed yourself will not disappear.
The overall direction has also shifted toward a lighter default skill set, with anything extra added afterward. Optional skills are fetched with hermes skills install.
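The curator's selection rules described above can be sketched like this. The source labels (`agent`, `bundled`, `hub`), the `Skill` record, and both functions are hypothetical; only the rules themselves (same inactivity period for bundled and agent-created skills, hub skills never pruned, a suppression list honored on re-seed) come from the docstring.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass
class Skill:
    name: str
    source: str          # hypothetical labels: "agent", "bundled", "hub"
    last_used: datetime


def archive_candidates(skills: Iterable[Skill], now: datetime,
                       inactivity: timedelta,
                       prune_bundled: bool = True) -> list[str]:
    """Names of skills the curator would archive (sketch)."""
    out = []
    for s in skills:
        if s.source == "hub":
            continue  # hub-installed skills are never pruned
        if s.source == "bundled" and not prune_bundled:
            continue  # built-ins are candidates only when the flag is on
        if now - s.last_used >= inactivity:
            out.append(s.name)
    return out


def reseed_bundled(bundled_names: Iterable[str], suppressed: set[str]) -> list[str]:
    """On update, re-place bundled skills except those on the suppression list."""
    return [n for n in bundled_names if n not in suppressed]
```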
Source: hermes-agent/hermes_cli/skills_hub.py L459-L470
def do_install(identifier: str, category: str = "", force: bool = False,
               console: Optional[Console] = None, skip_confirm: bool = False,
               invalidate_cache: bool = True,
               name_override: str = "") -> None:
    """Fetch, quarantine, scan, confirm, and install a skill.
    ...
    """
The docstring shows that the install flow runs in the order fetch, quarantine, scan, and confirm. A fetched skill is not placed immediately; it is isolated and scanned first, and only then installed. Since skills are designed on the premise that they grow and are received from outside, it makes sense that the entry point is guarded this carefully.
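The fetch → quarantine → scan → confirm ordering can be illustrated as a staging pattern: the skill lands in a temporary directory first and is moved into the skills root only after the scan and the confirmation pass. `install_skill` and its `fetch` / `scan` / `confirm` hooks are hypothetical stand-ins; the real `do_install()` has the signature shown above and its own checks.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def install_skill(fetch: Callable[[Path], object],
                  scan: Callable[[Path], str],
                  confirm: Callable[[str], bool],
                  dest: Path) -> bool:
    """Stage a skill in quarantine, scan it, confirm, and only then install it.

    Hypothetical sketch; returns True if the skill was installed.
    """
    with tempfile.TemporaryDirectory(prefix="skill-quarantine-") as tmp:
        staged = Path(tmp) / dest.name
        staged.mkdir()
        fetch(staged)               # 1. fetch into an isolated directory
        verdict = scan(staged)      # 2-3. scan while still in quarantine
        if not confirm(verdict):    # 4. confirmation step
            return False            # quarantine is discarded with tmp
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(staged), str(dest))  # 5. install into place
    return True
```

A rejected skill never touches the skills root, because it only ever existed inside the temporary directory.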
How Much to Trust Growing Skills
Once skills grow through conversation and also come in from the hub, the next concern is "can we trust that skill itself?" Version 0.16.0 drew lines on both the source and content of skills.
First, the source side: alongside openai/skills, anthropics/skills, and huggingface/skills, the default list of places (taps) to fetch skills from also includes NVIDIA/skills.
Source: hermes-agent/tools/skills_hub.py L395-L413
DEFAULT_TAPS = [
    {"repo": "openai/skills", "path": "skills/.curated/"},
    {"repo": "openai/skills", "path": "skills/.system/"},
    {"repo": "anthropics/skills", "path": "skills/"},
    {"repo": "huggingface/skills", "path": "skills/"},
    # NVIDIA/skills: NVIDIA-verified skills for CUDA-X, AIQ, cuOpt,
    # cuPyNumeric, DeepStream, NeMo, NemoClaw, etc. Each skill ships
    # alongside a signed `skill.oms.sig`, an OMS-signed `skill-card.md`
    # (governance card), and an `evals/` directory — synced daily from
    # the NVIDIA product repos. Treated as `trusted`.
    {"repo": "NVIDIA/skills", "path": "skills/"},
    {"repo": "garrytan/gstack", "path": ""},
]
It states that NVIDIA's skills are distributed with a signature (skill.oms.sig), a governance card (skill-card.md), and even an evals/ directory for evaluation, synchronized daily from NVIDIA product repositories.
Furthermore, the behavior at install time changes according to the trust level of the source.
Source: hermes-agent/tools/skills_guard.py L40-L61
TRUSTED_REPOS = {
    "openai/skills",
    "anthropics/skills",
    "huggingface/skills",
    "NVIDIA/skills",
}
INSTALL_POLICY = {
    #               safe     caution  dangerous
    "builtin":    ("allow", "allow", "allow"),
    "trusted":    ("allow", "allow", "block"),
    "community":  ("allow", "block", "block"),
    # ...
}
For trusted sources (the four repositories above), a caution verdict is allowed but dangerous is blocked; for community sources, anything beyond safe is blocked. The handling of a skill is thus switched mechanically based on where it came from.
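Put together, TRUSTED_REPOS and INSTALL_POLICY form a simple two-step lookup: source to trust level, then (trust level, scan verdict) to allow or block. The sketch below reproduces only the rows shown above; the `trust_level` and `install_decision` helpers and the `"builtin"` source label are assumptions. Under this reading, `garrytan/gstack`, which is a default tap but not in TRUSTED_REPOS, would be handled as community.

```python
TRUSTED_REPOS = {"openai/skills", "anthropics/skills", "huggingface/skills", "NVIDIA/skills"}

# Only the rows shown in skills_guard.py above; the real table has more.
INSTALL_POLICY = {
    #               safe     caution  dangerous
    "builtin":    ("allow", "allow", "allow"),
    "trusted":    ("allow", "allow", "block"),
    "community":  ("allow", "block", "block"),
}
VERDICTS = ("safe", "caution", "dangerous")


def trust_level(source: str) -> str:
    """Map a source to a trust level (hypothetical helper)."""
    if source == "builtin":
        return "builtin"
    return "trusted" if source in TRUSTED_REPOS else "community"


def install_decision(source: str, verdict: str) -> str:
    """Look up "allow" or "block" for a scan verdict from a given source."""
    return INSTALL_POLICY[trust_level(source)][VERDICTS.index(verdict)]
```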
There are also changes on the content side. When scanning skill content, invisible Unicode is detected as a sign of injection.
Source: hermes-agent/tools/skills_guard.py L537-L555
INVISIBLE_CHARS = {
    '\u200b',  # zero-width space
    '\u200c',  # zero-width non-joiner
    '\u200d',  # zero-width joiner
    '\u2060',  # word joiner
    # ...
    '\u202e',  # right-to-left override
    # ...
    '\u2069',  # pop directional isolate
}
It is built to flag invisible characters mixed into a skill, such as zero-width spaces and bidirectional control characters (bidi overrides), as high-severity injection. This guards against attacks that embed invisible instructions in a skill. The more useful growing skills become, the more this kind of content inspection matters.
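A scanner for this class of injection can be quite small. The sketch below reports the line, column, and Unicode name of every character from a subset of the set above. `find_invisible` is a hypothetical helper; the real skills_guard.py also assigns severities and checks many other patterns.

```python
import unicodedata

# Subset of the INVISIBLE_CHARS set shown above.
INVISIBLE_CHARS = {"\u200b", "\u200c", "\u200d", "\u2060", "\u202e", "\u2069"}


def find_invisible(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for each invisible character found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in INVISIBLE_CHARS:
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                hits.append((lineno, col, name))
    return hits
```

Reporting positions and names matters because the offending characters are, by design, invisible in an editor or a diff.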
Memory and Session Search Have Different Roles
At this point, you may be wondering about the difference between skills and memory. Hermes Agent also has memory and session search, but in the code their roles are separated.
| Mechanism | Role | What is stored |
|---|---|---|
| skill | Reusable procedures | Output formats, investigation procedures, verification checks |
| memory | Stable knowledge recalled every time | Preferences, environment facts, long-lasting premises |
| session search | Search of past work | Conversation logs, reasoning history, cues for resumption |
Looking at memory_tool.py, memory is described as two stores: MEMORY.md and USER.md.
Source: hermes-agent/tools/memory_tool.py L5-L14
Provides bounded, file-backed memory that persists across sessions. Two stores:
- MEMORY.md: agent's personal notes and observations (environment facts, project
conventions, tool quirks, things learned)
- USER.md: what the agent knows about the user (preferences, communication style,
expectations, workflow habits)
Furthermore, it states that both are injected into the system prompt as a frozen snapshot at session start.
Both are injected into the system prompt as a frozen snapshot at session start.
Mid-session writes update files on disk immediately (durable) but do NOT change
the system prompt -- this preserves the prefix cache for the entire session.
In other words, memory can be read not as "a place to remember everything," but as a place to keep small stable premises and preferences that will remain effective from the next session onward.
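The frozen-snapshot behavior can be modeled in a few lines: build the prompt text once at session start, and make mid-session writes append to the file without touching that text. `MemoryStore` is a hypothetical class; it omits the size bound the docstring mentions ("bounded") and whatever formatting Hermes applies when injecting memory.

```python
from pathlib import Path


class MemoryStore:
    """Hypothetical sketch of file-backed memory with a frozen prompt snapshot."""

    FILES = ("MEMORY.md", "USER.md")

    def __init__(self, root: Path):
        self.root = root
        # Rendered once at session start and never rebuilt mid-session,
        # so the prompt prefix (and any prefix cache) stays stable.
        self.system_prompt = self._render()

    def _render(self) -> str:
        parts = []
        for name in self.FILES:
            path = self.root / name
            if path.exists():
                parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
        return "\n\n".join(parts)

    def add(self, name: str, note: str) -> None:
        """Append a note to disk immediately (durable); the snapshot is untouched."""
        with (self.root / name).open("a", encoding="utf-8") as f:
            f.write(note.rstrip("\n") + "\n")
```

A note written during one session therefore shows up only when the next session renders a fresh snapshot.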
On the other hand, session search is the layer for searching past logs. The opening of session_search_tool.py states that it uses a SQLite session DB and FTS5 index and returns actual messages without any LLM calls.
Source: hermes-agent/tools/session_search_tool.py L21-L23
All three modes operate on the SQLite session DB via the FTS5 index and
the get_anchored_view / get_messages_around primitives in hermes_state.
No LLM calls anywhere — every shape returns actual messages from the DB.
This is interesting: rather than always carrying past conversations in the prompt as memory, it is designed to search the DB only when needed.
Personally, I feel this division of responsibility is quite important. Procedures go in skills, stable knowledge in memory, and past context in session search. By not cramming everything into a single "memory," the place for growing and the place for searching are separated.
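To see why no LLM call is needed, here is a minimal sketch of full-text search over a message table with SQLite FTS5 from Python's standard sqlite3 module. The schema, trigger, and `search` function are assumptions for illustration, not hermes_state's actual tables, and the sketch assumes the local SQLite build includes FTS5 (most common builds do).

```python
import sqlite3


def open_session_db() -> sqlite3.Connection:
    """In-memory session DB with an external-content FTS5 index (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, session TEXT, role TEXT, content TEXT)"
    )
    conn.execute(
        "CREATE VIRTUAL TABLE messages_fts USING fts5("
        "content, content='messages', content_rowid='id')"
    )
    # Keep the index in sync as messages are inserted.
    conn.execute(
        "CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN "
        "INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content); END"
    )
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple]:
    """Return the stored messages themselves, ranked by FTS5; no model involved."""
    return conn.execute(
        "SELECT m.session, m.role, m.content FROM messages_fts "
        "JOIN messages m ON m.id = messages_fts.rowid "
        "WHERE messages_fts MATCH ? ORDER BY messages_fts.rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

Because the result rows come straight from the table, what the agent gets back is the original wording of past messages rather than a model-written summary.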
Relying on session search also means the ever-growing logs need upkeep. hermes sessions optimize, introduced in 0.16.0, is a maintenance command that merges FTS5 segments and then runs VACUUM.
Source: hermes-agent/hermes_state.py L4037-L4049
"""Merge fragmented FTS5 b-tree segments into one per index.
FTS5 indexes grow as a series of incremental segments — one per
``INSERT`` batch driven by the message triggers. Over tens of
thousands of messages these segments accumulate, which both bloats
the ``*_data`` shadow tables and slows ``MATCH`` queries that must
scan every segment. The special ``'optimize'`` command rewrites each
index as a single merged segment.
"""
According to the code, this maintenance improves only the on-disk layout and search speed; it does not change search results. auto_prune, which deletes old sessions after a set period, was introduced alongside it. So the counterpart to the premise of "accumulate sessions and search them," a way to fold away what has piled up, is now properly in place as well.
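The maintenance maps onto two standard SQLite operations: FTS5's special `'optimize'` command, which merges each index into a single segment, followed by `VACUUM`, which rewrites the database file. `optimize_session_db` is a hypothetical helper, not the hermes_state.py function, and it interpolates table names directly, so it should only ever receive trusted internal names.

```python
import sqlite3


def optimize_session_db(conn: sqlite3.Connection, fts_tables: list[str]) -> None:
    """Merge each FTS5 index into one segment, then compact the database file."""
    for table in fts_tables:
        # FTS5 special command: insert the string 'optimize' into the hidden
        # column that has the same name as the table.
        conn.execute(f"INSERT INTO {table}({table}) VALUES ('optimize')")
    conn.commit()
    conn.execute("VACUUM")  # VACUUM cannot run inside an open transaction
```

The test below commits one row at a time, which creates many small segments, and checks that the same rows match before and after optimization.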
Autonomous Operation is Possible, But the Default is Conservative
Up to this point, we've been talking about "what to remember." Reading 0.16.0, there are also updates to another aspect of self-improving: "how to delegate work." However, going deeper requires explicit opt-in, and the defaults were quite conservative.
First, there is delegation, where an agent delegates work to child agents. Child agents are leaves by default and cannot branch further. Only when given the orchestrator role can that child spawn its own grandchild workers.
Source: hermes-agent/tools/delegate_tool.py L1955-L1958
The 'role' parameter controls whether a child can further delegate:
'leaf' (default) cannot; 'orchestrator' retains the delegation
toolset and can spawn its own workers, bounded by
delegation.max_spawn_depth.
Depth is constrained by max_spawn_depth. According to the comment, it has a floor of 1 and no ceiling, but the default stays flat (depth 1).
Source: hermes-agent/tools/delegate_tool.py L137-L139
# No upper ceiling on spawn depth — like max_concurrent_children, depth has a
# floor of 1 and no ceiling. Deeper trees multiply API cost, so the default
# stays flat (MAX_DEPTH = 1); raising the config knob is an explicit opt-in.
The policy: deeper trees multiply API cost, so the default is shallow, and going deeper requires an explicit opt-in. Personally, I appreciate the balance: the agent can run autonomously, but the conservative default keeps it from running amok.
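The leaf/orchestrator rule and the depth limit can be read as a single predicate. This is a sketch under assumptions: the top-level agent is depth 0, a child it spawns is depth 1, and spawning is allowed only while the spawner's depth is below `max_spawn_depth` (floored at 1). `Agent` and `can_delegate` are hypothetical names.

```python
from dataclasses import dataclass

MAX_DEPTH = 1  # default stays flat, per the comment above


@dataclass
class Agent:
    depth: int          # 0 = the top-level agent (assumption)
    role: str = "leaf"  # children default to "leaf"


def can_delegate(agent: Agent, max_spawn_depth: int = MAX_DEPTH) -> bool:
    """May this agent spawn children? (hypothetical sketch)"""
    if agent.depth > 0 and agent.role != "orchestrator":
        return False  # leaf children do not keep the delegation toolset
    # Floor of 1, no ceiling: raising the knob is the explicit opt-in.
    return agent.depth < max(1, max_spawn_depth)
```

With the default of 1, only the top-level agent can delegate; an orchestrator child gains the ability to spawn grandchildren only after the depth is raised.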
Next, there is progressive tool disclosure for when tools increase. Rather than listing all MCP and plugin tools in the system prompt, they are disclosed on demand.
Source: hermes-agent/tools/tool_search.py L1-L14
"""Progressive tool disclosure ("tool search") for Hermes Agent.
When enabled, MCP and non-core plugin tools are replaced in the model-visible
tools array by three bridge tools — ``tool_search``, ``tool_describe``,
``tool_call`` — and surfaced on demand. Core Hermes tools never defer.
...
* The threshold gate runs every assembly: when deferrable tools would consume
less than ``threshold_pct`` of the model's context window (default 10%),
tool search is a no-op and the tools array passes through unchanged.
"""
Here too, core tools are always visible, and deferral only kicks in when the deferrable tools would take up 10% or more of the context window (the default threshold), so the intervention stays minimal.
Finally, there is kanban's goal_mode. When a card is set to goal_mode, a worker keeps looping while judging whether the goal has been met.
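The threshold gate can be sketched as follows. The per-tool token estimates, the `Tool` record, and `assemble_tools` are assumptions; only the three bridge tool names, the never-defer rule for core tools, and the 10% default come from the docstring.

```python
from dataclasses import dataclass

BRIDGE_TOOLS = ["tool_search", "tool_describe", "tool_call"]


@dataclass
class Tool:
    name: str
    core: bool          # core Hermes tools never defer
    schema_tokens: int  # estimated prompt cost of the tool definition (assumption)


def assemble_tools(tools: list[Tool], context_window: int,
                   threshold_pct: float = 10.0) -> list[str]:
    """Return the tool names the model would see (hypothetical sketch)."""
    deferrable_cost = sum(t.schema_tokens for t in tools if not t.core)
    if deferrable_cost < context_window * threshold_pct / 100:
        return [t.name for t in tools]  # below threshold: pass through unchanged
    core = [t.name for t in tools if t.core]
    return core + BRIDGE_TOOLS          # defer everything else behind the bridge
```

The same tool set can therefore be shown in full on a large-context model and deferred on a small one.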
Source: hermes-agent/hermes_cli/goals.py L1-L8
"""Persistent session goals — the Ralph loop for Hermes.
A goal is a free-form user objective that stays active across turns. After
each turn completes, a small judge call asks an auxiliary model "is this
goal satisfied by the assistant's last response?". If not, Hermes feeds a
continuation prompt back into the same session and keeps working until the
goal is done, turn budget is exhausted, the user pauses/clears it, or the
user sends a new message (which takes priority and pauses the goal loop).
"""
This is what's known as the Ralph loop. The termination conditions are explicitly stated as "goal achieved, turn budget exhausted, user stops it, or a new message arrives," so there are safeguards preventing it from running indefinitely. The flow in the NemoHermes demo where tasks are filed in kanban and profiles share them can also be read as an extension of this.
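The loop and its stop conditions can be sketched with the model turn and the judge passed in as plain functions. `run_goal`, the continuation prompt text, and the `interrupted` hook (standing in for pause/clear or a new user message) are hypothetical; the real goals.py uses an auxiliary model as the judge.

```python
from typing import Callable


def run_goal(goal: str,
             run_turn: Callable[[str], str],
             judge: Callable[[str, str], bool],
             turn_budget: int,
             interrupted: Callable[[], bool] = lambda: False) -> str:
    """Keep working on a goal; return why the loop stopped (sketch)."""
    prompt = goal
    for _ in range(turn_budget):
        reply = run_turn(prompt)
        if judge(goal, reply):  # "is this goal satisfied by the last response?"
            return "done"
        if interrupted():       # user paused/cleared it or sent a new message
            return "paused"
        prompt = f"Continue working toward the goal: {goal}"
    return "budget"             # turn budget exhausted
```

Every exit path is bounded: either the judge says yes, the user intervenes, or the budget runs out.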
At this point, self-improving seems to have entered a stage where not only skills grow, but the way work is delegated and run can also be cultivated. Still, every default is flat and conservative, and going deeper requires explicit configuration; caution was a common thread throughout.
NemoClaw and OpenShell Protect the Boundaries of a Growing Agent
Mechanisms like Hermes Agent become more useful the more you use them. On the other hand, from an enterprise perspective, it can be concerning to let a convenient agent loose externally without any restrictions.
This is where NemoClaw and OpenShell come in as the Runtime layer.
Looking at NemoClaw's Hermes config, the toolset exposed to the API server includes skills, memory, session_search, and others.
Source: NemoClaw/agents/hermes/config/hermes-config.ts L11-L27
const API_SERVER_TOOLSETS = [
  'web',
  'browser',
  'terminal',
  'file',
  'code_execution',
  'vision',
  'image_gen',
  'skills',
  'todo',
  'memory',
  'session_search',
  'delegation',
  'cronjob',
  'nemoclaw',
  'audio',
];
In other words, even Hermes running on OpenShell is designed with the assumption that it will use skills, memory, and session search as Hermes typically does.
However, external access is constrained by policy. NemoClaw's sandbox policy schema describes itself as defining filesystem, process, and network egress rules.
Source: NemoClaw/schemas/sandbox-policy.schema.json L4-L5
{
  "title": "NemoClaw Sandbox Policy",
  "description": "Schema for the base sandbox policy (openclaw-sandbox.yaml) — defines filesystem, process, and network egress rules."
}
Furthermore, network policy defines which binaries are allowed to reach which endpoints. For REST and WebSocket, rules can also be defined at the path level. The method and path rules are defined at L104-L130 in the same schema.
Looking at the policy additions for Hermes, the managed_inference policy allows only OpenAI-compatible paths on inference.local, and only from the listed binaries.
Source: NemoClaw/agents/hermes/policy-additions.yaml L44-L61
network_policies:
  managed_inference:
    name: managed_inference
    endpoints:
      - host: inference.local
        port: 443
        protocol: rest
        enforcement: enforce
        rules:
          - allow: { method: POST, path: '/v1/chat/completions' }
          - allow: { method: POST, path: '/v1/completions' }
          - allow: { method: POST, path: '/v1/embeddings' }
          - allow: { method: GET, path: '/v1/models' }
          - allow: { method: GET, path: '/v1/models/**' }
    binaries:
      - { path: /usr/local/bin/hermes }
      - { path: /usr/bin/python3.11 }
      - { path: /opt/hermes/.venv/bin/python }
This is not about "asking via prompt not to go outside," but rather a runtime policy approach that determines where the agent can reach. The more a self-improving agent learns business procedures and operates autonomously, the more meaningful these runtime boundaries become. The design philosophy behind OpenShell itself is also explained in detail on the NVIDIA Developer Blog.
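To make the runtime side concrete, here is a sketch of how a default-deny evaluator could check a request against rules like the managed_inference policy above: host, port, calling binary, method, and path must all match. This is not OpenShell's implementation; the `Endpoint` record, the `allowed` function, and the use of `fnmatch` for `**` path globs are assumptions.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Endpoint:
    host: str
    port: int
    rules: list[tuple[str, str]]                     # (HTTP method, path glob)
    binaries: set[str] = field(default_factory=set)  # executables allowed to connect


def allowed(endpoints: list[Endpoint], binary: str, host: str, port: int,
            method: str, path: str) -> bool:
    """Default deny: allow only if an endpoint, its binary list, and a rule all match."""
    for ep in endpoints:
        if (ep.host, ep.port) != (host, port) or binary not in ep.binaries:
            continue
        # fnmatch's "*" also matches "/", so "/v1/models/**" covers nested paths.
        if any(m == method and fnmatchcase(path, glob) for m, glob in ep.rules):
            return True
    return False
```

The prompt never enters into this decision: even a skill that tells the agent to call another host gets nothing past the runtime.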
What to Turn Into a Skill and What to Leave Out Is a Human Decision
Up to this point, we've gone through the code for both the mechanisms that grow the agent and the mechanisms that govern it. Accumulated skills can be narrowed down with the environments filter and pruned by the curator, and session logs can be organized with hermes sessions optimize, so much of the operational burden has shifted to the tooling side.
However, the actual sorting (what to promote to a skill, what to keep in memory, and what to place in project context or a wiki) remains a human responsibility. This feels similar to how DevOps practice distinguishes runbooks, issues, and audit logs. Rather than saying the agent grows on its own, it felt more accurate to say that humans approve good procedures and organize them into a form the agent can reuse.
Summary
Reading the self-improving aspects of Hermes Agent from the code, it looks less like a model retraining itself autonomously and more like a system for preserving business procedures as reusable assets.
Skills are a layer for preserving successful work procedures and output formats as SKILL.md and supporting files. Memory is a layer for injecting, as a frozen snapshot, the preferences and environment information that stay effective in future sessions. Session search is a layer for retrieving past conversations and decision rationale from SQLite / FTS5 when needed. That is the division of roles.
Reading 0.16.0 alongside this reinforces that impression even further. The core mechanism for creating skills remains unchanged, while surrounding it, mechanisms for governing the agent have been added: selecting and collapsing accumulated skills, varying trust based on where a skill came from, and controlling how tasks are delegated. Just as much attention has been given to organizing and safely using what has grown as to the features that enable growth in the first place.
And NemoClaw and OpenShell can be read as the Runtime layer that places such growing agents within the boundaries of a sandbox and policy.
The term "self-evolving" sounds a bit flashy, but as the name "self-improving" used by the original Nous Research team suggests, the code reveals a fairly down-to-earth mechanism. Good procedures taught by humans are preserved as skills and combined with memory and session search for use in future sessions, and the agent runs within OpenShell's policy boundaries. Personally, I think this direction is the most practical approach for growing agents in an enterprise setting.
Reference Links
- Hermes Agent (NousResearch/hermes-agent): GitHub repository
- Hermes Agent v2026.6.5 (0.16.0) release notes
- Hermes Agent official documentation
- Video "How We Built Self-Evolving Hermes Agents With NVIDIA NemoClaw" (NVIDIA Developer)
- Video "Self-Evolving Hermes Agents: Enterprise AI That Gets Better With Use" (Nemotron Labs)
- NemoClaw (NVIDIA/NemoClaw): GitHub repository
- NemoClaw Architecture Reference (NVIDIA official documentation)
- OpenShell (NVIDIA/OpenShell): GitHub repository
- NVIDIA Developer Blog "Run Autonomous, Self-Evolving Agents More Safely with NVIDIA OpenShell"
