Reading Hermes Agent's Self-Improving Mechanism Through the Code

This article is for anyone curious about the "self-improving" label that Hermes Agent touts. Reading through the code, what emerges is not a system where the model spontaneously retrains itself, but a mechanism that turns successful work procedures into reusable assets called skills. Taking the 0.16.0 update into account, here is a summary of what a "growing agent" and the mechanisms that govern it actually look like.

森茂洋 / Hiroshi Morishige

2026.06.06


Hello, I'm Morishige from Classmethod's Manufacturing Business Technology Department.

Many of you may be intrigued by the phrase "self-improving agent" that Hermes Agent advocates.

However, it's natural to feel a bit guarded, wondering "will the model retrain itself on its own?" or "will the behavior change without my knowledge?" I had those same concerns at first.

In this article, I'll use the NemoHermes configuration as a subject to clarify what grows in Hermes Agent, drawing from the code and settings. To cut to the chase, what mainly grows is not the model weights but rather skills for reusing business procedures and output formats.

Note that this property of "skills growing with use" is called self-improving by the original developer, Nous Research, and self-evolving by NVIDIA in the context of NemoClaw / OpenShell. Since both terms refer to the same thing, this article mainly uses self-improving, following Nous Research, and uses self-evolving only when covering NVIDIA's presentation.

Just as I was writing this, Hermes Agent 0.16.0 (v2026.6.5) was released. It updates the skill-related areas around "how to consolidate growing skills," "how much to trust growing skills," and "how much to let it run autonomously," so I'll also read through the 0.16.0 code.

What This Article Covers

In this article, I'll read through the Hermes Agent, NemoClaw, and OpenShell repositories published by NVIDIA / Nous Research. The overall picture of NemoClaw is also explained in NVIDIA's NemoClaw Architecture Reference.

I checked out the following commits locally. I read the basic structure in 0.15.1, and checked the parts added in 0.16.0 (v2026.6.5), released on 2026-06-05, against the 0.16.0 code.

repo	confirmed reference	role covered in this article
hermes-agent	`40420a6` (0.15.1)	basic structure of skill, memory, and session search
hermes-agent	`v2026.6.5` / `d6b9cfa` (0.16.0)	skill organization, trust, and autonomy layers added in 0.16.0
NemoClaw	`17734b1`	blueprint and CLI layer for running Hermes on OpenShell
OpenShell	`b41e0df`	Runtime layer responsible for sandbox and policy

What's interesting is that the description of the core tool for creating, fixing, and deleting skills (tools/skill_manager_tool.py) itself hasn't changed from 0.15.1. What was touched in 0.16.0 was the operational side surrounding that core. You could read it as: the mechanism for growing skills is unchanged, while the mechanisms for governing them have increased.

How NVIDIA's Official Video Presents Self-Evolving

The term self-evolving is also explained quite concretely in NVIDIA Developer videos.

The short demo "How We Built Self-Evolving Hermes Agents With NVIDIA NemoClaw" introduces a configuration that runs Hermes inside an NVIDIA OpenShell sandbox and connects it to Slack, Outlook, and GitHub. The center of the demo is a flow where you teach Hermes the output format for a GitHub issue digest through conversation, and Hermes saves that format as a reusable skill, allowing it to return responses in the same format when requested from a different channel.

The longer Nemotron Labs stream "Self-Evolving Hermes Agents: Enterprise AI That Gets Better With Use" covered topics including skill creation, session search, memory, policy-gated sandboxing by OpenShell, token masking, measures against skill bloat, and enterprise deployment considerations.

Viewing NemoHermes in Three Layers

First, NemoHermes becomes easier to read when divided into three layers: Model, Harness, and Runtime.

Nemotron and vLLM are the Model layer, responsible for inference and generation. Hermes Agent appears as the Harness layer, which calls tools and uses skills, memory, and session search. OpenShell is positioned as the Runtime layer that controls which files and networks the agent can access.

The nemohermes command on the NemoClaw side is also easier to understand when viewed through these three layers. Looking at the implementation, nemohermes is defined as a thin entry point that has NemoClaw select the Hermes agent.

Source: NemoClaw/bin/nemohermes.js L6-L9

// NemoHermes — alias for NemoClaw with the Hermes agent pre-selected.
process.env.NEMOCLAW_AGENT = 'hermes';
process.env.NEMOCLAW_INVOKED_AS = 'nemohermes';
module.exports = require('../dist/nemoclaw');

In other words, it's more natural to read NemoHermes not as a separate large agent implementation, but as a configuration where Hermes Agent is selected on top of NemoClaw. On the NemoClaw documentation side as well, nemohermes is described as an alias for nemoclaw with the Hermes agent pre-selected.

The Core of What Grows in Hermes Is the Skill

Now to the main topic.

What appears to be at the center of self-improving on the Hermes Agent side is the skill. The opening comment of skill_manager_tool.py states the role of skills quite clearly.

Source: hermes-agent/tools/skill_manager_tool.py L5-L12

Allows the agent to create, update, and delete skills, turning successful
approaches into reusable procedural knowledge.

In the same file, skills are called the agent's procedural memory and are distinguished from general memory.

Skills are the agent's procedural memory: they capture *how to do a specific
type of task* based on proven experience. General memory (MEMORY.md, USER.md) is
broad and declarative. Skills are narrow and actionable.

Reading this comment makes what grows through self-improving a bit more concrete. Rather than model weights being updated on the spot, it reads more like successful work procedures are preserved in a reusable form.

Hermes user skills are handled as a structure with a SKILL.md and related files under ~/.hermes/skills/.

Source: hermes-agent/tools/skill_manager_tool.py L22-L32

~/.hermes/skills/
├── my-skill/
│   ├── SKILL.md
│   ├── references/
│   ├── templates/
│   ├── scripts/
│   └── assets/

This is different from simple chat history. Since it remains as SKILL.md and supporting files, humans can read it, modify it, or take it to a different environment.

For example, if the morning issue digest has a set of fixed perspectives and output format, instead of explaining it with a long prompt every time, you can preserve it as a skill. Put simply, the image is something like this:

---
name: daily-issue-digest
description: Produce a daily issue digest in the agreed format.
metadata:
  hermes:
    tags: [digest, github]
---

Of course, an actual skill also includes procedures, notes, verification methods, and if necessary, templates and helper scripts. I think it's easier to understand if you see it as a mechanism that moves prompt craftsmanship away from one-time conversations and toward small software assets.
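To make the directory layout and frontmatter above a bit more tangible, here is a minimal sketch that scaffolds the same structure by hand. This is not how Hermes itself creates skills (that goes through the skill manager tool); scaffold_skill and SUPPORT_DIRS are hypothetical names for illustration, and the sketch writes into a temporary directory rather than the real ~/.hermes/skills/.

```python
import tempfile
from pathlib import Path

# Supporting directories shown in the layout quoted from skill_manager_tool.py.
SUPPORT_DIRS = ("references", "templates", "scripts", "assets")


def scaffold_skill(root: Path, name: str, description: str, tags: list[str]) -> Path:
    """Create <root>/<name>/ with a SKILL.md and the supporting directories.

    Hypothetical helper for illustration only; not part of Hermes Agent.
    """
    skill_dir = root / name
    for sub in SUPPORT_DIRS:
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)

    frontmatter = "\n".join([
        "---",
        f"name: {name}",
        f"description: {description}",
        "metadata:",
        "  hermes:",
        f"    tags: [{', '.join(tags)}]",
        "---",
        "",
    ])
    body = "## Procedure\n\n1. (steps, notes, and verification go here)\n"
    (skill_dir / "SKILL.md").write_text(frontmatter + "\n" + body, encoding="utf-8")
    return skill_dir


# Usage: scaffold into a throwaway directory and inspect the result.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = scaffold_skill(
        Path(tmp),
        "daily-issue-digest",
        "Produce a daily issue digest in the agreed format.",
        ["digest", "github"],
    )
    skill_text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    created_dirs = sorted(p.name for p in skill_dir.iterdir() if p.is_dir())
```

The point is simply that a skill is ordinary files on disk, so it can be reviewed, versioned, and copied like any other small software asset.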

In 0.16.0, Skills Shift from "Keep Adding" to "Select and Consolidate"

Reading the previous chapter, you can see that skills can be added indefinitely. While convenient, ever-accumulating skills create a different problem: it becomes unclear which skill should be used when. Version 0.16.0 addresses this.

First, a mechanism to show different skills depending on the execution environment was introduced. By writing environments: in the skill's frontmatter, the skill only appears in the skill list when that environment is active.

Source: hermes-agent/agent/skill_utils.py L234-L253

"""Return True when the skill is relevant to the current runtime environment.

Skills may declare an ``environments`` list in their YAML frontmatter::

    environments: [kanban]        # only relevant when kanban is active
    environments: [s6]            # only relevant inside the s6 Docker image
    environments: [docker]        # only relevant inside any container

If the field is absent or empty the skill is relevant in **all**
environments (backward-compatible default).

This is an OFFER-time filter: it controls whether a skill shows up in the
skills index / autocomplete / slash-command list.
"""

What I found thoughtful here is that this is only a filter for "whether to show it." The comment also states that it doesn't apply to skill_view or explicit loading via --skills. Explicit loading counts as explicit consent, so a forced load goes through even when the filter hides the skill. It's a design that trims skills irrelevant to the environment from the list, while still ensuring they can always be called when needed.
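The offer-time behavior described in the docstring can be sketched roughly as follows. This is my own reconstruction under assumptions, not the code in skill_utils.py: the function names are hypothetical, and I assume a skill is offered when any one of its declared environments is active.

```python
def is_offered(frontmatter: dict, active_environments: set[str]) -> bool:
    """Offer-time check: should this skill appear in the skills index?"""
    declared = frontmatter.get("environments") or []
    if not declared:
        # Absent or empty field: relevant everywhere (backward-compatible default).
        return True
    # Assumption: one matching environment is enough.
    return any(env in active_environments for env in declared)


def visible_skills(skills: dict[str, dict], active_environments: set[str]) -> list[str]:
    """Names that would show up in the index / autocomplete / slash-command list."""
    return sorted(name for name, fm in skills.items() if is_offered(fm, active_environments))


def load_explicitly(skills: dict[str, dict], name: str) -> dict:
    """Explicit loading bypasses the filter: asking by name is treated as consent."""
    return skills[name]
```

With a kanban-only skill, an s6-only skill, and a skill with no environments field, only the last one is listed when no environment is active, yet load_explicitly still returns the hidden ones by name.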

Next, a curator was introduced to consolidate unused skills. It's a mechanism that archives skills that haven't been used for a certain period, and the key point is that built-in (bundled) skills can also be targeted.

Source: hermes-agent/agent/curator.py L187-L195

"""Whether the curator may prune (archive) bundled built-in skills too.

ON by default. When on, built-ins become curation candidates and are
archived after the same inactivity period as agent-created skills, with a
suppression list keeping them archived across `hermes update` re-seeds.
Hub-installed skills are never pruned regardless of this flag.
"""

Making bundled skills pruning targets by default seems like a bold design choice. The docstring also states that a suppression list keeps archived items archived across hermes update re-seeds, so skills that were consolidated once do not come back on their own. On the other hand, hub-installed skills that you added yourself are exempt from pruning, which gives peace of mind that what you installed won't disappear.
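As a rough illustration of the rules in that docstring, the selection logic might look like the sketch below. Everything here is hypothetical: SkillRecord, the origin labels, and the function names are mine, and the inactivity period is passed in because the quoted excerpt does not state its value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SkillRecord:
    name: str
    origin: str          # "agent", "builtin", or "hub" (labels assumed for this sketch)
    last_used: datetime


def archive_candidates(skills: list[SkillRecord], now: datetime,
                       inactivity: timedelta, prune_builtins: bool = True) -> list[str]:
    """Pick skills to archive, following the rules described in the docstring."""
    cutoff = now - inactivity
    candidates = []
    for skill in skills:
        if skill.origin == "hub":
            continue  # hub-installed skills are never pruned
        if skill.origin == "builtin" and not prune_builtins:
            continue  # built-ins are only candidates while the flag is on
        if skill.last_used < cutoff:
            candidates.append(skill.name)
    return candidates


def reseed_builtins(bundled: list[str], suppressed: set[str]) -> list[str]:
    """On update, skip built-ins that are on the suppression list."""
    return [name for name in bundled if name not in suppressed]
```

The second function is the part that keeps archived built-ins from reappearing: a re-seed consults the suppression list instead of blindly restoring the bundle.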

Furthermore, the direction has also shifted toward starting with a lighter default skill set and adding what you want afterward. Optional skills are fetched with hermes skills install.

Source: hermes-agent/hermes_cli/skills_hub.py L459-L470

def do_install(identifier: str, category: str = "", force: bool = False,
               console: Optional[Console] = None, skip_confirm: bool = False,
               invalidate_cache: bool = True,
               name_override: str = "") -> None:
    """Fetch, quarantine, scan, confirm, and install a skill.
    ...
    """

You can see that the installation flow goes in the order: fetch, quarantine, scan, confirm, and install. Rather than placing the fetched skill in its final location immediately, it is first isolated, scanned, and confirmed before being installed. The impression is that because skills are expected to "grow and be received," a proper process for the entry point has been established.

How Much to Trust Growing Skills

When skills grow from conversation and also come in from the hub, the next concern is "can we trust the skill itself?" Version 0.16.0 drew lines on both the source and content of skills.

First, the source side. In the default list of skill sources (taps) to fetch from, NVIDIA/skills appears alongside openai/skills, anthropics/skills, and huggingface/skills.

Source: hermes-agent/tools/skills_hub.py L395-L413

DEFAULT_TAPS = [
    {"repo": "openai/skills", "path": "skills/.curated/"},
    {"repo": "openai/skills", "path": "skills/.system/"},
    {"repo": "anthropics/skills", "path": "skills/"},
    {"repo": "huggingface/skills", "path": "skills/"},
    # NVIDIA/skills: NVIDIA-verified skills for CUDA-X, AIQ, cuOpt,
    # cuPyNumeric, DeepStream, NeMo, NemoClaw, etc. Each skill ships
    # alongside a signed `skill.oms.sig`, an OMS-signed `skill-card.md`
    # (governance card), and an `evals/` directory — synced daily from
    # the NVIDIA product repos. Treated as `trusted`.
    {"repo": "NVIDIA/skills", "path": "skills/"},
    {"repo": "garrytan/gstack", "path": ""},
]

NVIDIA's skills are stated to be distributed with a signature (skill.oms.sig), a governance card (skill-card.md), and an evals/ directory for evaluation, synchronized daily from NVIDIA product repositories.

Furthermore, the behavior at installation changes according to the trust level of the source.

Source: hermes-agent/tools/skills_guard.py L40-L61

TRUSTED_REPOS = {
    "openai/skills",
    "anthropics/skills",
    "huggingface/skills",
    "NVIDIA/skills",
}

INSTALL_POLICY = {
    #                  safe      caution    dangerous
    "builtin":       ("allow",  "allow",   "allow"),
    "trusted":       ("allow",  "allow",   "block"),
    "community":     ("allow",  "block",   "block"),
    # ...
}

For trusted sources (the four repositories above), caution is permitted but dangerous is blocked; for community sources, which cover everything else, only safe is allowed. Skills are mechanically handled differently based on where they came from.
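Put as a lookup, the table above reads like the following sketch. The two dictionaries copy only the rows quoted in this article (the elided rows are left out), while trust_level and install_decision are hypothetical helpers of mine; how skills_guard.py actually resolves a source to a trust level is an assumption here.

```python
TRUSTED_REPOS = {
    "openai/skills",
    "anthropics/skills",
    "huggingface/skills",
    "NVIDIA/skills",
}

# Column order follows the quoted table: safe, caution, dangerous.
VERDICTS = ("safe", "caution", "dangerous")

INSTALL_POLICY = {
    "builtin":   ("allow", "allow", "allow"),
    "trusted":   ("allow", "allow", "block"),
    "community": ("allow", "block", "block"),
}


def trust_level(source_repo: str) -> str:
    """Assumed mapping: listed repos are trusted, anything else is community."""
    return "trusted" if source_repo in TRUSTED_REPOS else "community"


def install_decision(source_repo: str, scan_verdict: str) -> str:
    """Combine where the skill came from with what the scan found."""
    row = INSTALL_POLICY[trust_level(source_repo)]
    return row[VERDICTS.index(scan_verdict)]
```

For example, a caution verdict is allowed for NVIDIA/skills but blocked for an unlisted repository, which is exactly the "where it came from" distinction.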

Changes were also made to the content side. When scanning skill content, invisible Unicode is detected as a sign of injection.

Source: hermes-agent/tools/skills_guard.py L537-L555

INVISIBLE_CHARS = {
    '\u200b',  # zero-width space
    '\u200c',  # zero-width non-joiner
    '\u200d',  # zero-width joiner
    '\u2060',  # word joiner
    # ...
    '\u202e',  # right-to-left override
    # ...
    '\u2069',  # pop directional isolate
}

If invisible characters such as zero-width spaces or bidirectional control characters (bidi overrides) are mixed into a skill, they are flagged as high-severity injection findings. This is designed with attacks in mind where invisible instructions are embedded in skills. The more useful growing skills become, the more this kind of content inspection matters.
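A scan of this kind is simple to sketch. The set below contains only the six code points visible in the excerpt (the real set is longer, as the elisions indicate), and find_invisible is a hypothetical function of mine rather than the scanner in skills_guard.py.

```python
import unicodedata

# Only the code points visible in the quoted excerpt; the real set has more.
INVISIBLE_CHARS = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\u202e",  # right-to-left override
    "\u2069",  # pop directional isolate
}


def find_invisible(text: str) -> list[tuple[int, int, str, str]]:
    """Return (line, column, code point, Unicode name) for each invisible character."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in INVISIBLE_CHARS:
                findings.append(
                    (lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
                )
    return findings
```

Reporting the line and column matters in practice, because a reviewer cannot see these characters in an editor and needs to be told where to look.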

Memory and Session Search Have Different Roles

This raises the question of the difference between skills and memory. Hermes Agent also has memory and session search, but in the code, each has a separate role.

Mechanism	Role	What's stored
skill	Procedures to reuse	Output formats, research procedures, verification checks
memory	Stable knowledge recalled every time	Preferences, environment facts, long-standing premises
session search	Search of past work	Conversation logs, reasoning behind decisions, hints for resuming

Looking at memory_tool.py, memory is described as two stores: MEMORY.md and USER.md.

Source: hermes-agent/tools/memory_tool.py L5-L14

Provides bounded, file-backed memory that persists across sessions. Two stores:
  - MEMORY.md: agent's personal notes and observations (environment facts, project
    conventions, tool quirks, things learned)
  - USER.md: what the agent knows about the user (preferences, communication style,
    expectations, workflow habits)

Furthermore, it states that both are injected into the system prompt as a frozen snapshot at session start.

Both are injected into the system prompt as a frozen snapshot at session start.
Mid-session writes update files on disk immediately (durable) but do NOT change
the system prompt -- this preserves the prefix cache for the entire session.

In other words, memory can be read not as "a place to remember everything," but as a place to keep small, stable premises and preferences that remain effective in subsequent sessions.
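The "frozen snapshot" behavior is easier to picture with a small sketch. FrozenMemory is a hypothetical class of mine, not the implementation in memory_tool.py; it only models the two properties the docstring states: writes land on disk immediately, and the system prompt keeps the content read at session start.

```python
import tempfile
from pathlib import Path


class FrozenMemory:
    """Hypothetical model of a file-backed memory with a per-session snapshot."""

    def __init__(self, path: Path):
        self.path = path
        # Read once at session start; this text is what the prompt will carry.
        self.snapshot = path.read_text(encoding="utf-8") if path.exists() else ""

    def system_prompt(self, base: str) -> str:
        # Always built from the snapshot, so the prompt prefix stays stable.
        return f"{base}\n\n# Memory\n{self.snapshot}"

    def write(self, note: str) -> None:
        # Durable immediately, but deliberately not reflected in self.snapshot.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(note + "\n")


with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "MEMORY.md"
    memory_file.write_text("- project uses pnpm\n", encoding="utf-8")

    session = FrozenMemory(memory_file)
    prompt_before = session.system_prompt("You are Hermes.")
    session.write("- CI runs on the main branch")
    prompt_after = session.system_prompt("You are Hermes.")
    on_disk = memory_file.read_text(encoding="utf-8")

    # A new session takes a fresh snapshot and therefore sees the new note.
    next_session_prompt = FrozenMemory(memory_file).system_prompt("You are Hermes.")
```

Keeping the prompt byte-identical within a session is what lets a prefix cache stay valid, which is the reason the docstring gives for this design.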

Session search, on the other hand, is the past log search layer. The opening of session_search_tool.py states that it uses a SQLite session DB and FTS5 index to return actual messages without LLM calls.

Source: hermes-agent/tools/session_search_tool.py L21-L23

All three modes operate on the SQLite session DB via the FTS5 index and
the get_anchored_view / get_messages_around primitives in hermes_state.
No LLM calls anywhere — every shape returns actual messages from the DB.

This is interesting. Rather than always including past conversations in the prompt as memory, it's a design that searches the DB when needed.

Personally, I felt this division of responsibility is quite important. Procedures go in skills, stable knowledge in memory, and past context in session search. By not cramming everything into a single "memory," the place for growing and the place for searching are kept separate.

If you rely on session search, you also need to keep the ever-accumulating logs in order. The hermes sessions optimize command introduced in 0.16.0 is a maintenance operation that merges FTS5 segments and then runs VACUUM.

Source: hermes-agent/hermes_state.py L4037-L4049

"""Merge fragmented FTS5 b-tree segments into one per index.

FTS5 indexes grow as a series of incremental segments — one per
``INSERT`` batch driven by the message triggers. Over tens of
thousands of messages these segments accumulate, which both bloats
the ``*_data`` shadow tables and slows ``MATCH`` queries that must
scan every segment. The special ``'optimize'`` command rewrites each
index as a single merged segment.
"""

It states that this maintenance only fixes the on-disk layout and search speed, without changing the search results. Since auto_prune for automatically pruning old sessions after a certain period was also included, the impression is that the mechanism for consolidating accumulated items is now properly in place behind the premise of "accumulate sessions and search them."
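The same two ideas (search the DB without an LLM, then optimize without changing results) can be tried with Python's built-in sqlite3 module. This is a minimal sketch assuming a Python build whose SQLite includes FTS5; the table name messages_fts and the helper functions are hypothetical and much simpler than the schema in hermes_state.py.

```python
import sqlite3


def build_index(messages: list[str]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index, committing per message to mimic small batches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(content)")
    for text in messages:
        conn.execute("INSERT INTO messages_fts(content) VALUES (?)", (text,))
        conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str) -> list[str]:
    """Full-text search that returns the stored messages themselves."""
    rows = conn.execute(
        "SELECT content FROM messages_fts WHERE messages_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in rows]


def optimize(conn: sqlite3.Connection) -> None:
    """Merge FTS5 segments with the special 'optimize' command, then VACUUM."""
    conn.execute("INSERT INTO messages_fts(messages_fts) VALUES ('optimize')")
    conn.commit()  # VACUUM cannot run inside an open transaction
    conn.execute("VACUUM")
```

Running search before and after optimize on the same connection returns the same rows, which matches the statement that only the on-disk layout and query speed are affected.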

Can Run Autonomously, But Default Is Conservative

Up to here, the discussion has been about "what to remember." Reading 0.16.0, there are also updates to another aspect of self-improving: "how to delegate work." However, going further requires explicit opt-in, and the defaults are quite conservative.

First, there is delegation, where the agent delegates work to child agents. Child agents are leaf by default and cannot branch further. Only when given the orchestrator role can that child spawn grandchild workers.

Source: hermes-agent/tools/delegate_tool.py L1955-L1958

The 'role' parameter controls whether a child can further delegate:
'leaf' (default) cannot; 'orchestrator' retains the delegation
toolset and can spawn its own workers, bounded by
delegation.max_spawn_depth.

Depth is constrained by max_spawn_depth, and the comment states "minimum 1, no maximum, but default is flat (depth 1)."

Source: hermes-agent/tools/delegate_tool.py L137-L139

# No upper ceiling on spawn depth — like max_concurrent_children, depth has a
# floor of 1 and no ceiling. Deeper trees multiply API cost, so the default
# stays flat (MAX_DEPTH = 1); raising the config knob is an explicit opt-in.

The policy is that deeper trees multiply API costs, so the default stays shallow, and going deeper is explicit. I personally appreciate the balance of enabling autonomous operation while keeping defaults conservative to prevent runaway behavior.
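To check my own reading of these comments, I wrote the rule down as a small sketch. The Agent class and both functions are hypothetical, and the exact depth accounting (top-level agent at depth 0, children at depth 1) is my assumption rather than something I confirmed in delegate_tool.py.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    depth: int           # assumed: 0 for the top-level agent, 1 for its children
    role: str = "leaf"   # "leaf" (default) or "orchestrator"


def can_spawn(agent: Agent, max_spawn_depth: int = 1) -> bool:
    """Whether this agent may delegate to a new child."""
    max_spawn_depth = max(1, max_spawn_depth)  # floor of 1, no ceiling
    if agent.depth > 0 and agent.role != "orchestrator":
        return False  # leaf children never delegate further
    return agent.depth < max_spawn_depth


def spawn(parent: Agent, role: str = "leaf", max_spawn_depth: int = 1) -> Agent:
    """Create a child one level deeper, or refuse if the rules forbid it."""
    if not can_spawn(parent, max_spawn_depth):
        raise PermissionError("delegation not allowed at this depth or role")
    return Agent(depth=parent.depth + 1, role=role)
```

Under the default of 1, even a child given the orchestrator role cannot spawn grandchildren; that only becomes possible once the depth setting is raised, which is the explicit opt-in the comment talks about.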

Next is progressive tool disclosure when tools increase. Rather than listing all MCP and plugin tools in the system prompt, they are disclosed on demand.

Source: hermes-agent/tools/tool_search.py L1-L14

"""Progressive tool disclosure ("tool search") for Hermes Agent.

When enabled, MCP and non-core plugin tools are replaced in the model-visible
tools array by three bridge tools — ``tool_search``, ``tool_describe``,
``tool_call`` — and surfaced on demand. Core Hermes tools never defer.
...
* The threshold gate runs every assembly: when deferrable tools would consume
  less than ``threshold_pct`` of the model's context window (default 10%),
  tool search is a no-op and the tools array passes through unchanged.
"""

Here too, core tools are always shown, and the switch to on-demand disclosure happens only when the deferrable tools would take up 10% or more of the context window (the default threshold), which keeps the intervention to a minimum.
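The threshold gate boils down to a single comparison. In the sketch below, should_defer_tools is a hypothetical name, and how the token cost of the tool definitions is measured is left as an input because the excerpt does not say.

```python
def should_defer_tools(deferrable_tool_tokens: int, context_window: int,
                       threshold_pct: float = 10.0) -> bool:
    """Return True when deferrable tools should be replaced by the bridge tools.

    Below the threshold, tool search is a no-op and the tools array passes
    through unchanged, so this returns False.
    """
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    used_pct = 100.0 * deferrable_tool_tokens / context_window
    return used_pct >= threshold_pct
```

For a 128,000-token window, 5,000 tokens of MCP tool definitions (about 3.9%) pass through untouched, while 20,000 tokens (about 15.6%) would switch to on-demand disclosure.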

Finally, there is goal_mode for the kanban. When a card is set to goal_mode, workers keep cycling while judging "whether the goal has been satisfied."

Source: hermes-agent/hermes_cli/goals.py L1-L8

"""Persistent session goals — the Ralph loop for Hermes.

A goal is a free-form user objective that stays active across turns. After
each turn completes, a small judge call asks an auxiliary model "is this
goal satisfied by the assistant's last response?". If not, Hermes feeds a
continuation prompt back into the same session and keeps working until the
goal is done, turn budget is exhausted, the user pauses/clears it, or the
user sends a new message (which takes priority and pauses the goal loop).
"""

This is what's known as a Ralph loop. The termination conditions are explicitly stated as "goal achieved, turn budget exhausted, user stops it, or a new message arrives," with guardrails preventing it from running indefinitely. The flow in the NemoHermes demo, where tasks are filed to the kanban and profiles each take on a share of the work, can also be read as an extension of this.
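The loop and its exit conditions can be sketched in a few lines. This is a simplified model, not the code in goals.py: run_turn and is_satisfied stand in for the assistant turn and the auxiliary judge call, and interrupted is a hypothetical hook that covers both a user pause and an incoming message.

```python
from typing import Callable


def run_goal_loop(goal: str,
                  run_turn: Callable[[str], str],
                  is_satisfied: Callable[[str, str], bool],
                  turn_budget: int,
                  interrupted: Callable[[], bool] = lambda: False) -> tuple[str, int]:
    """Keep working toward a goal until one of the exit conditions is met.

    Returns (status, turns_used), where status is "done", "paused",
    or "budget_exhausted".
    """
    for turn in range(1, turn_budget + 1):
        if interrupted():
            return ("paused", turn - 1)  # user input takes priority over the loop
        prompt = goal if turn == 1 else f"Continue working toward the goal: {goal}"
        response = run_turn(prompt)
        if is_satisfied(goal, response):  # the judge call after each turn
            return ("done", turn)
    return ("budget_exhausted", turn_budget)
```

Because the budget is the range of the for loop, the loop cannot run past it no matter what the judge answers, which is the guardrail that matters most.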

By this point, I feel that self-improving has entered a stage where not only skills grow, but also the ways of delegating and running work can be cultivated. However, all of these default to flat and conservative settings, and going further requires explicit configuration. That cautious stance is consistent throughout.

NemoClaw and OpenShell Protect the Boundaries of a Growing Agent

Mechanisms like the Hermes Agent become more useful the more you use them. On the other hand, from an enterprise perspective, it's understandably concerning to release a convenient agent to the outside world without any restrictions.

This is where NemoClaw and OpenShell come in as the Runtime layer.

Looking at the Hermes config in NemoClaw, the toolsets exposed to the API server include skills, memory, session_search, and others.

Source: NemoClaw/agents/hermes/config/hermes-config.ts L11-L27

const API_SERVER_TOOLSETS = [
  'web',
  'browser',
  'terminal',
  'file',
  'code_execution',
  'vision',
  'image_gen',
  'skills',
  'todo',
  'memory',
  'session_search',
  'delegation',
  'cronjob',
  'nemoclaw',
  'audio',
];

This means that even Hermes running on OpenShell is designed to use the skills, memory, and session search that are characteristic of Hermes.

However, external access is restricted by policy. The sandbox policy schema in NemoClaw is described as a schema that defines filesystem, process, and network egress rules.

Source: NemoClaw/schemas/sandbox-policy.schema.json L4-L5

{
  "title": "NemoClaw Sandbox Policy",
  "description": "Schema for the base sandbox policy (openclaw-sandbox.yaml) — defines filesystem, process, and network egress rules."
}

Furthermore, the network policy defines which endpoints each binary is allowed to reach. For REST and WebSocket, it can also hold path-level rules. The method and path rules are defined in the same schema at L104-L130.

Looking at the policy additions for Hermes, managed inference only allows OpenAI-compatible paths to inference.local.

Source: NemoClaw/agents/hermes/policy-additions.yaml L44-L61

network_policies:
  managed_inference:
    name: managed_inference
    endpoints:
      - host: inference.local
        port: 443
        protocol: rest
        enforcement: enforce
        rules:
          - allow: { method: POST, path: '/v1/chat/completions' }
          - allow: { method: POST, path: '/v1/completions' }
          - allow: { method: POST, path: '/v1/embeddings' }
          - allow: { method: GET, path: '/v1/models' }
          - allow: { method: GET, path: '/v1/models/**' }
    binaries:
      - { path: /usr/local/bin/hermes }
      - { path: /usr/bin/python3.11 }
      - { path: /opt/hermes/.venv/bin/python }

This is not an approach of "asking nicely via prompt to not go outside," but rather a runtime policy that determines where the agent is allowed to reach. The more a self-improving agent learns business procedures and operates autonomously, the more meaningful these runtime boundaries become. The design philosophy behind OpenShell itself is also explained in detail on the NVIDIA Developer Blog.
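To illustrate what "allow only these method and path pairs" means, here is a small matcher over the rules quoted above. It is a sketch under assumptions: is_allowed is a hypothetical function, requests that match no rule are treated as denied, and the ** wildcard is approximated with Python's fnmatch, which may differ from how OpenShell actually evaluates patterns.

```python
from fnmatch import fnmatchcase

# (method, path pattern) pairs copied from the managed_inference rules above.
ALLOW_RULES = [
    ("POST", "/v1/chat/completions"),
    ("POST", "/v1/completions"),
    ("POST", "/v1/embeddings"),
    ("GET", "/v1/models"),
    ("GET", "/v1/models/**"),
]


def is_allowed(method: str, path: str, rules=ALLOW_RULES) -> bool:
    """Default-deny check: a request passes only if some allow rule matches it."""
    return any(
        method.upper() == rule_method and fnmatchcase(path, pattern)
        for rule_method, pattern in rules
    )
```

A GET to /v1/models/nemotron passes, while a DELETE to the same path or a POST to /v1/files does not, regardless of what the prompt or a skill tells the agent to do.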

What to Make a Skill and What to Leave Out Is a Human Decision

Up to this point, we have gone through the code for both the mechanisms for nurturing and the mechanisms for governing. The tooling now handles much of the operational burden: accumulated skills can be folded down with the environments filter and curator, and session logs can be organized with hermes sessions optimize.

However, the sorting itself remains a human responsibility: deciding what to promote to a skill, what to retain in memory, and what to place in project context or a wiki. This feels similar to the DevOps practice of distinguishing between runbooks, issues, and audit logs. Rather than saying the agent grows on its own, it felt more accurate to say that humans approve good procedures, and then organize them into a form the agent can reuse.

Summary

Reading the self-improving aspects of Hermes Agent through the code, it looked less like a model autonomously retraining itself and more like a system for preserving business procedures as reusable assets.

The division of responsibilities is as follows. Skills are the layer that retains successful work procedures and output formats as SKILL.md and supporting files. Memory is the layer that injects preferences and environment information that should reliably take effect in future sessions as frozen snapshots. Session search is the layer that retrieves past conversations and decision rationales from SQLite / FTS5 when needed.

Reading through 0.16.0 alongside this only strengthened that impression. The core mechanism for creating skills remains in place, while surrounding mechanisms have been added to "select and fold accumulated skills," "vary trust based on where a skill came from," and "control how work is delegated." Just as much attention has been given to organizing and safely using what has grown as to the features that enable growth in the first place.

And NemoClaw and OpenShell can be read as a Runtime layer that places such growing agents within the boundaries of a sandbox and policy.

The phrase "self-evolving" sounds a bit flashy, but as the original name from Nous Research, "self-improving," suggests, reading the code reveals a surprisingly grounded mechanism. Good procedures taught by humans are preserved as skills, combined with memory and session search, and used in subsequent sessions. And that agent runs within the policy boundaries of OpenShell. Personally, I think this direction is the most realistic approach for nurturing agents in an enterprise setting.
