Build a Mini Hermes Agent From Scratch
Contents
- Your First Agent Loop: Message In, Tool Call Out
- The Tool Registry: Teaching Your Agent to Act
- Prompt Construction: Assembling the Brain
- Prompt Caching: Keeping Costs Under Control
- Session Memory: SQLite + FTS5
- Persistent Memory: Who Is This User?
- Cross-Session Recall: Finding the Right Memory
- The Skill System: Folders, Progressive Disclosure, and skill_manage
- Autonomous Skill Creation
- Skill Self-Improvement: The Feedback Loop
Understanding the Machine
01 What Hermes Actually Is
Before you write a single line of code, you need to understand what makes Hermes different from every other AI wrapper.
Most AI chat tools are stateless. You open a conversation, do some work, close it. Next time you open one, it starts from scratch. The model has no idea what you did yesterday.
Hermes Agent, built by Nous Research, takes a fundamentally different approach. It is an autonomous, self-improving agent that runs in the background on your own server. Three properties make it distinct:
- It remembers across sessions. Three layers of memory (session, persistent, skill) mean it knows what happened last week, who you are, and how to do things it learned from past work.
- It improves itself. After completing complex tasks, it offers to save reusable procedures as Skill files (confirming with you first). Background nudges periodically review conversations for things worth remembering. When you give feedback, it updates existing Skills. The more you use it, the better it gets.
- It runs without you. Deploy it on a $5 VPS, connect it to Telegram, and it works 24/7. Cron jobs, scheduled tasks, background monitoring. You check in when you want to.
The concept comes from Harness Engineering, a methodology coined by Mitchell Hashimoto (creator of Terraform). His insight: every time an AI makes a mistake, add a rule so it never makes the same mistake again. Over weeks, the AI accumulates enough rules to behave like a veteran team member.
Hashimoto did this manually, editing CLAUDE.md files by hand. Hermes automates the entire process. The agent observes its own performance, curates its own memory, and writes its own rules.
What we are NOT building
The real Hermes has 40+ tools, MCP integration, 14 platform gateways, sub-agent delegation, and Honcho user modeling. We skip all of that. Our minimal version focuses on the four mechanisms that make Hermes interesting:
| Mechanism | What it does | Chapter |
|---|---|---|
| Agent Loop | Take user input, call tools, return results | 03-05 |
| Three-Layer Memory | Remember across sessions | 06-08 |
| Skill System | Store and retrieve procedural knowledge | 09-11 |
| Learning Loop | Wire memory + skills into self-improvement | 12 |
By the end, you will understand every component well enough to extend it yourself.
02 The Architecture at 10,000 Feet
Five components, one data flow. This is the entire system.
The diagram shows the complete data flow of a single turn. Here is what happens at each stage:
- User sends a message. Text comes in from CLI, Telegram, or any other interface.
- Prompt Builder assembles the system prompt. It pulls in the agent's identity, relevant memories from the database, matching skill files, and any context files. This assembled prompt is what the LLM actually sees.
- LLM generates a response. The response is either a text message (done) or one or more tool calls (continue looping).
- Tools execute and results feed back. The tool result is appended to the conversation, and the LLM is called again. This loop repeats until the model produces a final text response.
- Retrospective fires. After the task completes, the system evaluates the conversation. Should anything be remembered? Should a new Skill be created? Should an existing Skill be updated? This is the Learning Loop that makes the agent self-improving.
The dashed lines at the bottom are what make Hermes different from a normal chatbot. Those feedback paths close the loop, turning a stateless conversation into a system that accumulates knowledge.
Directory structure of our minimal agent
Everything lives under one directory. Memory, skills, config, database. When you want to migrate, back up this folder. When you want to start fresh, delete it.
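A plausible layout for that directory (file and folder names here are illustrative, apart from MEMORY.md, USER.md, and the skills/ convention used in later chapters):

```text
~/.hermes/
├── config.json        # model, API keys, enabled toolsets
├── sessions.db        # SQLite session memory (Chapter 07)
├── MEMORY.md          # persistent observations (Chapter 08)
├── USER.md            # user profile (Chapter 08)
└── skills/            # one folder per skill (Chapter 10)
    └── git-commit-style/
        └── SKILL.md
```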
The Agent Loop
03 Your First Agent Loop
A complete agent in under 100 lines. The loop is simple; what you feed into it is what makes it smart.
The core of any LLM agent is the conversation loop. You send messages to an LLM API, the model responds with either text or tool calls, you execute the tools and feed results back, and repeat until the model emits a final text response.
Here is the minimal loop, stripped to its essence:
import json
from openai import OpenAI
from tool_calling import strategy_for_model
class Agent:
def __init__(self, client, model, system_prompt, tools, tool_handlers):
self.client = client
self.model = model
self.tools = tools # OpenAI tool schemas
self.tool_handlers = tool_handlers # name -> callable
self._strategy = strategy_for_model(model) # structured or text
self.messages = [
{"role": "system", "content": system_prompt}
]
def run(self, user_input: str) -> str:
"""Send user message, loop through tool calls, return final text."""
self.messages.append({"role": "user", "content": user_input})
max_iterations = 15
for _ in range(max_iterations):
response = self._call_llm()
msg = response.choices[0].message
# Strategy parses the response uniformly
content, tool_calls = self._strategy.parse_response(msg)
# Build and append assistant message
assistant_msg = self._strategy.build_assistant_msg(content, tool_calls)
self.messages.append(assistant_msg)
# If no tool calls, we're done
if not tool_calls:
return content
# Execute each tool call
for tc in tool_calls:
result = self._execute_tool(tc.name, tc.arguments)
# Strategy builds the right result message format
result_msg = self._strategy.build_tool_result_msg(tc, result)
self.messages.append(result_msg)
return "[Max iterations reached]"
def _call_llm(self):
# Strategy decides whether to pass tools as API param
# or inject them into the system prompt
kwargs = {"model": self.model, "messages": self.messages, "max_tokens": 400}
kwargs = self._strategy.prepare_kwargs(kwargs, self.tools)
return self.client.chat.completions.create(**kwargs)
def _execute_tool(self, name: str, args: dict) -> str:
handler = self.tool_handlers.get(name)
if not handler:
return f"Error: unknown tool '{name}'"
try:
result = handler(**args)
return str(result)[:50000] # Truncate large outputs
except Exception as e:
return f"Error executing {name}: {e}"
That is the entire agent loop. Every LLM agent, from Claude Code to Hermes to OpenAI's Codex, is fundamentally this same pattern: call LLM, parse tool calls, execute tools, loop. Notice the loop never touches tool-call formats directly; the _strategy object handles all the format-specific logic. This means the same loop works with both structured (OpenAI-style) and text-based (Gemma, LLaMA) tool calling.
The message list is everything
The self.messages list is the most important data structure in the system. It is the agent's entire working memory for the current session. Every user message, every assistant response, every tool call and result gets appended here.
The OpenAI-compatible message format looks like this:
| Role | Purpose | Key Fields |
|---|---|---|
| system | Agent identity and instructions | content |
| user | Human input | content |
| assistant | Model response | content, tool_calls |
| tool | Result of a tool call | tool_call_id, content |
This format is a de facto standard. It works with OpenAI, Anthropic (via OpenRouter), DeepSeek, and most open-source model APIs. Our minimal agent uses it throughout.
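To make the format concrete, here is what the message list looks like after one complete tool-using turn (the tool name and call ID are illustrative):

```python
# One full turn: user asks, model calls a tool, the result feeds back,
# model answers. This is exactly what self.messages accumulates.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What files are in the current directory?"},
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1",  # illustrative ID; the API generates these
        "type": "function",
        "function": {"name": "terminal", "arguments": '{"command": "ls"}'},
    }]},
    {"role": "tool", "tool_call_id": "call_1", "content": "main.py\nREADME.md"},
    {"role": "assistant", "content": "Two files: main.py and README.md."},
]

# The tool result is linked back to the call it answers via tool_call_id.
roles = [m["role"] for m in messages]
print(roles)  # → ['system', 'user', 'assistant', 'tool', 'assistant']
```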
Why max_iterations matters
Without a cap, a confused model could loop forever, burning tokens. Hermes uses a budget system: each turn costs an iteration, and the agent stops when the budget is exhausted. Our minimal version uses a simple cap of 15 iterations, which is enough for most tasks.
In the real Hermes: an IterationBudget class with token tracking, parallel tool batch detection via _should_parallelize_tool_batch(), multi-provider support (OpenAI, Anthropic, Codex Responses API), prompt caching, and reasoning-field preservation. See run_agent.py lines 7506-8600 (run_conversation()).
Tool-calling strategies: structured vs. text
The loop above assumes the model returns structured tool_calls objects via the API. But many open models (Gemma, LLaMA, Phi) emit tool calls as text in the response body instead. Rather than a brittle fallback parser, we solve this with a strategy pattern: the agent delegates three decisions to a swappable strategy object:
| Decision | StructuredStrategy | TextStrategy |
|---|---|---|
| How tools are presented | Pass tools API parameter | Inject tool descriptions into system prompt |
| How tool calls are parsed | Read msg.tool_calls | Regex-parse text content |
| How results are fed back | role: "tool" with tool_call_id | role: "user" with [Tool Result: name] prefix |
The strategy is selected automatically based on the model name:
from tool_calling import ToolCallingStrategy, strategy_for_model
class Agent:
def __init__(self, client, model, ...):
self._strategy = strategy_for_model(model) # picks based on model name
def run(self, user_input):
...
for _ in range(max_iterations):
response = self._call_llm()
msg = response.choices[0].message
# Strategy handles parsing uniformly
content, tool_calls = self._strategy.parse_response(msg)
assistant_msg = self._strategy.build_assistant_msg(content, tool_calls)
self.messages.append(assistant_msg)
if not tool_calls:
return content
for tc in tool_calls:
result = self._execute_tool(tc.name, tc.arguments)
# Strategy builds the right message format
result_msg = self._strategy.build_tool_result_msg(tc, result)
self.messages.append(result_msg)
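As a concrete sketch of the structured side, here is what a StructuredStrategy could look like. The method names follow the table above, but this is an illustrative implementation, not Hermes's actual code:

```python
import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    """Normalized tool call, independent of the wire format."""
    id: str
    name: str
    arguments: dict

class StructuredStrategy:
    """For APIs that return native tool_calls objects."""

    def prepare_kwargs(self, kwargs, tools):
        if tools:
            kwargs["tools"] = tools  # pass tools as an API parameter
        return kwargs

    def parse_response(self, msg):
        calls = [
            ToolCall(tc.id, tc.function.name,
                     json.loads(tc.function.arguments or "{}"))
            for tc in (msg.tool_calls or [])
        ]
        return msg.content, calls

    def build_assistant_msg(self, content, tool_calls):
        msg = {"role": "assistant", "content": content}
        if tool_calls:
            msg["tool_calls"] = [
                {"id": tc.id, "type": "function",
                 "function": {"name": tc.name,
                              "arguments": json.dumps(tc.arguments)}}
                for tc in tool_calls
            ]
        return msg

    def build_tool_result_msg(self, tc, result):
        return {"role": "tool", "tool_call_id": tc.id, "content": result}

# Quick check against a fake API message object:
from types import SimpleNamespace
_fake = SimpleNamespace(content=None, tool_calls=[SimpleNamespace(
    id="c1", function=SimpleNamespace(
        name="terminal", arguments='{"command": "ls"}'))])
content, calls = StructuredStrategy().parse_response(_fake)
```

A TextStrategy would implement the same four methods with the text-based behavior from the right column of the table, and the agent loop would not change at all.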
The text strategy parses several common formats that local models produce:
# Format 1: XML-style tags (Gemma, ChatML)
<tool_call>{"name": "terminal", "arguments": {"command": "ls"}}</tool_call>
# Format 2: call:name{json} (some Gemma variants)
call:terminal{command: "ls -la"}
# Format 3: Fenced JSON blocks
```json
{"name": "file_write", "arguments": {"path": "out.txt", "content": "hello"}}
```
When you switch models via /model in the CLI, the strategy updates automatically. This means you can chat with a structured model like Qwen, switch to Gemma mid-session, and tool calling keeps working.
Supporting a new model's output format means adding a parser to TextStrategy, not touching the agent loop.
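A sketch of the parsing half of such a text strategy, covering format 1 above (XML-style tags); the other formats would each get their own regex:

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_text_tool_calls(text: str):
    """Extract XML-style tool calls (format 1) from a model's text response.

    Returns (remaining_text, calls), where each call is a
    {"name": ..., "arguments": ...} dict. Malformed JSON inside a tag
    is skipped rather than raising.
    """
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
            calls.append({"name": payload["name"],
                          "arguments": payload.get("arguments", {})})
        except (json.JSONDecodeError, KeyError):
            continue  # ignore malformed calls
    remaining = TOOL_CALL_RE.sub("", text).strip()
    return remaining, calls

text = ('Listing files now. <tool_call>{"name": "terminal", '
        '"arguments": {"command": "ls"}}</tool_call>')
remaining, calls = parse_text_tool_calls(text)
```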
04 The Tool Registry
Tools turn your agent from a chatbot into something that can actually do things.
An agent without tools is just a chatbot. The tool registry is the mechanism that lets you define what actions the agent can take, expose them to the LLM in the right format, and dispatch calls to the right handler.
The registry pattern
from dataclasses import dataclass, field
from typing import Callable, Any
@dataclass
class ToolEntry:
name: str
description: str
parameters: dict # JSON Schema for the function args
handler: Callable
category: str = "general"
class ToolRegistry:
def __init__(self):
self._tools: dict[str, ToolEntry] = {}
def register(self, name, description, parameters, handler, category="general"):
self._tools[name] = ToolEntry(
name=name,
description=description,
parameters=parameters,
handler=handler,
category=category,
)
def get_schemas(self, categories=None) -> list[dict]:
"""Return OpenAI-compatible tool schemas."""
tools = self._tools.values()
if categories:
tools = [t for t in tools if t.category in categories]
return [
{
"type": "function",
"function": {
"name": t.name,
"description": t.description,
"parameters": t.parameters,
},
}
for t in tools
]
def execute(self, name: str, args: dict) -> str:
entry = self._tools.get(name)
if not entry:
raise ValueError(f"Unknown tool: {name}")
return str(entry.handler(**args))
# Global registry instance
registry = ToolRegistry()
Registering tools
Each tool is a Python function with a JSON Schema describing its parameters. Here is a minimal terminal tool:
import subprocess
from tool_registry import registry
def run_terminal(command: str, timeout: int = 30) -> str:
"""Execute a shell command and return stdout + stderr."""
try:
result = subprocess.run(
command, shell=True, capture_output=True,
text=True, timeout=timeout
)
output = result.stdout + result.stderr
return output.strip() or "(no output)"
except subprocess.TimeoutExpired:
return "Error: command timed out"
registry.register(
name="terminal",
description="Run a shell command. Use for git, file ops, builds, etc.",
parameters={
"type": "object",
"properties": {
"command": {"type": "string", "description": "Shell command to execute"},
"timeout": {"type": "integer", "description": "Timeout in seconds", "default": 30},
},
"required": ["command"],
},
handler=run_terminal,
category="execution",
)
Toolsets: controlling what the agent can access
In the real Hermes: a ToolRegistry singleton with lazy module imports, check_fn for availability gates, max_result_size truncation, and composable toolsets that can include other toolsets. MCP tools are auto-discovered at startup. 40+ built-in tools across 5 categories. See tools/registry.py, model_tools.py, toolsets.py.
Hermes groups tools into toolsets (categories like "web", "terminal", "file", "memory"). You enable or disable entire toolsets in the config. This matters for two reasons:
Fewer tools = better decisions. An LLM with 100 tools available makes worse tool-selection choices than one with 10. Only expose what the task needs.
Toolsets are security boundaries. A research sub-agent should have web access but not terminal access. A coding sub-agent needs terminal but not web. The category field on each tool entry makes this filtering trivial.
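The filtering itself is one line inside get_schemas above; a self-contained sketch (with a throwaway tool list and illustrative names) shows how a sub-agent's toolset becomes a security boundary:

```python
# Minimal restatement of the registry's category filter, to show how a
# sub-agent's visible tool list is restricted. Names are illustrative.
tools = [
    {"name": "web_search", "category": "web"},
    {"name": "terminal",   "category": "execution"},
    {"name": "file_write", "category": "file"},
]

def schemas_for(categories):
    """Return only the tool names a sub-agent with these toolsets may see."""
    return [t["name"] for t in tools if t["category"] in categories]

research_agent_tools = schemas_for({"web"})              # no terminal access
coding_agent_tools = schemas_for({"execution", "file"})  # no web access
print(research_agent_tools, coding_agent_tools)
```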
05 Prompt Construction
The system prompt is the single biggest lever you have. It determines what kind of agent you get.
In Hermes, the system prompt is built once per session and then frozen. It is not reassembled on every turn. This frozen snapshot is reused across all API calls in the session, which is critical for prompt caching (Chapter 6). The prompt is only rebuilt after context compression events.
class PromptBuilder:
def build(self, memory_block: str, skills_index: str, user_context: str) -> str:
sections = []
# 1. Core identity
sections.append(self.IDENTITY)
# 2. Persistent memory (who is this user, what do I know)
if memory_block:
sections.append(f"## What I Remember\n{memory_block}")
# 3. Available skills (what procedures do I know)
if skills_index:
sections.append(f"## Available Skills\n{skills_index}")
# 4. User context files (project-specific info)
if user_context:
sections.append(f"## Project Context\n{user_context}")
# 5. Behavioral guidance
sections.append(self.MEMORY_GUIDANCE)
sections.append(self.SKILLS_GUIDANCE)
sections.append(self.TOOL_USE_GUIDANCE)
return "\n\n".join(sections)
IDENTITY = """You are a helpful AI assistant with persistent memory \
and self-improving skills. You remember past conversations and learn \
from experience. Use your tools to accomplish tasks."""
MEMORY_GUIDANCE = """## Memory Instructions
After completing tasks, actively decide what's worth remembering:
- User preferences and habits
- Project context and architecture decisions
- Solutions to problems that might recur
Use the memory tool to persist important observations."""
SKILLS_GUIDANCE = """## Skill Instructions
After difficult or iterative tasks, offer to save as a skill. \
Confirm with the user before creating or deleting. Use the \
skill_manage tool with action="create" for new skills, \
action="patch" (old_string/new_string) to fix existing ones.
Skip for simple one-offs."""
TOOL_USE_GUIDANCE = """## Tool Use
Take action. Don't just describe what you would do - actually do it. \
If the user asks you to write code, write the file. If they ask you \
to run something, run it. Prefer action over explanation."""
The real Hermes has extensive prompt sections. The three guidance blocks above are the critical ones that drive the self-improvement behavior:
| Guidance | What it steers |
|---|---|
| MEMORY_GUIDANCE | Tells the agent when and what to save to persistent memory |
| SKILLS_GUIDANCE | Tells the agent to offer saving reusable procedures, confirm before creating |
| TOOL_USE_GUIDANCE | Prevents the agent from just planning without acting |
In the real Hermes: the built prompt is stored as _cached_system_prompt, and continuing sessions load the stored prompt from the session DB instead of rebuilding it (preserving the Anthropic prefix cache). Also: whitespace normalization for KV-cache consistency, skill body injection, and tool-use enforcement guidance. See agent/prompt_builder.py.
Frozen snapshot vs. per-turn injection
There are two paths for memory to reach the LLM:
| Source | Where it goes | When it updates |
|---|---|---|
| Built-in memory (MEMORY.md, USER.md) | System prompt (frozen snapshot) | Once per session; only rebuilt after compression |
| External memory providers (Honcho, etc.) | Injected into user message per turn | Every turn via prefetch_all() |
The built-in memory is a frozen snapshot: it is read from disk when the session starts and baked into the system prompt. Even if the agent writes new observations during the session (via the memory tool), those writes go to disk but do not appear in the system prompt until the next session (or after compression rebuilds it). This is a deliberate choice for prompt caching stability.
External memory providers can inject per-turn context into the user message, but this is optional and only applies if a provider like Honcho is configured.
For our minimal build, the simple approach works: load MEMORY.md at session start, bake it into the system prompt, do not touch it again until the session ends.
06 Prompt Caching
Every token you re-send costs money. Prompt caching is how Hermes keeps multi-turn conversations affordable.
In a 20-turn conversation, the system prompt (identity + memory + skills) is sent with every API call. Without caching, you pay for those tokens 20 times. Anthropic's prompt caching lets you mark message boundaries as cache breakpoints. Cached prefixes cost ~90% less on subsequent requests.
Hermes implements a strategy called system_and_3: it places cache breakpoints on the system prompt (stable across all turns) plus the last 3 non-system messages (a rolling window). Anthropic allows a maximum of 4 breakpoints, so this uses all of them.
import copy
def apply_prompt_caching(messages, cache_ttl="5m"):
"""Apply system_and_3 caching: system prompt + last 3 messages."""
messages = copy.deepcopy(messages)
marker = {"type": "ephemeral"}
breakpoints_used = 0
# 1. Cache the system prompt (stable across all turns)
if messages[0].get("role") == "system":
_mark_message(messages[0], marker)
breakpoints_used += 1
# 2-4. Cache the last 3 non-system messages (rolling window)
remaining = 4 - breakpoints_used
non_sys = [i for i in range(len(messages))
if messages[i].get("role") != "system"]
for idx in non_sys[-remaining:]:
_mark_message(messages[idx], marker)
return messages
def _mark_message(msg, marker):
"""Add cache_control to a message, handling string and list content."""
content = msg.get("content")
if isinstance(content, str):
# Convert to content block format for cache_control
msg["content"] = [
{"type": "text", "text": content, "cache_control": marker}
]
elif isinstance(content, list) and content:
content[-1]["cache_control"] = marker
else:
msg["cache_control"] = marker
Why this constrains the architecture
Prompt caching is not just an optimization. It constrains how you build the system prompt:
| Design constraint | Why it matters for caching |
|---|---|
| System prompt must be stable | If it changes every turn, the cache is invalidated and you pay full price. Memory and skills content must stay constant within a session. |
| Whitespace must be normalized | Even a trailing space change invalidates the cache. Hermes normalizes whitespace before every API call for KV-cache consistency. |
| Ephemeral context goes in user messages | Prefetched memories are injected into the user message, not the system prompt, because user messages rotate out of the cache window naturally. |
| Deep copy before marking | Cache markers modify the message structure (string to content-block array). The original messages must be preserved for session persistence. |
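To see the rolling window concretely, here is the same system_and_3 logic (condensed from the functions above) applied to a short conversation of one system prompt plus six turns:

```python
import copy

def apply_prompt_caching(messages):
    """Condensed restatement of the system_and_3 logic above."""
    messages = copy.deepcopy(messages)
    marker = {"type": "ephemeral"}

    def mark(msg):
        content = msg.get("content")
        if isinstance(content, str):
            msg["content"] = [{"type": "text", "text": content,
                               "cache_control": marker}]

    used = 0
    if messages[0].get("role") == "system":
        mark(messages[0])
        used += 1
    non_sys = [i for i, m in enumerate(messages) if m.get("role") != "system"]
    for idx in non_sys[-(4 - used):]:
        mark(messages[idx])
    return messages

convo = [{"role": "system", "content": "identity + memory + skills"}]
for i in range(6):
    convo.append({"role": "user" if i % 2 == 0 else "assistant",
                  "content": f"turn {i}"})

marked = apply_prompt_caching(convo)
# Breakpoints land on the system prompt and the last 3 messages; the
# messages in between carry no marker and stay inside the cached prefix.
flagged = [i for i, m in enumerate(marked)
           if isinstance(m["content"], list)]
print(flagged)  # → [0, 4, 5, 6]
```

Note the deep copy: the original `convo` keeps its plain-string content, matching the "deep copy before marking" constraint in the table.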
Our minimal version: the system_and_3 strategy with a single TTL, working with Anthropic models via OpenRouter.
In the real Hermes: both 5m and 1h TTLs, a native Anthropic adapter for direct API calls (with different cache_control placement), and provider detection to skip caching for non-Anthropic models. Caching is applied in _prepare_api_messages() just before the API call. See agent/prompt_caching.py, agent/anthropic_adapter.py.
Memory That Lasts
07 Session Memory: SQLite + FTS5
Every conversation gets recorded with full-text search. This is the agent's episodic memory.
Session memory answers the question: what happened? It records every conversation turn in a SQLite database with FTS5 full-text indexing. Think of it as the agent's conversation diary.
Why SQLite + FTS5
Most AI tools either forget everything (stateless) or dump everything into the context window (expensive and slow). Hermes uses on-demand retrieval: store everything, search when needed, inject only what is relevant.
| Approach | Load Everything | On-Demand (Hermes) |
|---|---|---|
| Context usage | Grows linearly | Essentially constant |
| Precision | Everything is there but nothing is findable | Keyword matching, precise |
| Long-term viability | Breaks after a few days | Works for months |
| Response speed | Slows over time | Stays the same |
FTS5 is SQLite's built-in full-text search extension. No extra database to install. All data lives in a local file. No network dependency, no privacy concerns.
The schema
import sqlite3
import uuid
from datetime import datetime
class SessionDB:
def __init__(self, db_path: str):
self.db_path = db_path
self.conn = sqlite3.connect(db_path)
self.conn.execute("PRAGMA journal_mode=WAL") # Concurrent reads
self._create_tables()
def _create_tables(self):
self.conn.executescript("""
CREATE TABLE IF NOT EXISTS sessions (
id TEXT PRIMARY KEY,
source TEXT DEFAULT 'cli',
started_at REAL,
ended_at REAL,
message_count INTEGER DEFAULT 0,
summary TEXT
);
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT REFERENCES sessions(id),
role TEXT NOT NULL,
content TEXT,
tool_name TEXT,
tool_call_id TEXT,
timestamp REAL
);
            CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
            -- external-content table: the text is stored once, in messages
            USING fts5(content, content='messages', content_rowid='id');
""")
self.conn.commit()
def create_session(self, source="cli") -> str:
session_id = str(uuid.uuid4())
self.conn.execute(
"INSERT INTO sessions (id, source, started_at) VALUES (?, ?, ?)",
(session_id, source, datetime.now().timestamp())
)
self.conn.commit()
return session_id
def append_message(self, session_id, role, content,
tool_name=None, tool_call_id=None):
cursor = self.conn.execute(
"""INSERT INTO messages
(session_id, role, content, tool_name, tool_call_id, timestamp)
VALUES (?, ?, ?, ?, ?, ?)""",
(session_id, role, content, tool_name, tool_call_id,
datetime.now().timestamp())
)
# Index in FTS5 (only user and assistant messages)
if role in ("user", "assistant") and content:
self.conn.execute(
"INSERT INTO messages_fts (rowid, content) VALUES (?, ?)",
(cursor.lastrowid, content)
)
self.conn.commit()
def search(self, query: str, limit: int = 20) -> list[dict]:
"""Full-text search across all sessions."""
sanitized = self._sanitize_fts_query(query)
rows = self.conn.execute("""
SELECT m.session_id, m.role,
snippet(messages_fts, 0, '>>>', '<<<', '...', 40) as snippet,
s.source, s.started_at
FROM messages_fts
JOIN messages m ON m.id = messages_fts.rowid
JOIN sessions s ON s.id = m.session_id
WHERE messages_fts MATCH ?
ORDER BY rank
LIMIT ?
""", (sanitized, limit)).fetchall()
return [
{"session_id": r[0], "role": r[1], "snippet": r[2],
"source": r[3], "date": datetime.fromtimestamp(r[4]).isoformat()}
for r in rows
]
def _sanitize_fts_query(self, query: str) -> str:
"""Clean user input for FTS5 safety."""
# Remove special FTS5 operators that could cause errors
for char in ['"', '*', '+', '-', '(', ')', ':']:
query = query.replace(char, ' ')
        # Split into words; space-separated FTS5 terms are ANDed together
        words = [w.strip() for w in query.split() if w.strip()]
        return ' '.join(words) if words else '""'
Two important implementation details:
WAL mode (PRAGMA journal_mode=WAL) enables concurrent readers with a single writer. This matters when the gateway process is handling messages from multiple platforms while the agent is also writing to the database.
FTS5 query sanitization is essential. User queries can contain special characters that break FTS5 syntax. The sanitizer strips operators and treats input as plain keyword search.
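A standalone version of the sanitizer (same logic as _sanitize_fts_query above) shows exactly what reaches FTS5:

```python
def sanitize_fts_query(query: str) -> str:
    """Strip FTS5 operators and reduce input to plain keywords."""
    for char in ['"', '*', '+', '-', '(', ')', ':']:
        query = query.replace(char, ' ')
    words = [w.strip() for w in query.split() if w.strip()]
    return ' '.join(words) if words else '""'

# A query full of FTS5 syntax becomes a plain AND of keywords:
print(sanitize_fts_query('error: "connection refused" (retry*)'))
# → error connection refused retry

# An all-operator query degrades to a harmless empty-phrase match:
print(sanitize_fts_query('()*+'))
# → ""
```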
In the real Hermes: all writes go through _execute_write(). The DB also stores tool_calls, reasoning fields, and Codex reasoning items, and session metadata tracks token counts, estimated cost, and parent session IDs for sub-agent continuations. See hermes_state.py (SessionDB class, ~1239 lines).
08 Persistent Memory
Session memory records what happened. Persistent memory records who you are and what matters.
Persistent memory answers the question: who is this user? It stores durable facts distilled from conversations: coding preferences, commonly used tools, project context, work habits. These persist across sessions and are loaded at startup.
In Hermes, this is implemented as two simple markdown files:
- MEMORY.md: Durable facts and observations (default limit: 2,200 characters)
- USER.md: User profile and preferences (default limit: 1,375 characters)
Both files are loaded as a frozen snapshot when a session starts and baked into the system prompt. Writes during the session go to disk but do not update the running prompt until the next session or after context compression rebuilds it. This is critical for prompt caching (see Chapter 6).
from pathlib import Path
class PersistentMemory:
MEMORY_LIMIT = 2200 # Hermes default for observations
USER_LIMIT = 1375 # Hermes default for user profile
def __init__(self, data_dir: Path):
self.memory_path = data_dir / "MEMORY.md"
self.user_path = data_dir / "USER.md"
# Create files if they don't exist
self.memory_path.touch(exist_ok=True)
self.user_path.touch(exist_ok=True)
def load(self) -> str:
"""Load both files as a combined context block."""
parts = []
memory = self.memory_path.read_text().strip()
user = self.user_path.read_text().strip()
if user:
parts.append(f"### User Profile\n{user}")
if memory:
parts.append(f"### Observations\n{memory}")
return "\n\n".join(parts)
def save_observation(self, text: str):
"""Append an observation, respecting the size limit."""
current = self.memory_path.read_text()
new_entry = f"\n- {text}"
if len(current) + len(new_entry) > self.MEMORY_LIMIT:
lines = current.strip().split("\n")
while lines and len("\n".join(lines)) + len(new_entry) > self.MEMORY_LIMIT:
lines.pop(0)
current = "\n".join(lines)
self.memory_path.write_text(current + new_entry)
def update_user_profile(self, text: str):
"""Replace the user profile."""
self.user_path.write_text(text[:self.USER_LIMIT])
The size limits (2,200 for memory, 1,375 for user profile) are deliberate. This content is frozen into the system prompt at session start, so it consumes tokens on every API call. The combined ~3,575 characters keeps the overhead predictable while still being useful.
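The eviction policy inside save_observation (drop the oldest bullet lines until the new entry fits) can be isolated as a pure function. A sketch with an artificially tiny limit so the trimming is visible:

```python
def trim_and_append(current: str, text: str, limit: int) -> str:
    """Append '- text', evicting oldest lines until the result fits."""
    new_entry = f"\n- {text}"
    if len(current) + len(new_entry) > limit:
        lines = current.strip().split("\n")
        while lines and len("\n".join(lines)) + len(new_entry) > limit:
            lines.pop(0)  # oldest observation is dropped first
        current = "\n".join(lines)
    return current + new_entry

memory = "- prefers tabs\n- uses vim\n- project is in Rust"
memory = trim_and_append(memory, "deploys with Docker", limit=60)
print(memory)
# → - uses vim
#   - project is in Rust
#   - deploys with Docker
```

The oldest observation ("prefers tabs") is evicted to make room, so the file behaves as a bounded, roughly recency-ordered list.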
What belongs in persistent memory
| Save | Do NOT save |
|---|---|
| User preferences (code style, editor, OS) | One-off task details |
| Project context (tech stack, architecture decisions) | Outdated API version numbers |
| Recurring patterns (preferred error handling approach) | Sensitive data (passwords, keys) |
| Validated solutions (what worked) | Wrong inferences (should be corrected, not stored) |
In the real Hermes: a pluggable MemoryProvider architecture. External providers (Honcho, Mem0, Hindsight) can be swapped in, with only one external provider active at a time. Hooks: on_session_end(), on_pre_compress(), on_delegation(), on_memory_write(). See agent/memory_manager.py, agent/memory_provider.py.
09 Cross-Session Recall
The trick is not remembering everything. It is finding the right piece at the right time.
Cross-session recall connects the session database (Chapter 7) to the current conversation. When the agent needs to remember something from a past session, it searches the FTS5 index, retrieves the most relevant fragments, and summarizes them for injection into the current context.
class SessionRecall:
def __init__(self, session_db, llm_client, model):
self.db = session_db
self.client = llm_client
self.model = model
def recall(self, query: str, max_sessions: int = 3) -> str:
"""Search past sessions and return summarized context."""
# Step 1: FTS5 search
results = self.db.search(query, limit=30)
if not results:
return ""
# Step 2: Group by session, take top N unique sessions
seen_sessions = {}
for r in results:
sid = r["session_id"]
if sid not in seen_sessions:
seen_sessions[sid] = r
if len(seen_sessions) >= max_sessions:
break
# Step 3: For each session, load conversation around matches
summaries = []
for sid, meta in seen_sessions.items():
messages = self.db.get_session_messages(sid, limit=30)
transcript = self._format_transcript(messages)
# Step 4: Summarize via cheap/fast LLM
summary = self._summarize(query, transcript, meta["date"])
summaries.append(summary)
return "\n\n---\n\n".join(summaries)
def _summarize(self, topic, transcript, date) -> str:
resp = self.client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content":
"Summarize this past conversation, focusing on information "
"relevant to the given topic. Be concise. 200 words max."},
{"role": "user", "content":
f"Topic: {topic}\nDate: {date}\n\nTRANSCRIPT:\n{transcript}"}
],
            max_tokens=300,  # keep each summary short
)
return resp.choices[0].message.content
The four-step flow:
- FTS5 search. Find messages matching the query across all sessions.
- Group by session. Take the top 3 unique sessions (not individual messages).
- Load conversation context. For each session, pull surrounding messages to preserve conversational flow.
- Summarize via LLM. Use a cheap, fast model to condense each session's transcript into a focused summary. This is where the real token savings happen.
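Step 2, isolated as a helper (same logic as the loop in recall above): deduplicate hits by session, keeping the first, highest-ranked hit per session:

```python
def top_sessions(results, max_sessions=3):
    """Keep the best-ranked hit per session, up to max_sessions sessions."""
    seen = {}
    for r in results:  # results arrive already ordered by FTS5 rank
        sid = r["session_id"]
        if sid not in seen:
            seen[sid] = r
        if len(seen) >= max_sessions:
            break
    return seen

# Illustrative search hits: two from session "a", one each from "b" and "c".
hits = [
    {"session_id": "a", "snippet": ">>>deploy<<< script"},
    {"session_id": "a", "snippet": "re-ran the >>>deploy<<<"},
    {"session_id": "b", "snippet": ">>>deploy<<< failed"},
    {"session_id": "c", "snippet": "planned a >>>deploy<<<"},
]
picked = top_sessions(hits, max_sessions=2)
print(sorted(picked))  # → ['a', 'b']
```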
Making it a tool
Cross-session recall is exposed to the agent as a tool called session_search. The agent decides when to search its own history:
registry.register(
name="session_search",
description="Search past conversations for relevant context. "
"Use when you need to recall what was discussed previously.",
parameters={
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search topic"},
},
"required": ["query"],
},
handler=recall.recall,
category="memory",
)
This is elegant. The agent is not force-fed history. It actively searches when it recognizes that past context would be useful. "That approach we discussed last week" triggers a search. "Write a hello world script" does not.
In the real Hermes: summarization happens in _summarize_session(), with session truncation around match locations (_truncate_around_matches(), 100K char limit). Results are grouped by session with deduplication, and an auxiliary LLM (cheap/fast, e.g. Gemini Flash) produces the summaries (up to 10K tokens per session). See tools/session_search_tool.py, hermes_state.py (search_messages()).
Skills That Grow
10 The Skill System
Skills are folders containing a SKILL.md file, supporting references, and templates. The agent discovers, loads, and manages them through three dedicated tools.
If session memory is "what happened" and persistent memory is "who you are," then skills are "how to do things." Each Skill is a directory under ~/.hermes/skills/ containing a SKILL.md file and optional supporting files.
Skill directory structure
This is the agentskills.io standard, supported by Claude Code, Cursor, Codex CLI, Gemini CLI, and others. Skills are portable across tools.
Anatomy of a SKILL.md file
---
name: git-commit-style
description: Enforce a consistent Git commit message format
version: "1.0.0"
platforms: [macos, linux] # Optional: restrict to OS
metadata:
hermes:
tags: [git, workflow]
requires_toolsets: [terminal]
---
# Git Commit Style
## Trigger
Activate when the user asks me to commit code, write a commit
message, or review commit history.
## Rules
### Commit Message Format
- First line: type(scope): summary (50 chars max)
- Blank line
- Body: explain WHY, not WHAT
## Example
feat(auth): add QR code login
Previously users could only log in with a phone number.
Now they scan a QR code and they're in.
Three tools, three tiers of progressive disclosure
Hermes uses a progressive disclosure pattern to keep token costs flat as the skill library grows. The agent sees skill names cheaply and loads full content only when needed:
| Tool | What it returns | Token cost |
|---|---|---|
| skills_list | Name + description for all skills (name ≤ 64 chars, description ≤ 1024 chars) | Low: metadata only |
| skill_view | Full SKILL.md body, or a specific file within the skill directory | Medium: one skill at a time |
| skill_manage | Create / edit / patch / delete / write_file / remove_file | Varies by action |
# The three skill tools
def skills_list(category=None) -> str:
"""List all skills with metadata. Progressive disclosure tier 1."""
skills = []
for skill_md in skills_dir.rglob("SKILL.md"):
meta, _ = parse_frontmatter(skill_md.read_text())
if not meta or not skill_matches_platform(meta):
continue
skills.append({
"name": meta.get("name", skill_md.parent.name),
"description": meta.get("description", "")[:1024],
})
return json.dumps(skills, indent=2)
def skill_view(name: str, file_path: str = None) -> str:
    """Load full skill content. Progressive disclosure tier 2-3."""
    skill_dir = find_skill(name)
    if file_path:
        # Tier 3: load a specific reference/template file.
        # Resolve first so "../" traversal cannot escape the skill directory.
        target = (skill_dir / file_path).resolve()
        if skill_dir.resolve() not in target.parents:
            raise ValueError("file_path escapes the skill directory")
        return target.read_text()
    # Tier 2: load the main SKILL.md
    return (skill_dir / "SKILL.md").read_text()
def skill_manage(action, name, content=None, category=None,
file_path=None, file_content=None) -> str:
"""Create, edit, patch, or delete skills."""
if action == "create":
# Creates skill_dir/SKILL.md with validated frontmatter
skill_dir = skills_dir / (category or "") / name
skill_dir.mkdir(parents=True)
(skill_dir / "SKILL.md").write_text(content)
elif action == "patch":
# Find-and-replace within SKILL.md or a supporting file
# Uses exact string matching -- precise, token-efficient
...
elif action == "edit":
# Full rewrite of SKILL.md
...
elif action == "write_file":
# Add/overwrite a supporting file (references/, templates/, etc.)
...
elif action == "delete":
# Remove entire skill directory
...
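The skills_list sketch above leans on two helpers, parse_frontmatter and skill_matches_platform, that we have not defined. A minimal, dependency-free version might look like this (a sketch: it handles only the flat `key: value` lines our frontmatter uses, where the real implementation would use a proper YAML parser):

```python
import platform
import re

def parse_frontmatter(text: str):
    """Split a SKILL.md into (metadata dict, body).

    Returns (None, text) when there is no frontmatter block. Only flat
    `key: value` lines are parsed -- a sketch, not a full YAML parser.
    """
    match = re.match(r"^---\n(.*?)\n---\n?(.*)$", text, re.DOTALL)
    if not match:
        return None, text
    meta = {}
    for line in match.group(1).splitlines():
        if ":" in line and not line.startswith((" ", "\t")):
            key, _, value = line.partition(":")
            # Drop inline comments ("platforms: [macos] # note") and quotes
            meta[key.strip()] = value.partition(" #")[0].strip().strip('"')
    return meta, match.group(2)

def skill_matches_platform(meta: dict) -> bool:
    """Honor an optional `platforms: [macos, linux]` restriction."""
    raw = meta.get("platforms", "")
    if not raw:
        return True  # no restriction declared
    allowed = [p.strip() for p in raw.strip("[]").split(",")]
    system = platform.system().lower()  # 'darwin', 'linux', 'windows'
    current = {"darwin": "macos"}.get(system, system)
    return current in allowed
```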
The real Hermes runs a security scanner (skills_guard.py) on every agent-created skill. It checks for shell injection, credential exposure, and path traversal. The skill_manage tool validates that file paths stay within ALLOWED_SUBDIRS (references, templates, scripts, assets) and rejects path traversal attempts.
What we built: skills_list, skill_view, skill_manage. Basic YAML frontmatter parsing. Folder discovery via rglob("SKILL.md").
What the real Hermes adds: platform restrictions (platforms: [macos]). Environment variable prerequisites with interactive secret collection. Atomic writes via tempfile + os.replace(). Security scanning on create. External skill directories. Skills Hub integration for community skills. 100K character size limit per SKILL.md.
Source: tools/skills_tool.py (skills_list, skill_view), tools/skill_manager_tool.py (skill_manage), agent/skill_utils.py
11 Autonomous Skill Creation
After finishing a complex task, the agent asks itself: "Would this solution be useful again?" If yes, it writes a Skill.
This is the mechanism that turns one-time problem solving into reusable knowledge. The agent does not silently create a Skill for every task. Per the skill_manage tool description, the agent offers to save as a skill and confirms with the user before creating or deleting. The background nudge system (Chapter 13) can also trigger skill review, but the confirmation flow is the designed behavior for interactive sessions.
When to create a Skill
The real Hermes triggers Skill creation when a task had:
- 5+ tool calls (non-trivial workflow)
- Error recovery (the agent hit an error and fixed it)
- User corrections (the agent learned from feedback)
- Non-obvious workflow (the solution was not straightforward)
The retrospective prompt
Skill creation is driven by a prompt, not by code heuristics. After the task completes, the agent receives a retrospective prompt that asks it to evaluate its own work:
class Retrospective:
PROMPT = """Review the conversation that just finished.
1. MEMORY: Are there facts worth remembering long-term?
- User preferences discovered
- Project context learned
- Solutions that worked (or didn't)
If yes, call the memory tool to save each observation.
2. SKILL CREATION: Was this task complex enough to warrant a reusable Skill?
Criteria: 5+ tool calls, error recovery, non-obvious workflow.
If yes, call skill_manage with action="create" and:
- A descriptive name
- Trigger conditions (when should this Skill activate)
- Step-by-step procedure
- Constraints and gotchas discovered
3. SKILL UPDATE: Was an existing Skill used? Did it work well?
If not, call skill_manage with action="patch" (old_string/new_string).
Be selective. Not every task deserves a Skill. Not every fact deserves
to be remembered. Only save what will genuinely help in the future."""
def run(self, agent, conversation_messages):
"""Run retrospective analysis on the completed conversation."""
# Build a summary of what just happened
summary = self._summarize_conversation(conversation_messages)
# Ask the agent to evaluate
agent.run(
f"[RETROSPECTIVE]\n\n"
f"Conversation summary:\n{summary}\n\n"
f"{self.PROMPT}"
)
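The Retrospective class above calls a _summarize_conversation helper that is never shown. A minimal stand-in (written here as a free function, since the real helper is not in the source) can simply render a truncated transcript:

```python
def summarize_conversation(messages, max_chars_per_msg=300, max_msgs=50):
    """Render recent messages as a compact transcript for the retrospective.

    A sketch: the real helper might instead summarize with an auxiliary LLM.
    """
    lines = []
    for m in messages[-max_msgs:]:
        # `content` is None on tool-call messages; substitute a marker
        content = m.get("content") or "[tool call]"
        if len(content) > max_chars_per_msg:
            content = content[:max_chars_per_msg] + "..."
        lines.append(f"{m['role']}: {content}")
    return "\n".join(lines)
```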
How the agent creates a skill
The agent calls skill_manage with action="create". It provides the full SKILL.md content including frontmatter:
# The agent generates this tool call:
skill_manage(
action="create",
name="csv-to-database",
category="data", # optional subfolder
content="""---
name: csv-to-database
description: Clean CSV data and import into a database
version: "1.0.0"
---
# CSV to Database Import
## Trigger
Activate when the user asks to import CSV, clean data, or load
data into a database.
## Steps
1. Read the CSV and detect column types
2. Clean: strip whitespace, handle nulls, validate dates
3. Create table with appropriate column types
4. Bulk insert with error logging
## User Preferences
- Connection method: psycopg2 (user prefers this over SQLAlchemy)
- Always check if table exists first
"""
)
This creates ~/.hermes/skills/data/csv-to-database/SKILL.md. The skill is immediately available for the next conversation.
Source: tools/skill_manager_tool.py (skill_manage function, actions: create/edit/patch/delete/write_file/remove_file)
12 Skill Self-Improvement
Creating a Skill is step one. Updating it based on real-world feedback is what makes it "self-improving."
Traditional Skills are static: you write them, and they stay the same until someone manually edits them. Hermes Skills are alive. Every time a Skill is used and the user provides feedback, the agent can update the Skill file to incorporate what it learned.
The improvement cycle
The patch action
Hermes prefers patching over rewriting. The skill_manage tool's patch action does exact string find-and-replace within a SKILL.md or supporting file. This is important for two reasons: it preserves parts that work, and it uses far fewer tokens than a full rewrite.
# The agent generates this tool call:
skill_manage(
action="patch",
name="github-daily-digest",
old_string="3. Group by type (PR / Issue)",
new_string="3. Group by type (PR / Issue / Discussion)",
)
# Exact find-and-replace within SKILL.md.
# old_string must be unique unless replace_all=True.
# Include enough surrounding context to ensure uniqueness.
# Use file_path="references/api.md" to patch a supporting file.
# Use action="edit" for full rewrites when patches are too numerous.
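Under the hood, the patch action reduces to an exact-match replace with a uniqueness check. A sketch of that core logic (file loading and skill path resolution elided):

```python
def apply_patch(text: str, old_string: str, new_string: str,
                replace_all: bool = False) -> str:
    """Exact find-and-replace with the uniqueness rule described above."""
    count = text.count(old_string)
    if count == 0:
        raise ValueError("old_string not found -- include exact surrounding context")
    if count > 1 and not replace_all:
        raise ValueError(f"old_string matches {count} times; add context to make "
                         "it unique or pass replace_all=True")
    return text.replace(old_string, new_string)
```

Failing loudly on zero or ambiguous matches is the point: it forces the agent to quote enough context that the patch lands exactly where intended.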
A concrete example of how this works in practice:
- You ask Hermes to sort through GitHub notifications. It follows its "GitHub Daily Digest" Skill, returning only PRs and Issues.
- You say: "Include Discussions too." Hermes adds Discussions to its response.
- Hermes recognizes this as a correction to the Skill. It calls skill_manage(action="patch", old_string="...", new_string="...") to update the Skill's steps.
- Next time you say "check GitHub," the Skill already includes Discussions. You never have to mention it again.
Putting It All Together
13 The Learning Loop: Nudges, Background Review, and Compression Flush
The Learning Loop is not a single post-task retrospective. It is three distinct trigger mechanisms that fire at different times, all running in the background.
None of the components we have built are individually novel. Memory, skills, retrieval, user profiling: the AI field has seen all of these before. What Hermes does differently is wire them into a causal chain where each component feeds the next. But the "how" matters as much as the "what."
The book's earlier chapters presented a simplified model of a single retrospective that fires after each task. The real Hermes is more nuanced. Learning is triggered by three independent mechanisms:
| Trigger | When it fires | What it does |
|---|---|---|
| Memory nudge | Every N user turns (default: 10) | Background review of conversation for facts worth persisting |
| Skill nudge | Every N tool-calling iterations (default: 10) | Background review for reusable procedures to create or update |
| Compression flush | When context window hits 50% capacity | Extract memories before old messages are summarized and discarded |
Nudge counters
Hermes maintains two counters that tick up during normal operation:
class Agent:
def __init__(self):
# Nudge configuration (from config.yaml)
        self._memory_nudge_interval = 10  # user turns between memory reviews
        self._skill_nudge_interval = 10   # tool iterations between skill reviews
self._turns_since_memory = 0
self._iters_since_skill = 0
def run_turn(self, user_input):
# Before the agent loop: check memory nudge
self._turns_since_memory += 1
        # An interval of 0 means the nudge is disabled
        should_review_memory = (
            self._memory_nudge_interval > 0
            and self._turns_since_memory >= self._memory_nudge_interval
        )
if should_review_memory:
self._turns_since_memory = 0
# Run normal agent loop...
response = self._agent_loop(user_input)
# After the agent loop: check skill nudge
        should_review_skills = (
            self._skill_nudge_interval > 0
            and self._iters_since_skill >= self._skill_nudge_interval
        )
if should_review_skills:
self._iters_since_skill = 0
# Counters reset when the agent actually uses the tool
# (not just when the nudge fires)
# Spawn background review if either trigger fired
if should_review_memory or should_review_skills:
self._spawn_background_review(
review_memory=should_review_memory,
review_skills=should_review_skills,
)
return response
The counters also reset when the agent voluntarily uses the memory or skill_manage tool during normal conversation. If the agent is already saving memories on its own, the nudge does not need to fire.
Background review: a forked agent on a separate thread
When a nudge fires, Hermes does not inject a "please review your work" message into the user-facing conversation. Instead, it forks a new agent instance on a background thread with a snapshot of the conversation:
def _spawn_background_review(self, review_memory, review_skills):
import threading
# Pick the right prompt
if review_memory and review_skills:
prompt = COMBINED_REVIEW_PROMPT
elif review_memory:
prompt = MEMORY_REVIEW_PROMPT
else:
prompt = SKILL_REVIEW_PROMPT
messages_snapshot = list(self.messages)
def _run():
# Create a new agent with same model and tools
review_agent = Agent(model=self.model, max_iterations=6)
# Share the memory/skill stores (thread-safe writes)
review_agent.memory_store = self.memory_store
# Disable nudges on the review agent (no infinite recursion)
review_agent._memory_nudge_interval = 0
review_agent._skill_nudge_interval = 0
# Run with the conversation snapshot + review prompt
review_agent.run(messages_snapshot, prompt)
threading.Thread(target=_run, daemon=True).start()
This is a critical design choice: the review never competes with the user's task for model attention. The user sees their response immediately. The learning happens silently in the background.
Compression flush: the third trigger
When the context window hits 50% capacity, compression kicks in. But before old messages are summarized and discarded, Hermes gives the agent one final chance to save anything important:
def compress_context(self, messages):
# Step 1: Memory flush -- let the model save memories before they're lost
self.flush_memories(messages, min_turns=0)
# Step 2: Notify external memory providers
if self.memory_manager:
self.memory_manager.on_pre_compress(messages)
# Step 3: Now compress (summarize middle, keep head + tail)
compressed = self.compressor.compress(messages)
flush_memories() appends a user-role sentinel containing [System: The session is being compressed. Save anything worth remembering...]. It is a user message, not a system message, so it fits naturally into the conversation flow. The agent gets one API call with the memory tool available. After any saves, all flush artifacts (the sentinel and any tool calls) are stripped from the message list. The user never sees this exchange.
The on_pre_compress() hook notifies external memory providers (like Honcho) so they can also extract insights from the about-to-be-discarded messages.
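The flush described above can be sketched in a few lines, assuming an agent object with a messages list and a hypothetical run_once() helper that makes exactly one API call with only the memory tool exposed:

```python
FLUSH_SENTINEL = ("[System: The session is being compressed. "
                  "Save anything worth remembering with the memory tool.]")

def flush_memories(agent):
    """Give the model one chance to persist facts, then hide the exchange."""
    before = len(agent.messages)
    # The sentinel is a *user* message so it fits the normal conversation flow
    agent.messages.append({"role": "user", "content": FLUSH_SENTINEL})
    agent.run_once(tools=["memory"])  # hypothetical: one call, memory tool only
    # Strip the sentinel and any tool-call artifacts it produced
    del agent.messages[before:]
```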
Here is how the chain works:
- Memory curation feeds Skill creation. The observations stored in memory provide the raw material. The agent notices "I've done this CSV import three times now" because it can search past sessions.
- Skill usage generates new memories. Every time a Skill runs, the results (success, failure, user corrections) get recorded in session memory, triggering potential Skill improvements.
- Improved Skills produce better results. Better results mean the user is satisfied more often, which means fewer corrections, which means the agent's user model becomes more accurate.
- Better user modeling makes memory curation more targeted. The agent knows what this specific user cares about, so it saves observations that are genuinely relevant.
- More targeted memories feed better Skill creation. And the loop continues.
This is a positive feedback loop. The more you use it, the stronger every step gets. Use it for three to five days, and you will notice a clear difference.
The review prompts
The background review agent receives one of three prompts depending on which triggers fired:
When both triggers fire simultaneously, a combined prompt covers both. The review agent has a low iteration budget (max_iterations=6 in the sketch above) to prevent runaway costs.
What the real Hermes adds: learning-loop hooks wrapped around run_conversation() calls in CLI mode. External memory providers get on_session_end(), on_pre_compress(), and on_delegation() hooks. Review agent stdout/stderr redirected to /dev/null. Memory flush injects a user-role sentinel message, executes one API call, then strips all artifacts.
Source: run_agent.py lines 2034-2115 (_spawn_background_review), lines 6390-6420 (flush_memories), lines 7629-7638 (nudge logic), lines 10218-10246 (post-turn trigger)
14 Context Compression
Long conversations blow up the context window. Compression keeps the agent running without hitting token limits.
LLMs have a fixed context window (128K tokens for many current models), and API cost scales linearly with the tokens you send. A long coding session can easily hit 50K+ tokens. Without compression, the agent either crashes when it hits the limit or becomes very expensive.
Hermes implements a middle-out compression strategy:
- Protect the head. The first 3 messages (system prompt + initial user message + first assistant response) are never compressed. They contain the identity and task context.
- Protect the tail. The most recent ~20K tokens are kept intact. This is the active working context.
- Compress the middle. Everything between head and tail is summarized by a cheap, fast LLM into a structured summary.
class ContextCompressor:
THRESHOLD = 0.5 # Trigger at 50% of context window
TAIL_TOKENS = 20000
HEAD_MESSAGES = 3
def maybe_compress(self, messages, max_tokens):
estimated = self._estimate_tokens(messages)
if estimated < max_tokens * self.THRESHOLD:
return messages # No compression needed
head = messages[:self.HEAD_MESSAGES]
tail = self._get_tail(messages, self.TAIL_TOKENS)
        # Slice by explicit index so an empty tail cannot produce a bad
        # negative-zero slice (messages[3:-0] would be empty)
        middle = messages[self.HEAD_MESSAGES:len(messages) - len(tail)]
if not middle:
return messages
# Summarize the middle section
summary = self._summarize_middle(middle)
return head + [{
"role": "system",
"content": f"[Compressed context summary]\n{summary}"
}] + tail
def _summarize_middle(self, messages) -> str:
        transcript = "\n".join(
            # `content` can be None on tool-call messages; `or` handles that,
            # where dict.get's default would not (the key exists, value is None)
            f"{m['role']}: {(m.get('content') or '[tool call]')[:200]}"
            for m in messages
        )
resp = self.aux_client.chat.completions.create(
model=self.aux_model,
messages=[{
"role": "system",
"content": "Summarize this conversation segment. Include:\n"
"- Questions that were resolved\n"
"- Decisions that were made\n"
"- Pending work items\n"
"- Key facts discovered\n"
"Be concise. Under 500 words."
}, {
"role": "user",
"content": transcript
}],
max_tokens=800,
)
return resp.choices[0].message.content
Before compression happens, two things fire (as described in Chapter 13): flush_memories() gives the agent one API call to save important observations, and on_pre_compress() notifies external memory providers. This way, facts are not lost to summarization.
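The compressor sketch above leaves _estimate_tokens() and _get_tail() undefined. A rough stand-in (shown as free functions) using the common four-characters-per-token heuristic:

```python
import json

def estimate_tokens(messages) -> int:
    """Rough estimate: ~4 characters per token (a common heuristic)."""
    return sum(len(json.dumps(m, default=str)) for m in messages) // 4

def get_tail(messages, tail_tokens: int):
    """Walk backward, keeping the most recent messages within the token budget."""
    tail, budget = [], tail_tokens
    for m in reversed(messages):
        cost = max(1, len(json.dumps(m, default=str)) // 4)
        if budget < cost and tail:  # always keep at least the last message
            break
        budget -= cost
        tail.insert(0, m)
    return tail
```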
What the real Hermes adds: flush_memories() before compression (one extra API call to save facts). on_pre_compress() hook for external memory providers. Iterative re-compression: if the summary from a previous compression is in the middle, it gets re-summarized. Structured summary template (Resolved/Pending/Remaining Work). System prompt invalidation + memory reload after compression.
Source: agent/context_compressor.py, run_agent.py lines 6564-6579 (compression trigger with memory flush)
15 Building the CLI and Gateway
The interface. Where the user meets the agent.
The CLI
The minimal CLI is a read-eval-print loop that initializes the agent and feeds it user input:
import yaml
from openai import OpenAI
from pathlib import Path
def main():
# Load config
config_path = Path.home() / ".mini-hermes" / "config.yaml"
config = yaml.safe_load(config_path.read_text())
# Initialize components
client = OpenAI(
api_key=config["model"]["api_key"],
base_url=config["model"].get("base_url"),
)
data_dir = Path.home() / ".mini-hermes"
session_db = SessionDB(data_dir / "state.db")
persistent = PersistentMemory(data_dir)
skill_loader = SkillLoader(data_dir / "skills")
# Build system prompt ONCE (frozen snapshot for the session)
builder = PromptBuilder()
system_prompt = builder.build(
memory_block=persistent.load(), # frozen at session start
skills_index=skill_loader.build_skills_index(),
user_context="",
)
# Create session
session_id = session_db.create_session()
# Create agent with frozen system prompt
agent = HermesAgent(
client=client,
model=config["model"]["model"],
system_prompt=system_prompt, # not rebuilt per turn
tools=registry.get_schemas(),
tool_handlers={t.name: t.handler for t in registry._tools.values()},
)
agent.session_db = session_db
agent.session_id = session_id
# REPL
print("Mini-Hermes ready. Type 'exit' to quit.\n")
while True:
user_input = input("you > ").strip()
if user_input.lower() in ("exit", "quit"):
break
if not user_input:
continue
response = agent.run_with_learning(user_input)
print(f"\nhermes > {response}\n")
if __name__ == "__main__":
main()
Extending to Telegram
Hermes supports 14 platforms through a single Messaging Gateway: one process that listens to all configured platforms simultaneously. The key design decision is that all platforms share the same memory database, the same skills directory, and the same session store.
However, the gateway does not keep a single agent instance alive. It creates a fresh agent object per message, loading the stored system prompt from the session DB so the Anthropic prefix cache still hits. This is an important distinction: session continuity comes from the database, not from a long-lived object in memory.
from telegram.ext import Application, MessageHandler, filters
async def handle_message(update, context):
user_text = update.message.text
# Create a fresh agent per message (like real Hermes gateway)
# The session DB provides continuity, not a long-lived object
session_id = get_or_create_session(update.effective_user.id)
agent = HermesAgent(
client=client,
model=config["model"]["model"],
system_prompt=load_system_prompt_from_session(session_id),
tools=registry.get_schemas(),
tool_handlers={t.name: t.handler for t in registry._tools.values()},
)
agent.session_db = session_db
agent.session_id = session_id
response = agent.run_with_learning(user_text)
await update.message.reply_text(response)
app = Application.builder().token(config["gateway"]["telegram"]["token"]).build()
app.add_handler(MessageHandler(filters.TEXT, handle_message))
app.run_polling()
Deploy this on a $5 VPS and you have a 24/7 AI assistant reachable from your phone, with persistent memory across every conversation.
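To keep the gateway alive across crashes and reboots on that VPS, a systemd unit is the usual answer. A sketch (the paths, user, and filenames are placeholders for your setup):

```ini
# /etc/systemd/system/mini-hermes.service  (illustrative)
[Unit]
Description=Mini-Hermes Telegram gateway
After=network-online.target

[Service]
User=hermes
WorkingDirectory=/home/hermes/mini-hermes
ExecStart=/home/hermes/mini-hermes/.venv/bin/python gateway.py
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now mini-hermes`.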
What the real Hermes adds: a gateway process (hermes gateway) with platform adapters for 14 services. Session routing tied to user IDs, not platforms. Cron ticking for scheduled tasks. Cross-platform conversation continuity. Fresh AIAgent per message with stored system prompt from session DB for cache consistency.
Source: gateway/run.py, run_agent.py lines 7650-7677 (system prompt caching on continuation)
16 What You Built and Where to Go Next
A review of the complete system, and pointers to everything we left out.
What you have
By following this guide, you have built a minimal but complete self-improving AI agent. Here is the component map:
| Component | What it does | Hermes equivalent |
|---|---|---|
| Agent Loop | Message -> tools -> response cycle | run_agent.py |
| Tool Registry | Register, schema, dispatch tools | tools/registry.py |
| Prompt Builder | Assemble identity + memory + skills | agent/prompt_builder.py |
| Prompt Caching | system_and_3 cache breakpoints | agent/prompt_caching.py |
| Session DB | SQLite + FTS5 conversation storage | hermes_state.py |
| Persistent Memory | MEMORY.md / USER.md | Built-in memory provider |
| Session Recall | FTS5 search + LLM summarization | tools/session_search_tool.py |
| Skill Tools | skills_list / skill_view / skill_manage | tools/skills_tool.py, tools/skill_manager_tool.py |
| Learning Loop | Nudge counters + background review + compression flush | run_agent.py (_spawn_background_review) |
| Compression | Middle-out context window management | agent/context_compressor.py |
| CLI | Terminal interface | cli.py |
What we left out
The real Hermes has substantially more. Here is what to explore next, in order of impact:
Sub-agent delegation
Hermes can spawn up to 3 concurrent sub-agents, each with its own context and restricted toolset. Useful for parallel research (one agent per topic) or separating concerns (one codes, one tests, one reviews). The key insight: sub-agents get restricted tool access for both efficiency and security.
MCP integration
Model Context Protocol lets the agent connect to 6,000+ external applications (GitHub, Slack, Jira, databases) via a standard protocol. Each MCP server is a separate process, communicating over stdio or HTTP.
Honcho user modeling
An optional external integration that goes beyond what you said to infer what kind of person you are. It tracks 12 identity layers including technical level, work rhythm, communication style, and preference contradictions. The inferences are injected as invisible context.
Cron scheduling
Natural-language scheduled tasks. "Check my GitHub notifications every morning at 9am" creates a timed trigger. Results are delivered through the Messaging Gateway.
Multi-model orchestration
The moa tool calls multiple LLMs simultaneously and synthesizes their responses. Useful for high-stakes decisions where you want diverse perspectives.
Reinforcement learning
Hermes has experimental RL support for fine-tuning the agent's decision-making. This is a research frontier, not yet stable.
The ceiling of self-improvement
Self-improvement makes the agent run faster in a known direction. But the direction itself still needs a human to set. The agent can optimize its git commit format, but it cannot judge whether the architecture of your system is sound. It can learn your preferences, but it can learn wrong ones too.
The Learning Loop relies on feedback quality. When you provide clear corrections ("add Discussions to the GitHub digest"), the system works beautifully. When you say nothing, the agent evaluates itself using its own criteria. "Faster" does not always mean "correct."
As you extend your minimal agent, remember: the mechanisms work. The harder problem is pointing them in the right direction.