From OpenClaw to Hermes Agent

This is the continuation of the OpenClaw Training series. The first five posts were built on OpenClaw. This one documents what happened after migrating to Hermes Agent — how to reimplement the same customization approach on the new foundation, and where things got better.


Why Switch Platforms

By the time the OpenClaw series was finished, that setup had been running in daily use for nearly two months. There were no major functional gaps, but three friction points never went away:

Telegram integration was bolted on. Message send/receive relied on an extra bot script, not the same process as the agent’s main loop. Whenever a notification needed to go out, it was either the agent calling curl or a script polling. State sync was painful; debugging was worse.

Cron scheduling lived outside the agent. The system crontab called shell scripts, which triggered the OpenClaw CLI, and the LLM only got involved at the very last step. The entire scheduling chain was dumb: the LLM couldn’t participate in decisions like “should this run?” or “how should we handle the result?”

Heavy dependence on top-tier models. OpenClaw’s configuration assumed Claude-class high-performance models underneath. Swapping in smaller models produced noticeably worse results, so the cost of running this setup was permanently locked to top-tier models.

Hermes Agent built all three into the core: Telegram is a first-class communication channel, cron is a built-in LLM-driven scheduler, and there’s a self-evolving skill system. The decision to switch was straightforward.

Another driver was model strategy flexibility. Hermes lets you assign different models by task type: 90% of daily tasks — status reporting, task scheduling, memory writes, newsletter archiving — work perfectly well with MiniMax M2.7, which is fast and cheap; Claude Sonnet is reserved for tasks that need deep reasoning (complex architectural decisions, long-form analysis). This is fundamentally different from the OpenClaw era’s “every task goes through the same expensive model” approach.

This post is not a Hermes getting-started guide. It’s a migration log built on top of existing OpenClaw customization experience — the migration process, the design thinking, and the concrete implementation differences between the two systems.


1. Basic Configuration: Giving Hermes a Soul

1.1 SOUL.md and AGENTS.md: Two Entry Points, Different Roles

Hermes’s configuration entry point is the ~/.hermes/ directory. Two files are automatically injected into the system prompt on every session start, each with a distinct responsibility:

  • SOUL.md: The identity layer. Defines “who this AI is” — name, core behavioral principles, security hard rules. This evolves slowly and forms the values foundation of the entire system.
  • workspace/AGENTS.md: The operational entry point. Defines “what to do first on every session” — the startup checklist, workflow routing table, memory write rules, and tool constraints. This is the execution specification for how the agent concretely operates.

Both are injected, but with different focus: SOUL.md is “who am I” written for the AI; AGENTS.md is “how do I handle this task” written for the AI. In the OpenClaw era, both types of information were mixed across IDENTITY.md and AGENTS.md with fuzzy boundaries; Hermes splits them explicitly so each can evolve independently.

1.2 SOUL.md: From Identity Description to Behavioral Constitution

The difference from OpenClaw: OpenClaw split identity description (IDENTITY.md) and behavioral principles (SOUL.md) into two files; Hermes merges them, and SOUL.md also carries several hard security rules — in OpenClaw, that was handled by the permission layer in AGENTS.md.

# SOUL.md

## Name
**[Give your AI a name]**

## Core Truths

**Be genuinely helpful, not performatively helpful.**
Skip the "Great question!" — just help.

**Act immediately — but know the difference between execution and architecture.**
- Execution tasks (write this file, fix this bug): have context → act.
- Architecture decisions: align direction with the user first, then act.

**Have opinions.** You're allowed to disagree.

## 🚨 Security Hard Rules

**Public Repo Push = Blocked.**
- Pushing to any public repo requires explicit user approval. No exceptions.
- If you're unsure whether a repo is public, ask before pushing.

**Vault Backup Protocol**
- The vault is the sole backup repo for the AI's complete state.
- Excluded: hermes-agent/ (separate git), cache/ (runtime cache).
- Everything else is backed up here.

Putting security rules in SOUL.md rather than config files is deliberate: when the LLM reads them in natural language context, they’re harder to treat as ignorable background noise than JSON flags.

The above sample is illustrative. Actual configurations contain more specific identity descriptions and security rules; personal details have been anonymized for publication.

Hermes natively supports placing CLAUDE.md in project directories, automatically injected when executing project-specific tasks — the same functionality as OpenClaw’s AGENTS.md, just a different filename. For cross-tool compatibility, you can add @AGENTS.md inside CLAUDE.md to import it, keeping migration cost minimal.

1.3 Dual-Track Memory Replaces Static USER.md

OpenClaw’s USER.md was a static file recording the user’s background, preferences, and communication style. Manually maintained; every update meant opening an editor.

Hermes splits this into two real-time memory tracks, written via the built-in memory tool and injected on every session:

user   ← Who the user is: name, role, preferences, communication style, decision habits
memory ← What the environment is: tool quirks, project conventions, API details, workflow experience

The distinction is purpose: user is about “who this person is”; memory is about “what this environment is like.”

A concrete example: the user says “I don’t like it when you add a bunch of preamble every time” — that goes in user, because it’s about this person’s preference. But “Things3 delete command requires the UUID twice” — that goes in memory, because it’s a tool quirk unrelated to who the user is.

Writing doesn’t require manual triggering; the agent automatically writes when it discovers something worth recording during conversation. The user can also say “remember this” — the agent decides whether it belongs in user or memory.

The discipline for writing is stricter than in the OpenClaw era:

# Good memory entry (slowly changing facts)
User's style for architecture decisions: align direction first, then implement; doesn't accept AI making unilateral judgment calls and changing things directly.

# Bad memory entry (will be stale within a week)
Research completed (2026-05-23), divergence=0.181, PR #47 merged.

PR numbers, commit SHAs, “fixed bug X” — none of this goes into memory. It becomes meaningless quickly; to recover historical context, session_search from session history is more appropriate than storing it in memory.
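The write discipline above can be approximated with a cheap pre-check before an entry is stored. The following is a minimal sketch, not part of Hermes: the function name `looks_volatile` and the patterns are assumptions chosen to match the examples in this section (PR numbers, commit SHAs, dates).

```python
import re

# Hypothetical patterns for details that tend to go stale within days.
VOLATILE_PATTERNS = [
    re.compile(r"\bPR\s*#\d+", re.IGNORECASE),  # PR numbers such as "PR #47"
    # Hex strings of 7 to 40 chars containing both a digit and a letter (commit SHAs)
    re.compile(r"\b(?=[0-9a-f]*\d)(?=[0-9a-f]*[a-f])[0-9a-f]{7,40}\b"),
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),       # ISO dates such as 2026-05-23
]


def looks_volatile(entry: str) -> bool:
    """Return True if the entry contains details that usually belong in session history."""
    return any(pattern.search(entry) for pattern in VOLATILE_PATTERNS)
```

A heuristic like this only flags candidates; whether an entry really is a slowly changing fact is still the agent's judgment call.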

1.4 config.yaml: Permission Model

OpenClaw’s tool permissions were managed through textual descriptions in AGENTS.md — essentially relying on the LLM to self-enforce. Hermes tightens this at the config layer, controlling available toolsets via the toolsets field in config.yaml:

# ~/.hermes/config.yaml (excerpt)
# Default model: handles 90% of daily tasks
model: minimax/MiniMax-M2.7
provider: minimax-cn

# Default toolsets per channel
toolsets:
  - terminal
  - file
  - web
  - browser
  - cronjob

channels:
  telegram:
    enabled: true
    home_chat_id: "YOUR_CHAT_ID"

For crons that need deeper reasoning, you can specify a model at creation time:

# Daily reflection uses a stronger model; routine archiving uses the default
hermes cron create \
  --name "daily-reflection" \
  --schedule "45 23 * * *" \
  --model "anthropic/claude-sonnet-4-5" \
  --prompt "..."

2. Memory System: Hermes Implementation of the Three-Layer Architecture

2.1 Why Built-in Memory Isn’t Enough

Hermes’s built-in memory tool is a flat key-value structure with capacity limits (memory track ~2200 characters, user track ~1375 characters). Cross-session daily event logs, lessons learned, and decision records would fill that up quickly. All content is also injected in full on every session, with no dynamic filtering on quality or quantity.

Following the design from the OpenClaw era, a three-layer SQLite structure was built in memory.db:

memory.db
├── daily_log      ← Daily event log (append-only, raw records)
├── lessons        ← Lessons learned (updatable)
├── decisions      ← Important decision records
├── people         ← People / organization relationships
└── archive        ← Archived content after 90 days

FTS5 full-text indexing covers all tables, supporting cross-layer keyword search.
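As a rough illustration of the layout above, here is a minimal sketch using Python's standard `sqlite3` module. The column names, the single shared `memory_fts` index, and the helper functions are assumptions for illustration; the actual memory.db schema is not shown in this post. Only two of the tables are created to keep the sketch short, and it requires an SQLite build with FTS5 enabled.

```python
import sqlite3

ALLOWED_TABLES = {"daily_log", "lessons"}  # subset of the tables listed above


def init_memory_db(path=":memory:"):
    """Create two content tables plus one shared FTS5 index (assumed layout)."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS daily_log (
            id INTEGER PRIMARY KEY, ts TEXT NOT NULL,
            title TEXT NOT NULL, content TEXT DEFAULT ''
        );
        CREATE TABLE IF NOT EXISTS lessons (
            id INTEGER PRIMARY KEY, ts TEXT NOT NULL,
            title TEXT NOT NULL, content TEXT DEFAULT ''
        );
        -- source_table records which layer a row came from; it is stored but not indexed
        CREATE VIRTUAL TABLE IF NOT EXISTS memory_fts
            USING fts5(source_table UNINDEXED, title, content);
    """)
    return conn


def add_entry(conn, table, ts, title, content=""):
    """Insert into the content table and mirror the text into the FTS5 index."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    conn.execute(
        f"INSERT INTO {table} (ts, title, content) VALUES (?, ?, ?)",
        (ts, title, content),
    )
    conn.execute(
        "INSERT INTO memory_fts (source_table, title, content) VALUES (?, ?, ?)",
        (table, title, content),
    )
    conn.commit()


def search(conn, query):
    """Cross-table keyword search, best match first."""
    rows = conn.execute(
        "SELECT source_table, title FROM memory_fts "
        "WHERE memory_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()
```

Keeping one index with a `source_table` column is one way to get cross-layer search from a single query; a separate FTS5 table per layer would work as well.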

2.2 Layer Zero: The now Table — Real-Time State Awareness

Beyond the memory system, there’s a dedicated mechanism solving a specific problem: how does the agent know “where things stand right now” at the start of each session? The answer is the now table in memory.db.

The user and memory tracks answer stable background questions: “who is this person,” “what is this environment like.” But they can’t tell the agent that a PR from yesterday is still unmerged, that today’s top priority is X, or that a half-finished discussion from last session needs follow-up. That is real-time state: fast-changing, and a poor fit for the character-limited memory tracks.

The now table has just three keys:

recent_events  ← Key events from today (one conclusion sentence, not details)
pending        ← Current unfinished tasks
today_focus    ← The day's core objective

The design intent: at the start of each session, the agent’s first action is reading the now table to immediately understand “current status” — without the user needing to re-explain context. Updated in real time when tasks complete, not waiting for daily batch processing.
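A minimal sketch of such a three-key table, using Python's `sqlite3`. The column names and the helpers `init_now`, `set_now`, and `read_now` are hypothetical; only the three key names come from the description above.

```python
import sqlite3

NOW_KEYS = ("recent_events", "pending", "today_focus")


def init_now(conn):
    """Create the key-value now table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS now ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL, updated_at TEXT NOT NULL)"
    )


def set_now(conn, key, value):
    """Overwrite one key immediately, e.g. right after a task completes."""
    if key not in NOW_KEYS:
        raise ValueError(f"unknown now key: {key}")
    conn.execute(
        "INSERT OR REPLACE INTO now (key, value, updated_at) "
        "VALUES (?, ?, datetime('now'))",
        (key, value),
    )
    conn.commit()


def read_now(conn):
    """Return all three keys, with empty strings for keys never written."""
    rows = dict(conn.execute("SELECT key, value FROM now"))
    return {key: rows.get(key, "") for key in NOW_KEYS}
```

Reading the table at session start is then a single call to `read_now`, and each update replaces the previous value rather than appending to it.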

Compared to OpenClaw’s NOW.md, this isn’t just a format change from Markdown to SQL. NOW.md was a human-readable summary, manually maintained; the now table is structured state for machine real-time querying, written automatically by the agent during task execution and pushed to the user automatically via a hook on session start. Completely different audience and purpose.

2.3 Five-Layer Memory System

graph TD
    L0["Layer 0: now table (SQL)<br/>Real-time state: recent_events / pending / today_focus<br/>Purpose: read immediately at session start to understand current status<br/>Trait: real-time updates, no waiting for batch"]
    L1["Layer 1: Hermes built-in memory<br/>Injection limits: memory ~2200 chars, user ~1375 chars<br/>Purpose: most-used facts, preferences, conventions<br/>Trait: fully injected every session"]
    L2["Layer 2: memory.db FTS5<br/>Structured retrieval, no limit<br/>Purpose: log streams, lessons, decision records<br/>Trait: active query, 90-day archive mechanism"]
    L3["Layer 3: memory.db archive<br/>Cold archive<br/>Purpose: entries auto-migrated after 90 days<br/>Trait: not actively loaded, retrieved on demand"]
    L4["Layer 4: session_search<br/>Session history, coldest<br/>Purpose: things discussed before but not saved<br/>Trait: everything, FTS5 full-text search"]

    L0 -->|"events written to daily_log"| L2
    L1 -->|"90-day decay"| L2
    L2 -->|"auto-archive when expired"| L3
    L4 -.->|"safety net layer"| L0

An example spanning all five layers: the user asks “what was that API rate limiting approach we discussed last time?”

  • Agent first checks Layer 0 now table’s recent_events — nothing, that was last week
  • Then Layer 1 built-in memory — nothing, too detailed
  • Then Layer 2 memory.db, search “API rate limit” — if the agent recorded a lesson at the time, it’s here
  • If not found, go to Layer 4 session_search full-text search — the session transcript definitely has it, even if it wasn’t explicitly archived
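The lookup order in this example amounts to a simple cascade: try each layer in turn and stop at the first one that returns anything. A minimal sketch follows, where `cascade_search` and the layer callables are hypothetical stand-ins for the real tools.

```python
from typing import Callable, Iterable, Optional

# A layer lookup takes a query and returns zero or more matching snippets.
Lookup = Callable[[str], list[str]]


def cascade_search(
    query: str, layers: Iterable[tuple[str, Lookup]]
) -> tuple[Optional[str], list[str]]:
    """Query layers from hottest to coldest; return the first layer with hits."""
    for name, lookup in layers:
        hits = lookup(query)
        if hits:
            return name, hits
    return None, []
```

In use, `layers` would be ordered as the now table, built-in memory, memory.db, and finally session_search, so the cheapest sources are always consulted first.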

2.4 memlog.py: Write Protocol

The write entry point is ~/.hermes/workspace/scripts/memlog.py, called by the agent during task execution — users don’t need to run it directly. The interface is simple:

# Internal agent call format: title + optional content
python3 ~/.hermes/workspace/scripts/memlog.py "title" "detailed content"

# Title only
python3 ~/.hermes/workspace/scripts/memlog.py "newsletter FSM SOP rebuild complete"

# Specify date (backfill historical entries)
python3 ~/.hermes/workspace/scripts/memlog.py --date 2026-05-20 "title" "content"

Each record writes to the daily_log table with timestamp and source marker. The agent calls it automatically when completing tasks, hitting pitfalls, or making decisions — no manual triggering needed.
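The real memlog.py is not reproduced in this post. As a rough sketch of what a script with this interface could look like, the following uses `argparse` and `sqlite3`; the table columns, the `source` default, and the function names are assumptions.

```python
import argparse
import sqlite3
from datetime import datetime


def parse_args(argv):
    """Parse: [--date YYYY-MM-DD] title [content]."""
    parser = argparse.ArgumentParser(description="Append one record to daily_log")
    parser.add_argument("--date", help="YYYY-MM-DD, for backfilling historical entries")
    parser.add_argument("title")
    parser.add_argument("content", nargs="?", default="")
    return parser.parse_args(argv)


def write_log(conn, title, content="", date=None, source="agent"):
    """Insert one append-only record and return the timestamp that was stored."""
    now = datetime.now()
    if date:
        # Backfill: keep the requested day, use the current time of day
        day = datetime.strptime(date, "%Y-%m-%d").date()
        stamp = datetime.combine(day, now.time())
    else:
        stamp = now
    ts = stamp.isoformat(timespec="seconds")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_log ("
        "id INTEGER PRIMARY KEY, ts TEXT NOT NULL, source TEXT NOT NULL, "
        "title TEXT NOT NULL, content TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO daily_log (ts, source, title, content) VALUES (?, ?, ?, ?)",
        (ts, source, title, content),
    )
    conn.commit()
    return ts
```

A command-line entry point would pass `sys.argv[1:]` to `parse_args` and hand the result to `write_log` with a connection to memory.db.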

Compared to OpenClaw’s memlog.sh, the main benefit of switching from Markdown files to SQLite is queryability: when the agent needs “how was the Linear webhook configured last time?”, one FTS5 query locates it instead of scrolling through an entire file.

2.5 session_search: Use Cases for Layer 4, the Coldest Memory Layer

Hermes stores every session in a local session DB. Typical usage in practice:

When memory doesn't have a fact, but you vaguely remember "we discussed this before"
→ session_search("linear webhook config")
→ pull context from historical sessions
→ instead of asking the user again

This actually solves a pain point from the OpenClaw era: if a detail wasn’t written to MEMORY.md in time, it was completely lost across sessions. Now session history itself is a safety net.


3. Workflow System: How My SOP System Runs on Hermes

Workflow is not a built-in Hermes concept. It’s a private SOP system I built myself, existing since the OpenClaw era and fully preserved after migrating to Hermes. What Hermes provides is skills — general, reusable tool knowledge; workflows are my personal best practices built on top of skills. They’re different layers.

3.1 The Boundary Between Skill and Workflow

These two concepts are easily confused because both are documents that “tell the agent how to do something.” The difference is audience and portability:

Test for whether something belongs in skill or workflow:
"Could you hand this document to someone who doesn't know me at all, and have them execute it?"

  Yes → Skill (general tool knowledge, can be published and shared with anyone)
  No  → Workflow (private SOP, contains personal conventions and context)

Specifically:

  • Skill: AKShare API usage, Linear GraphQL query syntax, how to create a PR with gh CLI. These are independent of “who’s using it”, so they go in skills
  • Workflow: newsletter processing priority rules, who to notify during project review, when to push Telegram alerts. These are personal conventions, so they go in workflows

What about when the boundary is fuzzy? A practical example: Gmail search syntax is a skill (anyone can use it), but “only emails with ‘Weekly’ in the title from ADOPTED senders need same-day processing” is a workflow (my own rule). Same tool: general usage goes in the skill, personal conventions go in the workflow.

Skills live in ~/.hermes/skills/, managed and loaded by Hermes. Workflows live in ~/.hermes/workspace/workflow/, entirely maintained by me — Hermes doesn’t know or care about this directory; they’re just Markdown files the agent reads when executing tasks.

3.2 Formal Skill Format

Hermes skills have a fixed file format, much more structured than skill descriptions embedded in AGENTS.md:

---
name: email-monitoring
description: Gmail email checking and FSM state management — general email classification tool based on email_check.py
platforms: [macos]
required_commands: [python3]
required_environment_variables: [GMAIL_CREDENTIALS_PATH]
---

# Email Monitoring

## Trigger Conditions

Load this skill when the user says "check email", "any new emails", or "process newsletter queue".

## Steps

1. Run `python3 ~/.hermes/scripts/email_check.py --mode check`
2. Read `~/.hermes/workspace/status/MAILLIST.json` for sender states
3. Process by queue priority (ADOPTED senders first, NEW senders next, MUTED skipped)
4. After each item, call `memlog.py --type daily_log` to record

## Pitfalls

- MAILLIST.json path is fixed at `~/.hermes/workspace/status/MAILLIST.json`, not under `.cache/`
- Newsletter emails go through FSM state machine, not immediate processing
- email_check.py requires Gmail OAuth token; first run triggers browser authorization

The required_commands and required_environment_variables fields are automatically checked when the skill loads — missing dependencies are flagged upfront, not at execution time.
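The kind of upfront check described here can be sketched in a few lines. This is not Hermes's implementation, only an illustration of the idea; `missing_dependencies` and its return shape are hypothetical, and the frontmatter is assumed to be already parsed into a dict.

```python
import os
import shutil


def missing_dependencies(frontmatter, env=None):
    """Report which declared commands and environment variables are unavailable."""
    env = os.environ if env is None else env
    commands = frontmatter.get("required_commands", [])
    variables = frontmatter.get("required_environment_variables", [])
    return {
        # shutil.which returns None when the command is not on PATH
        "commands": [cmd for cmd in commands if shutil.which(cmd) is None],
        # An unset or empty variable counts as missing
        "environment_variables": [var for var in variables if not env.get(var)],
    }
```

Running a check like this at load time is what turns a confusing mid-task failure into a clear message before the first step runs.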

3.3 Case Study: Complete Newsletter Processing Migration

In the OpenClaw era, newsletter processing was a simple workflow playbook: receive email → parse → save to Obsidian → log. Triggered manually.

After migrating to Hermes, two substantive upgrades:

Upgrade 1: email_check.py adds FSM

Senders are no longer a binary “is it a newsletter?” judgment — they have a full state machine:

stateDiagram-v2
    [*] --> NEW : First received
    NEW --> LEARNING : Added to watch list
    LEARNING --> EVALUATING : Multiple consecutive emails, novelty baseline established
    EVALUATING --> EVALUATING : Each new email triggers Jaccard evaluation
    EVALUATING --> ADOPTED : High novelty (consistently brings new information)
    EVALUATING --> MUTED : Low novelty (highly repetitive content)
    ADOPTED --> EVALUATING : Periodic re-evaluation

MAILLIST.json records each sender’s current state:

{
  "senders": {
    "weekly@example.com": {
      "state": "ADOPTED",
      "display_name": "Example Weekly",
      "novelty_scores": [0.72, 0.68, 0.81],
      "last_seen": "2026-05-20",
      "total_received": 14
    },
    "promo@example.com": {
      "state": "MUTED",
      "muted_reason": "novelty_avg < 0.3",
      "muted_at": "2026-04-15"
    }
  }
}

This corresponds to a systemic principle: any persistent data stream should have a feedback loop. Newsletter archiving without a judgment mechanism just piles up files with poor learning outcomes. The FSM gives the system a concept of “what it has learned.”
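The sender state machine can be sketched as a single transition function. This is an illustrative sketch, not the logic in email_check.py: the 0.3 mute threshold comes from the muted_reason in the sample, while the adoption threshold, the baseline size, and the `reevaluate` flag are assumptions.

```python
from statistics import mean

MUTE_BELOW = 0.3      # taken from "novelty_avg < 0.3" in the sample MAILLIST.json
ADOPT_ABOVE = 0.6     # assumed threshold for "high novelty"
BASELINE_EMAILS = 3   # assumed number of emails needed to establish a baseline


def next_state(state, novelty_scores, reevaluate=False):
    """Return the sender's next FSM state given its novelty score history."""
    if state == "NEW":
        return "LEARNING"
    if state == "LEARNING":
        enough = len(novelty_scores) >= BASELINE_EMAILS
        return "EVALUATING" if enough else "LEARNING"
    if state == "EVALUATING":
        if not novelty_scores:
            return "EVALUATING"
        avg = mean(novelty_scores[-BASELINE_EMAILS:])
        if avg >= ADOPT_ABOVE:
            return "ADOPTED"
        if avg < MUTE_BELOW:
            return "MUTED"
        return "EVALUATING"
    if state == "ADOPTED" and reevaluate:
        return "EVALUATING"  # periodic re-evaluation
    return state  # MUTED, or ADOPTED outside a re-evaluation window
```

With the sample scores `[0.72, 0.68, 0.81]`, a sender in EVALUATING moves to ADOPTED under these assumed thresholds.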

Upgrade 2: From manual trigger to cron scheduling

The details belong to the cron chapter. For now: after migration, newsletter processing no longer needs manual triggering. It runs automatically every morning, pushing notifications only when there’s a backlog.


3.4 From Repeated Operations to SOP Extraction

After using the system for a while, a natural signal appears: the same set of tools is always called in a fixed order. That’s the precursor to a workflow.

Criteria for when an operation sequence should be solidified into a workflow:

Same operation sequence appears ≥ 3 times
  + Context is highly similar each time
  + Intermediate judgment logic can be explicitly described
  ─────────────────────────────
  → Time to write a workflow playbook

You can ask the agent directly in a session: “Is there anything I’ve been doing repeatedly that could be solidified into a workflow?” — the agent scans session history and suggests candidates.

The extraction process is essentially making tacit knowledge explicit. Decisions that relied on the agent’s on-the-spot judgment during ad-hoc execution need to be written out explicitly in the workflow:

  • Trigger conditions: When should this flow run? When should it not?
  • Pre-checks: What conditions to confirm before starting (file exists? network reachable? how long since last run?)
  • Permission boundaries: Which steps can auto-execute, which must wait for user confirmation?
  • Failure handling: If a step fails, what happens to the whole flow? Retry? Stop with error? Skip and continue?

Once written into a playbook, every execution follows the same logic — predictable, auditable, modifiable behavior. Skills also play a role in this loop: if a tool’s usage description is wrong, patch the skill directly; if a workflow step’s tool parameters changed, update the skill and the workflow naturally stays correct. The two evolve independently.

The OpenClaw series described SOPs as “run into existence” — get it running first, refine while running. This principle applies fully in Hermes, and separating skills from workflows makes the loop shorter: changing a skill doesn’t touch workflows, changing a workflow doesn’t touch skills — no interference.

3.5 Defining a Good Workflow: An Engineering Cybernetics Perspective

The core structure of Qian Xuesen’s engineering cybernetics is: sensor (capture current state) → comparator (compare against target) → actuator (issue corrective action) → feedback loop.

Good workflow design is essentially writing this structure explicitly into the flow definition. A workflow without feedback loops is just a task checklist; a workflow with feedback loops is a learning system.

The project management workflow shows how to implement this structure:

graph LR
    subgraph Strategy_Biweekly["Strategy Layer - Biweekly"]
        S1["sprint goals / product backlog"]
        S2{"this delivery vs target"}
        S3["Retrospective / Sprint Planning"]
        S1 --> S2 --> S3 --> S1
    end
    subgraph Tactical_Daily["Tactical Layer - Daily"]
        T1["PR status CI results / Linear issue"]
        T2{"check against DoD acceptance criteria"}
        T3["Code Review / blocker escalation"]
        T1 --> T2 --> T3 --> T1
    end
    subgraph Operational_Hourly["Operational Layer - Hourly"]
        R1["heartbeat / commit frequency"]
        R2{"timeout or stuck?"}
        R3["Telegram push / status update"]
        R1 --> R2 --> R3 --> R1
    end
    Operational_Hourly -->|"anomaly report"| Tactical_Daily
    Tactical_Daily -->|"sprint data"| Strategy_Biweekly

Each layer has its own feedback loop, and lower-layer output is upper-layer sensor input — that’s “hierarchical control.”

Mapped to workflow playbook writing, each layer’s feedback loop needs explicit definition:

---
name: project-management
trigger: ["project progress", "check on", "sprint"]
status: active
---

# Project Management Workflow

## Operational Layer (by cron, hourly)

### Sensor
- Read PROJECTS.json for current project states
- Query Linear: any In Progress issues not updated in 3+ days
- Query GitHub: any PRs waiting for review 24+ hours

### Comparator (judgment conditions)
- Issue stalled > 3 days → trigger blocker alert
- PR waiting review > 24h → trigger review alert
- CI failed > 2 consecutive times → trigger failure alert

### Actuator
- Trigger hit → Telegram push with specific content + suggested action
- No trigger → silent (hermes-heartbeat-state.json only updates timestamp)

## Tactical Layer (by session, daily)

### Sensor
- Read today's daily_log for progress
- Pull PR diffs, check each against acceptance criteria

### Comparator
- Check against Linear issue Acceptance Criteria item by item
- Not met → list specific gaps

### Actuator (requires human confirmation)
- Provide Code Review feedback
- Update Linear issue status
- ⚠️ No auto-merge, no auto-close issue

## Strategy Layer (by session, biweekly)

### Sensor
- Summarize all issue completions this sprint
- Extract blocker and decision records from daily_log

### Comparator
- Actual completion vs sprint goal
- Velocity trend (this sprint vs last)

### Actuator (human-led)
- Agent generates retrospective draft
- ⚠️ Direction changes, priority reordering → human decides, agent executes

Where “human in the loop” sits is a key design decision. Not every step needs confirmation, and not every step can be automatic — get it wrong and either the flow stalls or problems go unnoticed. A simple principle: irreversible operations (publish, merge, delete, external notifications) require human confirmation; read-only and draft operations can be fully automated.
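The operational layer's comparator rules are mechanical enough to sketch in code. The following is a minimal illustration under stated assumptions: the `Snapshot` shape and the `compare` function are hypothetical, and only the three thresholds (3 days, 24 hours, more than 2 consecutive CI failures) come from the playbook above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    """Sensor output for one hourly run (assumed shape)."""
    issue_last_updated: dict[str, datetime]    # In Progress issues -> last update
    pr_review_requested: dict[str, datetime]   # open PRs -> when review was requested
    ci_consecutive_failures: dict[str, int]    # branch -> consecutive failed runs


def compare(snapshot: Snapshot, now: datetime) -> list[str]:
    """Apply the comparator rules; an empty list means the actuator stays silent."""
    alerts = []
    for issue, updated in snapshot.issue_last_updated.items():
        if now - updated > timedelta(days=3):
            alerts.append(f"blocker: {issue} has not been updated in over 3 days")
    for pr, requested in snapshot.pr_review_requested.items():
        if now - requested > timedelta(hours=24):
            alerts.append(f"review: {pr} has waited over 24h for review")
    for branch, failures in snapshot.ci_consecutive_failures.items():
        if failures > 2:
            alerts.append(f"failure: CI on {branch} failed {failures} times in a row")
    return alerts
```

Because this layer only reads state and produces alert text, it fits the "read-only operations can be fully automated" side of the principle; sending the alert is the one external action.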

3.6 Research Workflow: Automation and Evaluation

Research is another workflow type suited to deep automation. Unlike project management, its core challenge isn’t “what was done” but “how much was learned”.

Evaluation: divergence score

A good research session should answer: how much new information did this research bring me?

Jaccard similarity measures this: for each new document / search result, compute its vocabulary overlap with existing notes. Less overlap with existing content means higher divergence.

divergence = 1 - |new content word set ∩ existing notes word set| / |new content word set ∪ existing notes word set|

divergence near 1.0 → this source brought lots of new information, dig deeper
divergence near 0.0 → highly repetitive content, this direction is saturated, switch or wrap up

Practical threshold reference:

divergence > 0.4  → keep searching, coverage not enough yet
0.2 ~ 0.4         → main conclusions converging, can start writing synthesis
< 0.2             → research saturated, stop, write conclusions

When divergence stays below threshold for two consecutive rounds, the workflow automatically marks the research as “ready to conclude” and pushes a notification for user confirmation. More reliable than gut-feeling “that’s probably enough.”
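The score and the stopping rule can be sketched as follows. The tokenizer is a simplifying assumption (lowercased `\w+` runs, which will not segment languages written without spaces), and the 0.2 threshold for the two-round rule is assumed to be the "saturated" boundary from the table above.

```python
import re


def tokens(text):
    """Lowercased word set; a deliberately simple tokenizer."""
    return set(re.findall(r"\w+", text.lower()))


def divergence(new_text, existing_text):
    """1 minus Jaccard similarity between the two word sets."""
    new, old = tokens(new_text), tokens(existing_text)
    union = new | old
    if not union:
        return 0.0  # two empty texts: nothing new was learned
    return 1.0 - len(new & old) / len(union)


def classify(score):
    """Map a divergence score onto the reference thresholds."""
    if score > 0.4:
        return "keep searching"
    if score >= 0.2:
        return "start synthesis"
    return "saturated"


def ready_to_conclude(history, threshold=0.2, rounds=2):
    """True when the last `rounds` scores are all below the threshold."""
    recent = history[-rounds:]
    return len(recent) == rounds and all(score < threshold for score in recent)
```

For instance, `divergence("a b c", "b c d")` is 0.5: the sets share two words out of four in the union. A history of `[0.71, 0.52, 0.38, 0.19]` is not yet ready to conclude under this rule, since only the last round is below 0.2.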

Research workflow state machine

stateDiagram-v2
    [*] --> NEW : User specifies research topic
    NEW --> SCOPING : Start
    SCOPING --> SCOPING : Auto pre-search, generate outline draft
    SCOPING --> RESEARCHING : Human confirms outline direction
    RESEARCHING --> RESEARCHING : divergence > threshold, keep searching
    RESEARCHING --> EVALUATING : divergence < threshold, converging
    EVALUATING --> CONCLUDED : Cross-validation passed, write conclusions
    CONCLUDED --> ARCHIVED : Auto-archive after 45 days
    ARCHIVED --> [*]

Structured research note output

For divergence calculation to be meaningful, each research output needs structure. _index.md is the global tracking table; index.md in each topic directory is the entry file:

---
research_goal: "Understand XXX core mechanisms and engineering practice"
divergence_history: [0.71, 0.52, 0.38, 0.19]
status: concluded
concluded_at: 2026-05-20
---

## Core Conclusions
(3-5 sentences of judgment, not a summary)

## Confidence Assessment
- High confidence: conclusions from 3+ independent sources
- Medium confidence: from 1-2 sources, pending cross-validation
- Low confidence: single source, disputed

## File Map
| File | Content | Status |

divergence_history records scores per round, making the convergence curve visible — if the first two rounds are already near 0, the research scope is too narrow; if still high after five rounds, the domain itself is diffuse and needs active scope narrowing.

Automation with cron

Research workflow can also be partially automated:

flowchart TD
    A(["User starts research"]) --> B["agent creates directory structure + generates outline"]
    B --> C{"User confirms outline?"}
    C -->|"No, adjust direction"| B
    C -->|"Yes"| D["Daily cron, LLM-driven"]
    D --> E{"Read current divergence"}
    E -->|"above threshold"| F["Auto-run one search round, append notes, recalculate"]
    F --> D
    E -->|"below threshold"| G["Push research nearing conclusion notification"]
    G --> H{"User confirms end?"}
    H -->|"Continue"| D
    H -->|"End"| I["agent writes conclusions"]
    I --> J["Update index.md"]
    J --> K(["Archive"])

The key design: “whether to keep searching” is automatic, but “whether conclusions are correct” is human. Machines are good at quantifying coverage; not good at judging conclusion quality.


4. Status Layer: Separating Runtime State

The previous three chapters covered skills, workflows, and cron — each involves a type of data: skills need to know tool paths and auth methods, workflows need to read current project state, cron needs to judge last execution results to decide whether to push notifications this time. Where does this data live?

It can’t go in memory: memory decays, has capacity limits, and is designed to tolerate staleness. It can’t go in session context either: cron starts a fresh session each time with no context inheritance. The answer is the status/ directory: a dedicated place for real-time runtime state, with no decay, written by scripts and read by the agent.

4.1 Convergence Point for Three System Data Flows

status/ isn’t a vague “temp folder” — each file corresponds to a specific system from earlier:

graph TD
    subgraph status_directory["status directory"]
        M[MAILLIST.json]
        P[PROJECTS.json]
        H[hermes-heartbeat-state.json]
        C[candidates.json]
    end

    NW["newsletter workflow + email_check.py FSM"] -->|"write sender state"| M
    M -->|"read, decide which senders to process"| NW
    M -->|"read, filter email queue"| CR1["newsletter cron"]

    PW["project workflow, operational layer cron"] -->|"write project snapshot"| P
    P -->|"read overview, skip re-querying Linear"| PS["session tactical layer"]
    P -->|"compare diff, push only on change"| CR2["heartbeat cron"]

    CR1 -->|"write timestamp after execution"| H
    CR2 -->|"write snapshot after execution"| H
    H -->|"read last result"| CR1
    H -->|"read last result"| CR2

    RW["research workflow"] -->|"write candidate topics + divergence progress"| C
    C -->|"read, judge which are nearing conclusion"| CR3["daily-reflection cron"]

4.2 Read/Write Relationships for Each File

MAILLIST.json

Written by email_check.py, records each newsletter sender’s FSM state (NEW / LEARNING / EVALUATING / ADOPTED / MUTED) and novelty_scores history.

The newsletter workflow’s operational layer cron reads it to decide which senders’ emails to process and which to silently skip. The agent also reads it in session when processing emails to confirm current sender state. Only email_check.py writes; the agent doesn’t directly modify it — unless the user explicitly wants to adjust a sender’s state.

PROJECTS.json

Written by the project management workflow’s operational layer cron, records each project’s current state snapshot:

{
  "projects": {
    "my-project": {
      "linear_status": "in_progress",
      "open_prs": 2,
      "blocked_issues": 1,
      "last_updated": "2026-05-26T08:00:00+08:00"
    }
  }
}

The project management workflow’s tactical layer (daily session) reads it for a quick project overview without re-querying the Linear API every time. Cron comparator logic also depends on it: compare newly fetched data against the last snapshot — push only on change, silent otherwise.

hermes-heartbeat-state.json

Written by each cron job after execution, records last run time and key results. Read on next execution to decide whether to repeat operations:

{
  "last_run": "2026-05-26T08:00:00+08:00",
  "newsletter_queue_size": 70,
  "last_notification_sent": "2026-05-25T23:45:00+08:00",
  "projects": {
    "my-project": {
      "last_status": "in_progress",
      "open_prs": 2
    }
  }
}

This solves a practical problem: if project state hasn’t changed at all, cron shouldn’t send “everything is fine” every hour. With hermes-heartbeat-state.json, cron compares before/after snapshots and pushes only when there’s a difference.
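The compare-then-push behavior can be sketched like this. The function names and the `notify` callback are hypothetical; the file layout follows the sample hermes-heartbeat-state.json above.

```python
import json
from pathlib import Path


def load_state(path):
    """Read the previous snapshot, or an empty one on the first run."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}


def changed_projects(previous, current):
    """Return only the projects whose snapshot differs from the last run."""
    prev = previous.get("projects", {})
    return {
        name: snap
        for name, snap in current.get("projects", {}).items()
        if prev.get(name) != snap
    }


def heartbeat(path, current, notify):
    """Push only on change, then persist the new snapshot for the next run."""
    diff = changed_projects(load_state(path), current)
    if diff:
        notify(diff)  # e.g. a Telegram push; stays silent when nothing changed
    Path(path).write_text(json.dumps(current, indent=2), encoding="utf-8")
    return diff
```

Note that in this sketch the very first run treats every project as changed, because there is no earlier snapshot to compare against.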

candidates.json

Written by the research workflow, records current research candidate topics and their divergence progress. Daily-reflection cron reads it to judge which research is nearing conclusion threshold and needs follow-up in the reflection.

4.3 Design Intent: Keep Each System Stateless

The core intent of centralizing runtime state in status/ is: keep skill, workflow, and cron logic itself stateless.

The benefits of statelessness are testability and resettability:

  • Skills describe how to use tools, store no runtime results — same skill behaves consistently on different machines
  • Workflow playbooks describe flow logic, don’t embed last execution results — can modify playbooks anytime without affecting historical state
  • Cron prompts describe what to do this time, read context from status/ — even if cron is paused and resumed, state isn’t lost

An analogy: status/ is this system’s “workbench” — tools (skills) hang on the wall, SOPs (workflows) are posted on the whiteboard, schedules (cron) remind on time — only the materials on the workbench change in real time. Tools and SOPs themselves don’t change because workbench materials changed.

This is much clearer than the OpenClaw era, when state was mixed into NOW.md. NOW.md is a human-readable summary; hermes-heartbeat-state.json is raw data for machine diff comparison. They have different audiences and purposes and shouldn’t be mixed.


4.4 state.db: The Other Half of the Status Layer

The status/ directory holds human-readable, directly-scriptable snapshot data — current project state, email queue state, heartbeat timestamps. It targets instant queries like “what’s the current state?”

But the system has another category of data: structured history that needs to accumulate across time. Newsletter sender novelty score history (for FSM evaluation), all session metadata (for session_search), cron execution records — these all live in state.db (SQLite), not JSON files.

The division of responsibilities:

|  | workspace/status/*.json | state.db |
|---|---|---|
| Format | Human-readable JSON | SQLite structured queries |
| Purpose | Current snapshots, instant reads/writes | Historical accumulation, cross-time queries |
| Typical data | Project state, heartbeat timestamps | Session history, newsletter FSM evaluation data |
| Writer | Scripts write directly | Hermes internals + hooks + plugins |

session_search queries the sessions table in state.db; the newsletter FSM’s novelty_scores history is also in state.db, updated by email_check.py on each archive. Describing only JSON files would leave readers thinking system state is a pile of independent key-value files, missing the underlying time-series data support.
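To make the "cross-time query" side concrete, here is a small sketch of reading a sender's recent novelty history with Python's sqlite3 module. The article does not show the real schema, so the column names (author_key, msg_id, score, scored_at) and the helper names are assumptions, and an in-memory database stands in for state.db.

```python
import sqlite3


def recent_novelty(conn: sqlite3.Connection, author_key: str, limit: int = 5) -> list[float]:
    """Return the newest novelty scores for one sender, most recent first."""
    rows = conn.execute(
        "SELECT score FROM novelty_scores "
        "WHERE author_key = ? ORDER BY scored_at DESC LIMIT ?",
        (author_key, limit),
    ).fetchall()
    return [row[0] for row in rows]


def demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for state.db using the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE novelty_scores ("
        "author_key TEXT, msg_id TEXT, score REAL, scored_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO novelty_scores VALUES (?, ?, ?, ?)",
        [
            ("some-author", "m1", 0.8, "2026-05-01T08:00:00+08:00"),
            ("some-author", "m2", 0.4, "2026-05-08T08:00:00+08:00"),
            ("some-author", "m3", 0.2, "2026-05-15T08:00:00+08:00"),
            ("other-author", "m4", 0.9, "2026-05-15T08:00:00+08:00"),
        ],
    )
    return conn
```

A query like this is awkward to express over a pile of JSON snapshots, which is the practical reason the history lives in SQLite.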


4a. Hooks: The Glue Between Memory System Layers

Beyond the memory system and cron, there’s another mechanism that connects the whole system — Hooks.

Hermes hooks are event-driven extension points: attach custom Python scripts (handler.py + HOOK.yaml declaring which events to listen for) to lifecycle events like session:start, session:end, agent:start. This is the key mechanism that makes the memory system “flow automatically” without manual triggering.

This system has two hooks, each serving a different “glue” role:

Hook 1 — session-now-reporter (listens to agent:start)

On the first conversation turn of a new session, automatically reads the now table in memory.db, formats it, and pushes it to the user via Telegram Bot API. Has built-in deduplication (records already-pushed session_ids), only pushes once per session.

Role: connects Layer Zero memory (the now table) with the Telegram notification channel — making “current status” not just readable by the agent internally, but actively surfaced to the user at the start of every conversation.
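The once-per-session deduplication can be sketched as below. This is only an illustration of the idea: the storage file and the should_push name are hypothetical, and the real handler's signature and Telegram call are not shown in this article, so they are left out.

```python
import json
from pathlib import Path


def should_push(session_id: str, seen_path: Path) -> bool:
    """Return True the first time a session_id is seen, False afterwards."""
    seen = set(json.loads(seen_path.read_text())) if seen_path.exists() else set()
    if session_id in seen:
        return False
    seen.add(session_id)
    # Persist the updated set so the check survives process restarts.
    seen_path.write_text(json.dumps(sorted(seen)))
    return True
```

A handler would call should_push with the incoming session id and return early when it is False, so repeated agent:start events within one session never trigger a second push.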

Hook 2 — session-daily-writer (listens to session:end)

On session end, queries that session’s metadata from state.db, reads the corresponding JSON file from the sessions/ directory, automatically extracts a short title from the first user message (with Chinese/English support), generates YYYY-MM-DD-shorttitle.md and writes it to workspace/memory/daily/. Cron sessions are automatically skipped; sessions with fewer than 2 messages are also skipped.

Role: connects the state.db session history with the workspace/memory/daily/ journal layer — this is the auto-production mechanism for “daily journal” entries. Readers might wonder where the files in daily/ come from; this is the answer.
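A rough sketch of the title extraction and skip rules follows. The length limits, the "cron" source label, and the function names are assumptions made for illustration; the real handler may slice titles differently.

```python
import re
from datetime import date


def short_title(message: str, max_cjk: int = 12, max_words: int = 5) -> str:
    """Derive a filename-safe short title from the first user message."""
    stripped = message.strip()
    text = stripped.splitlines()[0] if stripped else ""
    if re.search(r"[\u4e00-\u9fff]", text):
        # Chinese text: keep CJK characters, letters and digits, cut by character count.
        cleaned = re.sub(r"[^\u4e00-\u9fffA-Za-z0-9]", "", text)
        return cleaned[:max_cjk] or "untitled"
    # English text: lowercase words joined by hyphens, cut by word count.
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return "-".join(words[:max_words]) or "untitled"


def daily_filename(day: date, message: str) -> str:
    """Build the YYYY-MM-DD-shorttitle.md filename for a journal entry."""
    return f"{day.isoformat()}-{short_title(message)}.md"


def should_write(source: str, message_count: int) -> bool:
    """Skip cron sessions and sessions with fewer than 2 messages."""
    return source != "cron" and message_count >= 2
```

For example, daily_filename(date(2026, 5, 26), "Help me review the open PRs for my-project today") yields 2026-05-26-help-me-review-the-open.md.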

Division with the session-daily-scan cron:

  • Hook handles normally-ended sessions (real-time writes)
  • session-daily-scan cron (hourly) re-scans sessions where the hook didn’t trigger (process crash, timeout, etc.)
  • Both complement each other, ensuring no gaps in the journal layer

graph LR
    subgraph hooks["Hooks (event-triggered)"]
        H1["session-now-reporter<br/>agent:start"]
        H2["session-daily-writer<br/>session:end"]
    end
    subgraph cron_scan["Cron (safety net)"]
        C1["session-daily-scan<br/>hourly"]
    end
    H1 -->|"reads"| NT["now table"]
    NT -->|"push"| TG[Telegram]
    H2 -->|"reads"| SD["state.db sessions"]
    H2 -->|"writes"| DL["daily/*.md"]
    C1 -->|"backfill writes"| DL

Hooks reflect Hermes’s extensibility philosophy: without modifying upstream code, attach custom scripts to lifecycle events to implement personalized features. The now-injector plugin (which injects the now table into the system prompt) and these two hooks together form the “autopilot” part of the memory system — no manual maintenance needed; the system runs continuously in the background.


5. Cron Integration: Making Hermes Move on Its Own

Cron is Hermes’s scheduling mechanism; workflow is my own SOP — they collaborate by having the operational layer in workflows triggered automatically by cron, while tactical and strategy layers stay in manually driven sessions. This chapter covers only the cron half.

5.1 Where Did Heartbeat Go

OpenClaw had two automation mechanisms: Heartbeat (poll at fixed intervals) and Cron (trigger at precise schedules). In Hermes, these unified into one thing — Cron.

OpenClaw’s Heartbeat was essentially “a scheduled task that runs every N minutes.” Hermes cron supports interval syntax like every 30m, every 1h, directly replacing Heartbeat:

OpenClaw Heartbeat (every 60 minutes)
  ≡ Hermes cron --schedule "every 1h"

OpenClaw Cron (daily at 23:45)
  ≡ Hermes cron --schedule "45 23 * * *"

After migrating from OpenClaw, there is no need to maintain a separate HEARTBEAT.md defining “what to do each heartbeat.” Split those checks into independent cron jobs, each with a single responsibility, and the result is actually clearer.

5.2 Two Execution Modes

Hermes cron has two fundamentally different execution modes, and choosing the wrong one is costly:

LLM-driven mode (default)
  Each tick → agent reads prompt → calls tools → makes judgments → returns result
  Suitable for: tasks requiring reasoning
  Cost: LLM tokens consumed each time

no_agent mode
  Each tick → directly run script → stdout as message output
  Suitable for: pure data collection, threshold alerts (no thinking needed, just execution)
  Cost: zero token consumption

A daily scenario comparison:

Scenario: Check email every morning, notify me if there’s important mail

Using no_agent: write a Python script, pull unread emails via Gmail API, filter by rules (sender on whitelist / title contains keyword), output summary to stdout on hit, empty on miss. Hermes pushes non-empty stdout to Telegram, silent on empty stdout. Entire flow zero LLM, fast, cheap.
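A minimal sketch of such a script might look like the following. The whitelist, keywords, and sample messages are placeholders, and the Gmail API call is replaced by a static list so the example stays self-contained; only the rule filtering and the "print on hit, stay empty on miss" contract are the point.

```python
import sys

WHITELIST = {"boss@example.com"}   # hypothetical trusted senders
KEYWORDS = ("invoice", "urgent")   # hypothetical subject keywords


def important(emails: list[dict]) -> list[dict]:
    """Keep emails whose sender is whitelisted or whose subject has a keyword."""
    hits = []
    for mail in emails:
        sender_hit = mail["from"].lower() in WHITELIST
        keyword_hit = any(k in mail["subject"].lower() for k in KEYWORDS)
        if sender_hit or keyword_hit:
            hits.append(mail)
    return hits


def render(hits: list[dict]) -> str:
    """Format the summary; an empty string keeps the cron silent."""
    if not hits:
        return ""
    lines = [f"{len(hits)} important email(s):"]
    lines += [f"- {m['from']}: {m['subject']}" for m in hits]
    return "\n".join(lines)


if __name__ == "__main__":
    # A real script would fetch unread mail here; this list is a stand-in.
    unread = [
        {"from": "boss@example.com", "subject": "Lunch"},
        {"from": "news@example.com", "subject": "Weekly digest"},
    ]
    sys.stdout.write(render(important(unread)))
```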

Using LLM-driven: each tick, agent reads email list, judges which is important, how to summarize, what tone for the push. Suitable when judgment logic is complex and can’t be fully described by rules — like “help me judge if this email needs a reply today, and draft a short response.”

Cost of misuse: giving the yes/no “are there new newsletters” judgment to LLM-driven means paying for an LLM call on every tick just to ask “is this list empty?” Giving “today’s market data analysis” to no_agent means the script can only output raw data with no reasoning, so what gets sent is a pile of numbers, not analysis.

A common anti-pattern is mixing data collection and analysis in one LLM-driven cron. The LLM then spends most of its tokens reading data, and the quality of the actual analysis drops. The correct approach is to separate the two:

flowchart LR
    A["no_agent cron, data collection script"] -->|"stdout to JSON"| B[("context_from, auto-injected")]
    B --> C["LLM-driven cron, analysis and reasoning"]
    C -->|"push result"| D[Telegram]

Hermes’s context_from parameter directly supports this chained scheduling.

5.3 Creating and Managing Cron

Create via Hermes CLI or in-session conversation:

# Create a no_agent scheduled task (directly run script)
hermes cron create \
  --name "daily-newsletter-archive" \
  --schedule "0 8 * * *" \
  --no-agent \
  --script "~/.hermes/scripts/email_check.py" \
  --deliver "telegram"

# List all crons
hermes cron list

# Manually trigger once (for debugging)
hermes cron run <job_id>

# Pause / resume
hermes cron pause <job_id>
hermes cron resume <job_id>

You can also say directly in Telegram: “Create a scheduled task to check the newsletter queue every morning at 8” — Hermes asks for details then creates automatically.

5.4 Several Crons Actually Running in Production

Below are three crons actually running in production, with trimmed prompts for reference.


daily-newsletter-archive (LLM-driven, every 4 hours)

The actual run frequency is every 4 hours (0 */4 * * *), not “once daily” — the newsletter queue can receive new items at any time, and higher frequency keeps backlog from building up. Each run processes one queue item to prevent long run times.

flowchart TD
    A(["Scheduled trigger"]) --> B["email_check.py newsletter-queue next"]
    B --> C{"Queue has content?"}
    C -->|"item: null"| D["Silent exit, no output"]
    C -->|"has item"| E["Fetch full email"]
    E --> F{"Sender type?"}
    F -->|"Full-content"| G["Use email body directly"]
    F -->|"Preview-only"| H["Open original link via browser_navigate to fetch full content"]
    G & H --> I["Language check: Chinese → full original / English → paragraph original+translation"]
    I --> J["Write to notes, compute novelty_score"]
    J --> K["Update FSM state"]
    K --> L["Mark processed + log"]

Prompt core (trimmed):

【Newsletter Archive】Process up to 10 newsletter queue items, one at a time.

Step 1 — Get next queue item
python3 ~/.hermes/scripts/email_check.py newsletter-queue next
If returns "item": null → queue empty, silent exit (no output)

Step 2 — Fetch full email
gog gmail get <msg_id> [--body]

Step 2b — Determine sender type
Full-content senders → use email body directly, skip Step 3
Preview-only senders (Substack/a16z etc.) → must open original link to fetch full content

Step 3 — Truncation detection + fetch (required for Preview-only)
Detect "Read in browser" link or email body < 30% of historical average → open with browser_navigate

Step 4 — Language check
Chinese → post full original  |  English → paragraph-by-paragraph original + translation

Step 5 — Write notes
Path: ~/Projects/DigitalGarden/newsletter/<author_key>/YYYY-MM-DD-<slug>.md
Format: frontmatter + ## Original + ## Translation + ## Key Content + ## Analysis

Step 6 — Compute novelty_score
python3 ~/.hermes/scripts/email_check.py newsletter-novelty <author_key> --keywords "..." --msg-id <msg_id>

Step 7 — Update FSM state
python3 ~/.hermes/scripts/email_check.py newsletter-fsm update <author_key> --novelty <score>

Step 8 — Mark complete + log
python3 ~/.hermes/scripts/email_check.py newsletter-queue done <msg_id>
python3 ~/.hermes/scripts/memlog.py "Newsletter archive: [subject]" "[author_key]"

This cron is LLM-driven rather than no_agent not just because of translation — sender type classification, truncation detection, novelty scoring, and FSM state updates all require reasoning and judgment. Silent exit is key design — when the queue is empty, no notification, no disturbance.
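The rule-based part of Step 3 can be expressed in a few lines. This is a sketch of the heuristic only, under the assumption that historical body lengths are available as a list of integers; in the actual cron the agent applies this judgment itself, and the function name is hypothetical.

```python
from statistics import mean


def looks_truncated(body: str, historical_lengths: list[int], ratio: float = 0.30) -> bool:
    """Flag a preview-only email using the two signals from Step 3."""
    # Signal 1: an explicit "Read in browser" link in the body.
    if "read in browser" in body.lower():
        return True
    # Without history there is no baseline to compare against.
    if not historical_lengths:
        return False
    # Signal 2: body shorter than 30% of this sender's historical average.
    return len(body) < ratio * mean(historical_lengths)
```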


daily-stock-review (no_agent, daily at 23:00)

flowchart TD
    A(["23:00 trigger"]) --> B["stock_review.py runs"]
    B --> C{"Execution result"}
    C -->|"ok"| D["Result written to StockReview/YYYY-MM-DD/, script includes Telegram push"]
    C -->|"warn/error"| E["Push exception alert"]

Prompt core section:

【A-Share Review】It's 23:00, run today's A-share market review.

1. Run review script: python3 ~/.hermes/scripts/stock_review.py
2. Review results written to vault/StockReview/YYYY-MM-DD/
3. Script includes Telegram push logic internally

Script outputs ✅ / [ok] / [warn] / [error], handle as-is.

This is the simplest use of no_agent mode: the prompt is minimal and essentially just states where the script is and where the results go. Data fetching, analysis, formatting, and push all happen inside stock_review.py; the LLM doesn’t participate in any step.


daily-reflection (LLM-driven, daily at 23:45)

flowchart TD
    A(["23:45 trigger"]) --> B["session-daily-scan.py, scan all today's sessions"]
    B --> C["Read daily/YYYY-MM-DD.md + session_search supplement"]
    C --> D["Scan skills/ directory, identify mergeable/SOP-ready skills"]
    D --> E{"Maintenance suggestions?"}
    E -->|"Yes"| F["Append Skills maintenance suggestions to reflection end"]
    E -->|"No"| G["Skip"]
    F & G --> H["Generate reflection, send Telegram"]
    H --> I["Update now table, recent_events / pending / today_focus"]

Prompt core section (trimmed):

【Daily Reflection】It's 23:45, run today's reflection:

Step 1: Scan sessions
python3 ~/.hermes/workspace/scripts/session-daily-scan.py

Step 2: Read today's daily file
Read memory/daily/YYYY-MM-DD.md, supplement key events with session_search.

Step 3: Scan skills health
Check skills/ directory, identify skills needing maintenance:
- [MERGE] Two skills in same domain with duplicate steps
- [SOP] Run multiple times with fixed path, meets SOP criteria
Append findings to reflection end if any; skip if none.

Step 4: Generate reflection and push Telegram
Format: today's key events (3-5) / cognitive gains /
      Skills maintenance suggestions (from Step 3) /
      pending cleanup suggestions / tomorrow's priorities

Step 5: Update now table
recent_events gets today's summary, pending clears resolved items,
today_focus refreshes to tomorrow's focus.

This cron has a design not mentioned earlier in the article: Step 3 scans skills health. During daily reflection, also scan the skills directory, identifying skills that can be merged or solidified into workflows. This is the automated implementation of Section 3.4’s “from repeated operations to SOP extraction” logic — not waiting for the user to discover it, but having the agent check every night and append suggestions to the reflection push.

Reflection’s value isn’t just recording what happened today, but cognitive extraction + system maintenance happening together. This step can’t be replaced by no_agent: extraction requires judgment, skills health identification requires reasoning.


daily-project-review (LLM-driven, daily at 21:00)

flowchart TD
    A(["21:00 trigger"]) --> B["projects.py sprint --control"]
    B --> C["Parse sprint_health.score / pending_actions / stale_actions"]
    C --> D{"Health score >= 8?"}
    D -->|"Yes"| E["Push ✅ status summary"]
    D -->|"No"| F["Highlight stale_actions + suggested actions"]
    E & F --> G["Telegram push"]

Runs projects.py sprint --control, parses the returned JSON — sprint health score (1-10), pending deviation items, overdue deviation items — and pushes a formatted project status report. This is the concrete implementation of Chapter 3’s “operational layer cron” logic: no need to manually ask “how are the projects going” in session; it reports automatically every day at 21:00.
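The branching in the flowchart can be sketched as a small formatter. The field names sprint_health.score, pending_actions, and stale_actions come from the description above, but the exact JSON shape and the message wording are assumptions; in the real cron the agent composes the report.

```python
def format_report(payload: dict, threshold: int = 8) -> str:
    """Turn the sprint --control JSON into a short status message."""
    health = payload["sprint_health"]["score"]
    pending = payload.get("pending_actions", [])
    stale = payload.get("stale_actions", [])
    if health >= threshold:
        # Healthy sprint: one-line summary is enough.
        return f"✅ Sprint health {health}/10, {len(pending)} pending, {len(stale)} stale"
    # Unhealthy sprint: list stale items first, then pending ones.
    lines = [f"Sprint health {health}/10, needs attention"]
    lines += [f"- stale: {item}" for item in stale]
    lines += [f"- pending: {item}" for item in pending]
    return "\n".join(lines)
```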


weekly-knowledge-distill (LLM-driven, Sunday 22:00)

This is the cron that makes the memory system “accumulate” rather than just “record.” Runs once a week, doing two things:

  1. Extract lessons/decisions from daily_log: Read the past 7 days of event logs, identify lessons and decisions worth keeping long-term, write into memory.db’s lessons / decisions tables. Also mark entries not verified in 90+ days as stale.

  2. Auto-generate skill files: For high-priority lessons/decisions, judge whether they’re worth solidifying into skills — if they describe a complete multi-step workflow, a pitfall experience, or a non-obvious technical decision, automatically create a .md skill file under ~/.hermes/skills/.

This design represents automated knowledge distillation: experience from daily use (written to daily_log) → weekly extraction into reusable knowledge (lessons/decisions) → high-value content further solidified into skills, forming a complete knowledge accumulation cycle.


weekly-memory-gc (no_agent, Sunday 00:00)

Database maintenance cron that handles three things: archive daily_log entries older than 90 days (write to archive table then delete originals), archive overdue stale lessons/decisions, then run SQLite VACUUM to reclaim space. Runs silently, no Telegram notification.


5.5 Cron Behavioral Conventions

These principles, repeatedly confirmed in practice, are worth stating explicitly:

1. Cron doesn’t ask questions. Scheduled tasks run without a user present, so any question goes unanswered. When information is missing, either continue with defaults or silently skip; a job must never block waiting for input.

2. Cron doesn’t auto-push. Any external write operation (git push, send email, modify Linear issue) requires explicit user trigger in session — not in cron.

3. Cron doesn’t modify memory content. memory.db is a private knowledge base; write decisions should have a human in the loop. Cron can write drafts to temporary files under status/, waiting for user confirmation in session before writing — this “cron collects, session confirms” division is by design, not a Hermes constraint.

4. Empty output = silent. In no_agent mode, empty stdout sends no message. This is by design in Hermes, and alert-type crons fit these semantics naturally: quiet when fine, speak when there’s a problem.


6. Complete Directory Structure

~/.hermes/

├── SOUL.md                    ← Identity + behavioral principles + security hard rules
├── config.yaml                ← Model, provider, channel config
├── .env                       ← API keys (not in git, backed up separately)

├── skills/                    ← Skill library (one directory per skill)
│   ├── email-monitoring/
│   │   └── SKILL.md
│   ├── linear/
│   │   └── SKILL.md
│   ├── github-pr-workflow/
│   │   └── SKILL.md
│   └── ...

├── hooks/                     ← Lifecycle hooks (one directory per hook)
│   ├── session-now-reporter/
│   │   ├── HOOK.yaml
│   │   └── handler.py
│   └── session-daily-writer/
│       ├── HOOK.yaml
│       └── handler.py

├── workspace/
│   ├── AGENTS.md              ← Operational entry point: session checklist, workflow routing
│   ├── WORKFLOW.md            ← Workflow index
│   ├── workflow/              ← Workflow playbooks
│   │   ├── 00-create-workflow.md
│   │   ├── 01-newsletter.md
│   │   └── 02-project-review.md
│   │
│   ├── status/                ← Runtime snapshots (agent read-only, scripts write)
│   │   ├── MAILLIST.json
│   │   ├── PROJECTS.json
│   │   ├── hermes-heartbeat-state.json
│   │   └── candidates.json
│   │
│   ├── memory/                ← Journal files (auto-generated by hook)
│   │   └── daily/
│   │       └── YYYY-MM-DD-shorttitle.md
│   │
│   └── scripts/               ← Data collection / utility scripts
│       ├── email_check.py
│       ├── memlog.py
│       ├── stock_review.py
│       └── trade_cal.py

├── memory.db                  ← Memory database (FTS5 indexed, includes now table)
├── state.db                   ← State database (sessions, FSM history, cron records)
└── sessions/                  ← Raw session history files (session_search data source)

Compared to the OpenClaw era, the biggest structural changes are:

  • skills/ went from embedded descriptions in AGENTS.md to independent directories, with each skill a separately maintainable, separately publishable file
  • status/ went from being scattered in memory/ to a dedicated directory, so runtime state and the memory system are fully separated
  • hooks/ is a brand-new extension layer that makes the automation glue between memory system components explicit
  • The memory.db and state.db dual-database division replaces the OpenClaw era’s single NOW.md + MEMORY.md combination


7. Some Observations

Migration was a system re-audit. Moving from OpenClaw to Hermes wasn’t just switching tools — it was re-examining all accumulated conventions: which were genuinely useful, which were workarounds from tool limitations. SOUL.md hard rules, status/ path conventions, cron behavioral principles — all got clearer implementations on the new platform.

Skill system maintainability improvement is real. In the OpenClaw era, when a skill had issues, you’d dig through AGENTS.md for the corresponding section, modify, then test. Now each skill is an independent file with version, description, and pitfalls section. Description outdated? Patch directly. Hit a new pitfall? Update immediately. This feedback loop got much shorter.

Cron’s “consciousness” is the key upgrade. LLM-driven scheduled tasks don’t just run scripts — they read context, make judgments, decide next steps based on results. daily-reflection can selectively extract valuable content instead of stuffing all of today’s conversations into memory — something system crontab calling shell scripts can never do.

This system is still evolving. Research FSM, newsletter state machine, memory decay mechanism — these weren’t designed upfront, they grew from repeated use. Good AI assistant configuration is essentially building a framework for a learning system: stable enough for learning to land somewhere; flexible enough for practice to feed back into design.

May you also have an AI assistant that truly understands you.
