OpenClaw Training (II): Using OpenClaw to Turn Repetitive Work into SOPs

The greatest value of expensive LLM tokens isn’t doing something once — it’s turning “doing it once” into “never having to do it yourself again.” The tokens you spend in Claude Code and Cursor aren’t really valuable for their one-time output. Their real value lies in whether that experience gets distilled into instructions you never have to explain again.

But saying “do it this way next time” out loud isn’t enough. Every time a session ends, the AI forgets. The next conversation starts fresh — you’re back to re-explaining context, re-defining preferences, re-specifying output formats.

This post documents how I built a workflow system in OpenClaw to solve this problem for good. One goal: let the AI assistant read a few files and independently execute any repetitive task from scratch, without me having to explain much.


The Root Problem

AI assistants are fundamentally stateless. Each session is isolated — there’s no shared memory across sessions. Even within a single session, when a conversation grows too long, the system auto-compacts it (compaction) and discards early content.

This means: every agreement you’ve made about “handle this thing this way going forward” disappears when the session ends. Next time, you start over.

There’s only one fix: pull agreements out of conversation and write them into files. Files persist across sessions. Conversations don’t.


The Architecture

The workflow system has three layers:

AGENTS.md (Behavior Layer) — The AI reads this file on every startup. It defines the meta-rules for “how to handle workflows”: when a task comes in, first try to match a trigger phrase; if matched, follow the SOP; if not, improvise.

WORKFLOW.md (Index Layer) — A directory of all existing workflows, containing trigger phrases, file paths, status, TTL, and a one-line summary. The AI reads this when matching trigger phrases.

workflow/NN-slug.md (Execution Layer) — One Playbook file per workflow, containing complete step-by-step instructions for executing that task end-to-end.

workspace/
├── AGENTS.md              # Meta-rules: how to dispatch and execute workflows
├── WORKFLOW.md            # Index: trigger phrase → file mapping
└── workflow/
    ├── 00-create-workflow.md    # Meta-workflow: how to create new workflows
    ├── 01-stock-review.md       # Daily A-share market review
    ├── 02-research-obsidian.md  # Research → Obsidian archiving
    └── 03-hiring.md             # Hiring pipeline management

Layer 1: Workflow Rules in AGENTS.md

AGENTS.md is the “operating system” of the entire setup — the AI reads it on every startup. The workflow-related section looks like this:

## Workflow Playbooks

Workflow SOPs live in the `workflow/` directory. `WORKFLOW.md` is the index (loaded at session start).

### Execution Rules

When a task comes in, first check WORKFLOW.md for a matching trigger phrase:
- **Match** → read the corresponding playbook file, execute per SOP
- **No match** → improvise

Execution logic (based on status + freshness):

├─ status = draft      → warn user this is unverified, proceed with caution
├─ status = deprecated → refuse to execute, notify user it's deprecated
└─ status = active
   ├─ last_verified is within ttl → execute directly (autopilot mode)
   └─ last_verified has exceeded ttl → execute while reviewing
      ├─ evaluate each step for better alternatives
      ├─ if improvements found → propose after execution, update playbook after user confirms
      └─ if no improvements → only update last_verified

### Execution Logs

After each workflow execution:
1. Append an execution record to the day's journal (memory/YYYY-MM-DD.md)
2. Update the last_run field in the workflow file

### Daily Reflection Workflow Report (23:45 cron)

Scan workflows executed today and add to the reflection push:

| Workflow | Result | Deviation | Improvement |
|----------|--------|-----------|-------------|
| ...      | ✅/⚠️  | ...       | ...         |

After user confirms, AI writes improvements back to the playbook. Never unilaterally modifies active workflow SOPs.

This configuration does a few things:

  1. Hardcodes the trigger logic — The AI doesn’t need to remember “go check workflows,” because it re-reads AGENTS.md on every startup and the rule is right there
  2. Status machine defined in AGENTS.md — How to execute under each status is described here once, not repeated in every Playbook
  3. SOP changes require human confirmation — When the AI finds improvements, it surfaces them but doesn’t unilaterally modify active SOPs; humans stay in the loop

Layer 2: The WORKFLOW.md Index

# WORKFLOW.md — Workflow Index

_Last updated: 2026-03-16_

> When a task comes in, match trigger phrases first. If matched, read the file and execute per SOP. If not, improvise.

## Workflow List

| # | Workflow | Triggers | File | Status | TTL | Summary |
|---|----------|----------|------|--------|-----|---------|
| 00 | Create Workflow | new workflow, create workflow, build SOP | workflow/00-create-workflow.md | active | 180d | Meta-process for creating any new workflow |
| 01 | Stock Review | review, today's stocks, market analysis, run review | workflow/01-stock-review.md | active | 30d | Run review script, output to Obsidian, push summary |
| 02 | Research Archive | research, help me study, learn X, organize materials | workflow/02-research-obsidian.md | active | 90d | Write research topics into Obsidian DigitalGarden |
| 03 | Hiring Management | interview, candidate, offer, schedule interview | workflow/03-hiring.md | active | 90d | Linear Interview project management, calendar scanning |

## Execution Rules (Quick Reference)

- **draft** → warn user it's unverified, proceed with caution
- **active + within TTL** → execute directly
- **active + TTL exceeded** → execute while reviewing, update last_verified after
- **deprecated** → refuse to execute, notify user it's deprecated
- **skill missing** → notify, suggest installing or creating, don't install automatically

The index lets the AI find the right Playbook with a single lookup, rather than scanning every file in workflow/ on each task. Trigger phrases should be specific enough to avoid false positives.


Layer 3: Playbook Files

Frontmatter Spec

Each Playbook file has YAML frontmatter recording the workflow’s metadata:

---
title: "Hiring Management"
created: 2026-03-15
last_run: ~
last_verified: 2026-03-15
ttl: 90d
status: active
skills: [linear-cli]
tags: [hiring, interview, linear]
---

Of these fields, ttl and last_verified are the most important — they jointly control the TTL mechanism (explained below). The skills field lets the AI know what tools are needed before execution, so it can notify upfront rather than failing halfway through.
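Because the frontmatter is flat `key: value` pairs, reading it doesn’t strictly need a YAML library. A minimal stdlib sketch (a real setup would likely use a proper YAML parser):

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Extract a playbook's YAML frontmatter block as flat key/value pairs."""
    lines = text.splitlines()
    assert lines[0].strip() == "---", "playbook must start with frontmatter"
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing fence ends the frontmatter
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta
```

With the hiring playbook above, `read_frontmatter(text)["ttl"]` yields `"90d"` and `["status"]` yields `"active"` — everything the dispatch logic needs before execution.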

Playbook Content Structure

A complete Playbook contains these sections:

Trigger Conditions  → What phrases trigger this workflow (matches WORKFLOW.md triggers)
Pre-flight Checks   → Conditions and parameters to confirm before execution
Execution Steps     → Step by step, with specific commands, code snippets, decision logic
Permission Boundary → What the AI can decide autonomously, what requires asking the user
Output / Deliverables → What gets produced, where it goes, who gets notified
Error Handling      → Common failure scenarios and how to handle them
Background          → How this workflow came to be (useful for future revisions)

The Permission Boundary section is especially important. Explicitly writing out “what the AI can decide autonomously” and “when it must ask the user” gives the assistant clear boundaries for self-direction, rather than interrupting you for confirmation every time.

From the hiring management Playbook, the permission boundary section reads:

AI can decide autonomously:

  • Calendar scanning method (direct SQLite read; AppleScript forbidden — this is a hard-won constraint; a new assistant won’t know without it written down)
  • Logic for matching calendar events to Linear issues
  • Issue priority auto-determined based on status

Must ask the user:

  • Interview outcome for a candidate (pass/reject)
  • Whether to extend an offer

Making Tacit Knowledge Explicit

The most valuable content in a Playbook often isn’t the steps themselves — it’s the constraints and lessons-learned behind the steps. These are exactly what gets lost in verbal handoffs.

The hiring management Playbook has this snippet:

# ⚠️ AppleScript is forbidden for reading calendar (hangs indefinitely in sandbox)
# Read SQLite directly instead:

import datetime

EPOCH = datetime.datetime(2001, 1, 1)  # Core Data reference date
DB = "/path/to/Calendar.sqlitedb"
# start_date (here: sd, as read from the database) is stored as seconds
# since EPOCH in UTC; +8h converts to CST
start_cst = EPOCH + datetime.timedelta(seconds=sd) + datetime.timedelta(hours=8)

Linear State IDs are hardcoded directly:

Todo:        624bb002-4790-4e13-86fa-418c64094ba0
In Progress: c8d48dc1-625c-42cc-9c14-27d02ae583d9
...

Without writing these down, a new assistant has to rediscover them every time — or silently step on the same landmine.
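To make the constraint concrete, here is a hedged sketch of the whole read path, assuming a `CalendarItem` table with `summary` and `start_date` columns — the table and column names are my assumption about Apple’s schema, not something from the playbook, so verify them against your own Calendar.sqlitedb:

```python
import datetime
import sqlite3

EPOCH = datetime.datetime(2001, 1, 1)  # Core Data reference date (UTC)

def upcoming_events(db_path: str, days: int = 7) -> list[tuple[str, datetime.datetime]]:
    """Read upcoming calendar events via SQLite and convert timestamps to CST."""
    now_sd = (datetime.datetime.utcnow() - EPOCH).total_seconds()
    horizon = now_sd + days * 86400
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT summary, start_date FROM CalendarItem "
            "WHERE start_date BETWEEN ? AND ? ORDER BY start_date",
            (now_sd, horizon),
        ).fetchall()
    finally:
        conn.close()
    # start_date is seconds since EPOCH in UTC; +8h converts to CST
    return [(summary, EPOCH + datetime.timedelta(seconds=sd, hours=8))
            for summary, sd in rows]
```

The point of writing it into the playbook isn’t the query itself — it’s the “SQLite, never AppleScript” constraint that surrounds it.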


The TTL Mechanism: Why Workflows Need an Expiry Date

This is one of the most overlooked yet most important design decisions in the whole system.

The Problem: SOPs Go Stale

Tool versions change. External platform APIs get updated. Personal preferences evolve over time. A Playbook written six months ago might have better approaches available today, or some steps might be completely broken — but without a mechanism to force a check, the AI keeps executing the old version and you never notice.

The Solution: TTL + last_verified

Each Playbook has two fields:

  • ttl: The validity period for this workflow (e.g. 30d, 90d, 180d)
  • last_verified: The last date it was confirmed to still be valid

Together they calculate the “freshness deadline”: last_verified + ttl

The AI checks this deadline at execution time:

active + last_verified + ttl > today
  → Execute directly, autopilot mode, no questioning of steps

active + last_verified + ttl ≤ today (expired)
  → Execute while reviewing
  → Evaluate each step: is there a better approach?
  → After execution: if improvements found → surface them, write back after user confirms
  → If no improvements → only update last_verified, resume normal execution next time

The elegance of this design: TTL expiry doesn’t mean “stop executing” — it means “enter review mode.” The workflow still runs; the AI just maintains a critical eye while executing and surfaces any issues it finds.
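The status machine and the freshness check together fit in a few lines. A sketch, parsing ttl strings as day counts (the function names and mode labels are mine, not OpenClaw’s):

```python
import datetime

def parse_ttl(ttl: str) -> datetime.timedelta:
    """Parse a ttl string like '30d' or '180d' into a timedelta."""
    assert ttl.endswith("d"), f"unsupported ttl format: {ttl}"
    return datetime.timedelta(days=int(ttl[:-1]))

def execution_mode(status: str, last_verified: datetime.date,
                   ttl: str, today: datetime.date) -> str:
    """Map a playbook's status + freshness onto an execution mode."""
    if status == "deprecated":
        return "refuse"                  # notify user, do not execute
    if status == "draft":
        return "warn-and-proceed"        # unverified, proceed with caution
    # status == "active": the freshness deadline decides the mode
    if today <= last_verified + parse_ttl(ttl):
        return "autopilot"               # within TTL, execute directly
    return "execute-while-reviewing"     # expired: re-evaluate each step
```

For a playbook verified on 2026-03-15 with `ttl: 30d`, execution on 2026-04-01 runs on autopilot; on 2026-05-01 it runs in review mode — it still executes, it just questions each step.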

How to Set TTL Length

| Workflow Type | Suggested TTL | Reason |
|---------------|---------------|--------|
| Workflows depending on CLI tools | 30d | CLI versions iterate fast, interfaces may change |
| Workflows depending on external platforms (API, web) | 14d | Platforms update frequently, APIs change |
| Pure process workflows (hiring, research) | 90d | The process itself is relatively stable |
| Meta-workflows (e.g. “how to create a workflow”) | 180d | Architecture-level decisions, rarely change |

The Meta-Workflow: Using a Workflow to Manage Workflows

This is my favorite part of the system: 00-create-workflow.md is a workflow for creating new workflows.

It solves a natural question: since “creating a new workflow” is itself a repetitive task, why not SOP it too?

The full file:

---
title: "Create Workflow"
created: 2026-03-15
last_verified: 2026-03-15
ttl: 180d
status: active
skills: []
---

# Create Workflow Playbook

## Trigger Conditions
Say: new workflow, create workflow, build SOP, turn X into a workflow

## Pre-flight Checks
- [ ] Confirm the name and purpose of the new workflow
- [ ] Confirm trigger phrases (2-4 keywords)
- [ ] Confirm required skills
- [ ] Confirm TTL (refer to type recommendation table)

## Execution Steps

### Step 1 — Determine the Number
Check the workflow/ directory, take the current highest numeric prefix + 1 as the new file number.

### Step 2 — Create the Playbook File
Filename format: workflow/NN-slug.md (slug in lowercase English + hyphens)

Frontmatter template:
---
title: "Workflow Name"
created: YYYY-MM-DD
last_run: ~
last_verified: YYYY-MM-DD
ttl: 30d
status: draft
skills: []
tags: []
---

New workflows default to status: draft; change to active after first verified execution.

### Step 3 — Fill in Playbook Content
Must include these sections:
- Trigger Conditions
- Pre-flight Checks (with parameters/conditions)
- Execution Steps (step by step, with specific commands)
- Permission Boundary (what AI can decide autonomously, what must be asked)
- Output / Deliverables
- Error Handling

### Step 4 — Update WORKFLOW.md Index
Append a row to the table:
| NN | Workflow Name | Triggers | workflow/NN-slug.md | draft | TTL | Summary |

### Step 5 — Notify User
New workflow created, status draft, can be promoted to active after first execution.

## Proactive Opportunity Detection

When something recurs ≥ 3 times without a corresponding workflow, proactively suggest:
> "I've noticed [X] has come up a few times — want to build a workflow for it?"

## TTL Recommendation Table

| Type | Suggested TTL |
|------|---------------|
| CLI/API tool dependent | 30d |
| External platform dependent | 14d |
| Pure process (hiring, research) | 90d |
| Meta-workflow | 180d |

## Workflow vs Skill Boundary

| Characteristic | Recommendation |
|----------------|----------------|
| Private, personalized, specific config (IDs, paths, preferences) | workflow/ |
| Generic, reusable, shareable with others | skill (publishable to ClawHub) |
| Depends on generic tools + personal SOP | workflow/ referencing skill |

With this in place, every time a new workflow needs to be created, the AI follows a consistent standard rather than improvising each time. Even better: the AI proactively notices “this thing has come up ≥ 3 times — want to build a workflow?” — delegating the responsibility for spotting opportunities to the system itself.
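Step 1 of the meta-workflow — determine the number — is mechanical enough to sketch as a directory scan (a hypothetical stdlib helper, not part of the playbook itself):

```python
import pathlib
import re

def next_workflow_number(workflow_dir: str) -> str:
    """Return the next zero-padded NN prefix for a new playbook file."""
    numbers = [
        int(m.group(1))
        for p in pathlib.Path(workflow_dir).glob("*.md")
        if (m := re.match(r"(\d{2})-", p.name))
    ]
    # Empty directory starts at 00; otherwise highest prefix + 1
    return f"{max(numbers, default=-1) + 1:02d}"
```

With files 00 through 03 present, the next workflow gets the `04-` prefix, so `workflow/04-some-slug.md`.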


The Workflow–Skill Relationship

This is a common source of confusion, worth clarifying explicitly.

Skills are OpenClaw’s capability extension units. Each skill is a directory containing SKILL.md (usage instructions) and scripts/ (actual execution scripts). Skills are generic and shareable — OpenClaw ships with 50+ built-in skills, and the community can publish more to ClawHub.

Workflows are private, highly personalized SOPs. They rely on the tool capabilities that skills provide, but contain large amounts of non-generalizable personal configuration: specific project IDs, personal stock-picking methodology, team-specific interview processes, hard-won lessons, personal preferences…

Think of the relationship this way:

skill    = tool (hammer, wrench)
workflow = operations manual (this project's maintenance procedure,
           which tools to use, which pitfalls to avoid)

A concrete example — the hiring management workflow:

  • It depends on the linear-cli skill (provides Linear API call capability)
  • But the workflow contains specific Project IDs, Team IDs, State UUIDs, the SQLite path for calendar reading, and the “no AppleScript” constraint that’s specific to my Mac environment
  • Bundling these into a skill makes no sense — publishing them would be useless to anyone else

The boundary test is one question: Can this be handed directly to someone who knows nothing about me?

  • Yes → skill
  • No (because it contains personal config, preferences, or environment) → workflow

Another common pattern is workflow referencing skill:

The research archive workflow (02-research-obsidian.md) has skills: [obsidian, perplexity] in its frontmatter — it needs the obsidian skill (vault operations) and the perplexity skill (AI search), but the specific research steps, file naming conventions, quality checklist, and Obsidian vault path are all personal — managed in the workflow.


The Execution Loop: From Files to Behavior

Once the system is running, the daily execution flow looks like this:

Task received
  ├─ Match WORKFLOW.md trigger phrase?
  │  ├─ Match → read Playbook → check TTL → execute
  │  └─ No match → improvise
  └─ Execution complete
     ├─ Append execution record to today's journal
     └─ Update last_run field in Playbook

23:45 nightly reflection cron
  └─ Scan workflows executed today
     └─ Generate report: results + deviations + improvement suggestions
        └─ User confirms improvements → write back to Playbook

The key design of the loop: SOP changes require human confirmation; the AI only proposes, never unilaterally modifies. This ensures Playbooks don’t silently evolve without anyone knowing.


Observations from Building This

Writing a Playbook is fundamentally an exercise in mapping your complete understanding of a task. In the process of making tacit knowledge explicit, you’ll discover many constraints and preferences you “knew but never said out loud” — like the “read calendar via SQLite, not AppleScript” thing. If I hadn’t been writing a Playbook, I probably would never have deliberately recorded that.

SOPs are run into existence, not designed into existence. The first version of a Playbook doesn’t need to be perfect — good enough to run is good enough. When execution reveals deviations, append them to error handling. When TTL expires and you find a better approach, update the steps. Playbooks get better through actual use.

The more specific your trigger phrases, the fewer false positives. A word like “interview” is easy to accidentally trigger in unrelated conversations. Adding more specific phrases like “candidate,” “schedule interview,” “who hasn’t been followed up with” dramatically improves precision.


For Anyone Who Wants to Build Something Similar

If you’re using OpenClaw, or any AI assistant framework that supports custom system prompts or working-directory files, here’s the fastest path:

Step 1: Find one thing you’ve done ≥ 3 times in the past month. Pick the one with the most tacit knowledge embedded in it.

Step 2: Write the steps in Markdown, focusing on “why do it this way” and “what to do when X happens” — not just a steps list. Write down everything you know that nobody would know without being told: tool constraints, lessons learned, personal preferences.

Step 3: Add a rule to AGENTS.md (or your equivalent system prompt file): when you receive [trigger phrase], read this file and execute per the steps.

Step 4: Run it once. See where it gets stuck. Fill in the gaps. Set a TTL for the Playbook.

Just those four steps. You don’t need to design the whole thing upfront — the system matures through execution.
