Add 03-skills presentation: Skills for AI agents

2026-02-25 22:15:21 +00:00
parent 179115632b
commit 27b8eb3e05
2 changed files with 537 additions and 0 deletions
--- a/03-skills/README.md
+++ b/03-skills/README.md
@@ -0,0 +1,160 @@
 # Skills: Teaching AI How to Think
 **Core thesis: Skills are lazy-loaded expertise packs that make AI agents both more efficient AND more reliable.**
 ---
 ## What Are Skills?
 A **skill** is a bundled unit of domain expertise. At its core, it's a `SKILL.md` file — a structured set of instructions, business rules, edge cases, and guardrails — plus optional scripts, templates, and assets.
 Skills are **not** kept in context permanently. They sit on a shelf, each described by a short summary. When a task arrives, the agent scans those summaries, identifies which skill (if any) applies, and loads it just in time.
 Think of it this way: a professional doesn't carry every manual in their backpack. They know which shelf to reach for. Skills give an AI agent that same organisational awareness.
 ### Anatomy of a Skill
 ```
 weather/
 ├── SKILL.md          # The SOP — instructions, rules, edge cases
 ├── scripts/
 │   └── fetch.py      # Optional helper scripts
 ├── templates/
 │   └── report.md     # Output templates
 └── assets/
    └── icons/        # Supporting files
 ```
 The `SKILL.md` is the brain. Everything else is optional scaffolding.
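 A minimal sketch of how a loader might read this layout, assuming only what the tree shows: `SKILL.md` is required and the three subfolders are optional. The `Skill` class and `load_skill` function are hypothetical names for illustration, not part of any particular agent framework.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Skill:
    name: str
    instructions: str  # full text of SKILL.md
    extras: dict[str, list[str]] = field(default_factory=dict)


def load_skill(skill_dir: Path) -> Skill:
    """Read one skill directory laid out like the tree above."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        raise FileNotFoundError(f"{skill_dir} has no SKILL.md")

    extras = {}
    for sub in ("scripts", "templates", "assets"):  # all optional
        folder = skill_dir / sub
        if folder.is_dir():
            extras[sub] = sorted(p.name for p in folder.iterdir())

    return Skill(skill_dir.name, skill_md.read_text(encoding="utf-8"), extras)
```

 Calling `load_skill(Path("weather"))` on the tree above would return the instructions plus `{"scripts": ["fetch.py"], "templates": ["report.md"], "assets": ["icons"]}`. A directory without `SKILL.md` is rejected, since the other folders only support it.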
 ---
 ## Efficiency Through Lazy Loading
 Without skills, you face an uncomfortable trade-off:
 1. **Bloat the system prompt** with every possible instruction — wasting tokens, polluting context, and degrading performance on *all* tasks to marginally improve *some* tasks.
 2. **Leave the agent ignorant** — it doesn't know your SOPs, your preferred approaches, your edge cases. It improvises. Sometimes well. Often not.
 Skills eliminate this trade-off entirely.
 The agent's system prompt contains only **skill descriptions** — a sentence or two each. When a matching task arrives, the full skill loads into context. When the task is done, it's gone. The agent operates with a lean context window most of the time and expert-level depth exactly when needed.
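 The load-on-match, drop-when-done cycle described above can be sketched as follows. This is an illustration under stated assumptions: `SkillEntry`, `SkillShelf`, and the `keywords` field are hypothetical, and the keyword match is a stand-in for the agent reading the descriptions and choosing a skill itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SkillEntry:
    name: str
    description: str           # short summary, always in context
    keywords: tuple[str, ...]  # hypothetical matching hints
    load: Callable[[], str]    # returns the full SKILL.md text on demand


class SkillShelf:
    def __init__(self, entries):
        self.entries = list(entries)
        self.active: Optional[str] = None  # full text of the loaded skill, if any

    def base_prompt(self) -> str:
        """Only the short descriptions go into the system prompt."""
        return "\n".join(f"- {e.name}: {e.description}" for e in self.entries)

    def match(self, task: str) -> Optional[SkillEntry]:
        text = task.lower()
        for entry in self.entries:
            if any(k in text for k in entry.keywords):
                return entry
        return None

    def context_for(self, task: str) -> str:
        """Build the context for one task, loading a skill only on a match."""
        entry = self.match(task)
        self.active = entry.load() if entry else None
        parts = [self.base_prompt()]
        if self.active:
            parts.append(self.active)
        parts.append(task)
        return "\n\n".join(parts)

    def finish(self) -> None:
        """Drop the loaded skill once the task is done."""
        self.active = None
```

 Because `load` is a callable rather than a string, the full instructions are never read until a task actually matches.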
 ### The Numbers
 Consider an agent with 10 skills, each averaging 2,000 tokens of instructions:
 | Approach | Tokens in Context | Quality |
 |---|---|---|
 | Everything in system prompt | 20,000+ always | Degraded (noise) |
 | No skills at all | ~0 | Poor (no expertise) |
 | Lazy-loaded skills | ~500 base + 2,000 when needed | Optimal |
 That's roughly a 40x reduction in baseline context (20,000 tokens down to ~500), and still about 8x with one skill loaded (~2,500 tokens), with *better* outcomes.
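 The table's figures can be checked with a few lines of arithmetic. The constants below are the illustrative numbers from the table, not measurements.

```python
SKILLS = 10
TOKENS_PER_SKILL = 2_000
BASE_DESCRIPTIONS = 500  # approximate size of all skill summaries combined

everything_loaded = SKILLS * TOKENS_PER_SKILL        # all SOPs in the system prompt
lazy_idle = BASE_DESCRIPTIONS                        # no skill loaded
lazy_active = BASE_DESCRIPTIONS + TOKENS_PER_SKILL   # one skill loaded

idle_ratio = everything_loaded / lazy_idle
active_ratio = everything_loaded / lazy_active

print(f"always loaded: {everything_loaded} tokens")
print(f"lazy, idle:    {lazy_idle} tokens ({idle_ratio:.0f}x smaller)")
print(f"lazy, active:  {lazy_active} tokens ({active_ratio:.0f}x smaller)")
```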
 ---
 ## Reliability Through SOPs
 The `SKILL.md` **is** the standard operating procedure. It doesn't just hint at what to do — it encodes:
 - **Business rules** — "Always use metric units for Australian users"
 - **Edge cases** — "If the API returns a 429, wait 60 seconds before retry"
 - **Preferred approaches** — "Use `ripgrep` over `grep` for speed"
 - **Guardrails** — "Never delete without confirmation; use trash over rm"
 - **Decision trees** — "If X, do Y. If Z, escalate."
 This is the difference between giving a junior engineer a runbook and saying "figure it out." Both might get to the same destination, but one gets there reliably, consistently, and without the creative detours that break production.
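 A rule in a `SKILL.md` is written as prose, but it maps directly onto behaviour. As a sketch, here is the 429 edge case from the list above as a helper script might implement it. The `call_with_retry` name, the `(status, body)` return shape, and the attempt limit are assumptions, since the rule itself sets no limit.

```python
import time
from typing import Callable

RETRY_WAIT_SECONDS = 60  # from the rule: "wait 60 seconds before retry"
MAX_ATTEMPTS = 3         # assumption: the rule does not say when to give up


def call_with_retry(
    request: Callable[[], tuple[int, str]],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `request` until it stops returning 429 or attempts run out."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, body = request()
        if status != 429:
            return body
        if attempt < MAX_ATTEMPTS:
            sleep(RETRY_WAIT_SECONDS)
    raise RuntimeError("still rate-limited after retries")
```

 Passing `sleep` as a parameter keeps the wait testable without actually pausing for a minute.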
 ### Without a Skill (Improvisation)
 > **User:** Check if my server is secure.
 >
 > **Agent:** *runs a few random checks it remembers from training data, misses half the important ones, suggests changes that conflict with your infrastructure*
 ### With the Healthcheck Skill (SOP)
 > **User:** Check if my server is secure.
 >
 > **Agent:** *loads healthcheck SKILL.md → follows structured audit: firewall rules → SSH config → update status → service exposure → generates prioritised report with specific remediation steps*
 Same request. Wildly different reliability.
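 The structured audit above amounts to a fixed sequence of checks followed by a prioritised report. A rough sketch, assuming each check is a function that returns a finding or nothing; the `Finding` fields and the severity scale are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    area: str
    severity: int  # hypothetical scale: 1 = critical, 3 = minor
    remediation: str


# Each check returns a Finding, or None when the area looks fine.
Check = Callable[[], Optional[Finding]]


def run_audit(checks: list[tuple[str, Check]]) -> list[Finding]:
    """Run every check in the listed order, then sort findings by severity."""
    findings = []
    for _area, check in checks:
        result = check()
        if result is not None:
            findings.append(result)
    # sorted() is stable, so findings of equal severity keep the audit order.
    return sorted(findings, key=lambda f: f.severity)
```

 The list passed to `run_audit` would follow the skill's order (firewall, SSH, updates, services), so no area is skipped and the report always has the same shape.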
 ---
 ## Skills vs. Tools vs. System Prompts
 These three layers serve fundamentally different purposes:
 | Layer | Question It Answers | Example |
 |---|---|---|
 | **Tools** | What *can* I do? | "I can read files, search the web, send messages" |
 | **System Prompt** | Who *am* I? | "You are helpful, concise, and safety-conscious" |
 | **Skills** | *How* do I do specific things well? | "Here's how to audit server security step-by-step" |
 All three are needed. Tools without skills are like a workshop full of power tools handed to someone with no training. Skills without tools are expertise with no hands. And the system prompt ties it all together with identity and baseline behaviour.
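 One way to picture how the three layers combine is a small data structure: the system prompt and tools are always present, while the skill slot is filled per task. The `Agent` class below is a hypothetical illustration, not the structure of any specific agent runtime.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Agent:
    system_prompt: str  # identity: who am I?
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)  # capability
    skill: Optional[str] = None  # expertise, loaded per task

    def instructions(self) -> str:
        """Combine the layers into the text the model would see."""
        parts = [self.system_prompt]
        if self.skill:
            parts.append(self.skill)
        parts.append("Available tools: " + ", ".join(sorted(self.tools)))
        return "\n\n".join(parts)
```

 Setting `agent.skill` before a task and clearing it afterwards changes how the agent works without touching its identity or its tool list.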
 ---
 ## Real-World Examples
 ### Weather Skill (Simple)
 A lightweight skill that knows how to query weather APIs, format forecasts, handle location lookups, and present results cleanly. It runs to roughly 50 lines of instructions and is loaded when someone asks "what's the weather?"
 ### SecureTransport Flow Engineering (Complex)
 A deep domain skill encoding expertise in Axway SecureTransport — file transfer flows, PGP encryption steps, external script configuration, SFTP ingress patterns, error log locations, and testing harnesses. This is tribal knowledge that took months to accumulate, now available to any agent instantly.
 ### Healthcheck Skill (Security SOPs)
 A structured security audit playbook — firewall configuration, SSH hardening, package update status, service exposure analysis. Follows a defined checklist, produces prioritised findings, and recommends specific remediations aligned with the deployment's risk tolerance.
 ---
 ## Expertise Preservation
 Here's where skills become strategically important, not just operationally convenient.
 **Skills capture tribal knowledge.**
 When your best engineer writes a skill, they're encoding their expertise — the shortcuts, the gotchas, the "here's what the documentation doesn't tell you" — into a format that any agent can use, forever.
 People leave. People forget. People get busy. But a well-written skill persists. It's organisational knowledge management that actually works, because the consumer (the AI agent) reads the whole skill every time it loads and follows it closely.
 This isn't about replacing experts. It's about **scaling** their expertise. One expert writes the skill. Every agent in the organisation benefits.
 ---
 ## Composability and Community
 Skills are modular by design:
 - **Shareable** — Package a skill and hand it to another team or publish it
 - **Versionable** — Track changes, roll back, evolve with your processes
 - **Stackable** — Multiple skills can be available simultaneously; the agent picks the right one
 - **Discoverable** — Skill descriptions form a searchable catalogue
 ### ClawHub: A Marketplace of Expertise
 Skills can be shared through ClawHub — discovered, installed, and composed by anyone running an OpenClaw agent. This creates a flywheel:
 1. Someone solves a problem well and writes a skill
 2. They share it
 3. Others use it, improve it, contribute back
 4. The collective expertise grows
 It's open-source knowledge, but structured for AI consumption.
 ---
 ## Summary
 Skills solve two problems at once:
 - **Efficiency** — Load expertise on-demand instead of bloating every interaction
 - **Reliability** — Follow defined SOPs instead of improvising on critical tasks
 They also unlock something bigger: a way to capture, share, and scale human expertise through AI agents. Not replacing the expert — amplifying them.
 The question isn't whether your agents need skills. It's what expertise you'd encode first.
--- a/03-skills/presentation.md
+++ b/03-skills/presentation.md
@@ -0,0 +1,377 @@
 <!-- column_layout: [1, 8, 1] -->
 <!-- column: 0 -->
 <!-- column: 1  --> 
 <!-- jump_to_middle -->
 <!-- alignment: center -->
 <!-- font_size: 5 -->
 Skills: Teaching AI How to Think
 <!-- font_size: 2 -->
 Lazy-loaded expertise for efficient, reliable agents
 <!-- no_footer -->
 <!-- speaker_note: Title slide. Core message: skills make agents both smarter AND cheaper to run. -->
 <!-- end_slide -->
 <!-- column_layout: [1, 3, 1] -->
 <!-- column: 1 -->
 <!-- alignment: center -->
 <!-- font_size: 4 -->
 The Problem
 ===
 <!-- reset_layout -->
 <!-- font_size: 2 -->
 <!-- alignment: left -->
 <!-- new_line -->
 <!-- pause -->
 You have two bad options:
 <!-- pause -->
 <!-- new_line -->
 **Option A: Stuff everything into the system prompt**
 <!-- new_line -->
 <!-- pause -->
 - 🪣 20,000+ tokens of instructions, always loaded
 - 🔇 Context pollution degrades *all* tasks
 - 💸 You pay for expertise you're not using
 <!-- pause -->
 <!-- new_line -->
 **Option B: Wing it**
 <!-- new_line -->
 <!-- pause -->
 - 🎲 Agent improvises on critical tasks
 - 🔀 Inconsistent results every time
 - 😬 "Creative" solutions in production
 <!-- pause -->
 <!-- new_line -->
 <!-- alignment: center -->
 <!-- font_size: 2 -->
 Neither is acceptable.
 ===
 <!-- speaker_note: Set up the tension. Everyone building agents hits this wall. -->
 <!-- no_footer -->
 <!-- end_slide -->
 <!-- font_size: 4 -->
 What is a Skill?
 ===
 <!-- font_size: 2 -->
 <!-- pause -->
 ```mermaid +render +width:100%
 graph TD
    S["🎯 Skill"] --> MD["📋 SKILL.md<br/>Instructions, rules, edge cases"]
    S --> SC["⚙️ scripts/<br/>Helper scripts"]
    S --> TM["📄 templates/<br/>Output templates"]
    S --> AS["📦 assets/<br/>Supporting files"]
    style MD fill:#4a9eff,stroke:#333,color:#fff
    style S fill:#ff6b6b,stroke:#333,color:#fff
    style SC fill:#51cf66,stroke:#333,color:#fff
    style TM fill:#51cf66,stroke:#333,color:#fff
    style AS fill:#51cf66,stroke:#333,color:#fff
 ```
 <!-- pause -->
 <!-- new_line -->
 <!-- alignment: center -->
 **SKILL.md is the brain. Everything else is optional.**
 <!-- speaker_note: The SKILL.md IS the SOP. Scripts and templates are scaffolding. -->
 <!-- no_footer -->
 <!-- end_slide -->
 <!-- font_size: 4 -->
 Lazy Loading
 ===
 <!-- font_size: 2 -->
 <!-- alignment: center -->
 <!-- pause -->
 How it works: scan descriptions → load on demand → unload when done
 <!-- pause -->
 <!-- new_line -->
 ```mermaid +render +width:100%
 graph LR
    subgraph BEFORE["❌ Without Skills"]
        direction TB
        BP["System Prompt<br/>20,000+ tokens"]
        BP --> W1["Weather SOPs"]
        BP --> W2["Security SOPs"]
        BP --> W3["Deploy SOPs"]
        BP --> W4["Testing SOPs"]
        BP --> W5["... everything else"]
    end
    subgraph AFTER["✅ With Skills"]
        direction TB
        AP["System Prompt<br/>~500 tokens<br/>(descriptions only)"]
        AP -->|"task matches"| L1["Load: Weather<br/>+2,000 tokens"]
    end
    style BEFORE fill:#ffcccc,stroke:#cc0000
    style AFTER fill:#ccffcc,stroke:#00cc00
    style BP fill:#ff6b6b,stroke:#333,color:#fff
    style AP fill:#51cf66,stroke:#333,color:#fff
    style L1 fill:#4a9eff,stroke:#333,color:#fff
 ```
 <!-- pause -->
 <!-- new_line -->
 **10 skills × 2,000 tokens = 20,000 always loaded → ~500 base + 2,000 on demand**
 <!-- speaker_note: This is the efficiency win. Roughly 40x less baseline context (20,000 → ~500 tokens), about 8x with one skill loaded. Analogy: a pro doesn't carry every manual; they know which shelf to reach for. -->
 <!-- no_footer -->
 <!-- end_slide -->
 <!-- font_size: 4 -->
 SOPs = Reliability
 ===
 <!-- font_size: 2 -->
 <!-- pause -->
 The SKILL.md **is** the standard operating procedure.
 <!-- pause -->
 <!-- new_line -->
 <!-- column_layout: [1, 1] -->
 <!-- column: 0 -->
 <!-- font_size: 2 -->
 **Without Skill** 🎲
 <!-- new_line -->
 > "Check if my server is secure"
 <!-- new_line -->
 - Runs random checks from training data
 - Misses half the important ones  
 - Suggests changes that conflict with your infra
 <!-- pause -->
 <!-- column: 1 -->
 <!-- font_size: 2 -->
 **With Skill** 📋
 <!-- new_line -->
 > "Check if my server is secure"
 <!-- new_line -->
 - Loads healthcheck SKILL.md
 - Follows structured audit checklist
 - Firewall → SSH → Updates → Services
 - Prioritised report with specific fixes
 <!-- reset_layout -->
 <!-- pause -->
 <!-- new_line -->
 <!-- alignment: center -->
 <!-- font_size: 2 -->
 **Runbook vs. "figure it out"**
 <!-- speaker_note: This is the reliability win. Same request, wildly different outcomes. Like giving a junior engineer a runbook vs saying figure it out. -->
 <!-- no_footer -->
 <!-- end_slide -->
 <!-- font_size: 4 -->
 Skills vs. Tools vs. System Prompts
 ===
 <!-- font_size: 2 -->
 <!-- pause -->
 <!-- new_line -->
 | Layer | Answers | Example |
 |---|---|---|
 | **Tools** | What *can* I do? | Read files, search web, send messages |
 | **System Prompt** | Who *am* I? | Helpful, concise, safety-conscious |
 | **Skills** | *How* do I do X well? | Step-by-step server security audit |
 <!-- pause -->
 <!-- new_line -->
 ```mermaid +render +width:100%
 graph LR
    T["🔧 Tools<br/>Capability"] --> A["🤖 Agent"]
    SP["🧠 System Prompt<br/>Identity"] --> A
    SK["🎯 Skills<br/>Expertise"] --> A
    style T fill:#ff6b6b,stroke:#333,color:#fff
    style SP fill:#ffd93d,stroke:#333,color:#000
    style SK fill:#4a9eff,stroke:#333,color:#fff
    style A fill:#51cf66,stroke:#333,color:#fff
 ```
 <!-- pause -->
 <!-- new_line -->
 <!-- alignment: center -->
 Tools without skills = workshop full of power tools, no training
 <!-- speaker_note: All three layers are needed. Tools are hands, prompts are personality, skills are expertise. -->
 <!-- no_footer -->
 <!-- end_slide -->
 <!-- font_size: 4 -->
 Real-World Examples
 ===
 <!-- font_size: 2 -->
 <!-- incremental_lists: true -->
 <!-- new_line -->
 * **Weather Skill** ☀️ — Simple. ~50 lines. Query APIs, format forecasts, handle locations. Loaded when someone asks "what's the weather?"
 <!-- new_line -->
 * **Healthcheck Skill** 🔒 — Structured security audit playbook. Firewall, SSH, updates, service exposure. Prioritised findings with specific remediations.
 <!-- new_line -->
 * **SecureTransport Flow Engineering** 🔐 — Deep domain expertise. File transfer flows, PGP encryption, SFTP patterns, error log locations, testing harnesses. Months of tribal knowledge, instantly available.
 <!-- new_line -->
 * **Skill Creator** 🏗️ — A meta-skill: a skill for *building skills*. Encodes the structure, best practices, and packaging conventions.
 <!-- speaker_note: Range from trivial to deeply complex. The ST skill is a great example of tribal knowledge capture. -->
 <!-- no_footer -->
 <!-- end_slide -->
 <!-- font_size: 4 -->
 The Bigger Picture
 ===
 <!-- font_size: 2 -->
 <!-- pause -->
 <!-- new_line -->
 <!-- column_layout: [1, 1] -->
 <!-- column: 0 -->
 **Expertise Preservation** 🧠
 <!-- new_line -->
 <!-- incremental_lists: true -->
 - Skills capture tribal knowledge
 - Your best engineer writes it once
 - Every agent benefits, forever
 - People leave. Skills persist.
 <!-- pause -->
 <!-- column: 1 -->
 **Composability** 🧩
 <!-- new_line -->
 <!-- incremental_lists: true -->
 - Modular: share, version, stack
 - Discoverable skill catalogues  
 - ClawHub: a marketplace of expertise
 - Community flywheel: solve → share → improve
 <!-- reset_layout -->
 <!-- pause -->
 <!-- new_line -->
 ```mermaid +render +width:100%
 graph LR
    E["👩‍💻 Expert writes skill"] --> P["📦 Published to ClawHub"]
    P --> U["🤖 Agents everywhere use it"]
    U --> F["🔄 Community improves it"]
    F --> P
    style E fill:#ff6b6b,stroke:#333,color:#fff
    style P fill:#4a9eff,stroke:#333,color:#fff
    style U fill:#51cf66,stroke:#333,color:#fff
    style F fill:#ffd93d,stroke:#333,color:#000
 ```
 <!-- speaker_note: This is the strategic argument. Skills aren't just operational convenience — they're knowledge management that actually works. -->
 <!-- no_footer -->
 <!-- end_slide -->
 <!-- column_layout: [1, 8, 1] -->
 <!-- column: 0 -->
 <!-- column: 1  --> 
 <!-- jump_to_middle -->
 <!-- alignment: center -->
 <!-- font_size: 3 -->
 Skills = Efficiency + Reliability
 <!-- new_line -->
 <!-- font_size: 2 -->
 Load expertise on demand. Follow SOPs, not guesswork.
 Capture knowledge. Scale it. Share it.
 <!-- new_line -->
 <!-- font_size: 2 -->
 🎯
 <!-- no_footer -->
 <!-- speaker_note: Closing. The question isn't whether your agents need skills — it's what expertise you'd encode first. -->
 <!-- end_slide -->