diff --git a/DEMO_SCENARIOS.md b/DEMO_SCENARIOS.md
new file mode 100644
index 0000000..40813f8
--- /dev/null
+++ b/DEMO_SCENARIOS.md
@@ -0,0 +1,171 @@
+# Claude MCP Demo Scenarios
+
+This repository contains demo scenarios showcasing Claude's operational capabilities with OpenShift through MCP (Model Context Protocol) integration.
+
+## Demo Scenarios
+
+### 1. Cluster Health Check & Diagnostics
+**Scenario**: "Claude, can you check if my cluster is healthy?"
+
+**What Claude Does**:
+- Lists nodes and checks their status
+- Examines critical workloads (control plane, operators)
+- Reviews recent events for errors or warnings
+- Checks resource consumption (CPU, memory) via the metrics server
+- Identifies pods in CrashLoopBackOff or other problematic states
+- Provides a structured health summary with actionable insights
+
+**Key Value**: A comprehensive cluster assessment in seconds, instead of running kubectl/oc commands by hand across multiple resource types.
+
+---
+
+### 2. Security Review & Hardening
+**Scenario**: "Review the security posture of my Calibre deployment and help me lock it down."
+
+**What Claude Does**:
+- Examines the pod security context and SCC assignments
+- Identifies overly permissive configurations (privileged, anyuid, root user)
+- Proposes custom SCCs with minimum viable privileges
+- Guides the user through incremental security hardening
+- Documents failure modes and appropriate fixes
+- Creates declarative GitOps manifests for security policies
+
+**Key Value**: An expert security review without requiring deep SCC knowledge from the user. Claude learns the boundaries through experimentation.
+
+**Real Example**: Successfully hardened Calibre from the `anyuid` SCC to a custom `restricted-s6` SCC, discovering s6-overlay compatibility issues and documenting workarounds.
+
+---
+
+### 3. Agentic Problem Solving
+**Scenario**: User mentions NFS performance concerns in passing.
+
+**What Claude Does** (without being asked):
+- Creates a test pod with appropriate tools
+- Mounts the NFS volume
+- Runs performance benchmarks (dd, fio)
+- Analyzes the results and compares them with expected performance
+- Cleans up test resources
+- Reports findings with context
+
+**Key Value**: Proactive investigation and validation. Claude doesn't wait for explicit instructions; it understands the implied need and acts on it.
+
+---
+
+### 4. Subtle Error Detection
+**Scenario**: "Claude, I removed the SCC from the service account and added the new one, but the pod is still using the old SCC. What did I miss?"
+
+**What Claude Does**:
+- Retrieves the actual pod spec to see which SA it uses
+- Compares it to the SA name in the user's command
+- Spots the typo: `peantuflix-sa` vs `peanutflix-sa`
+- Identifies the root cause immediately
+
+**Key Value**: Catches typos, wrong namespaces, label selector errors, and other "stupid mistakes" that eat 30+ minutes of senior engineer time. Humans autocorrect what they read; machines don't.
+
+**Other Examples**:
+- Off-by-one errors in array indices
+- Copy-paste artifacts (wrong resource names)
+- Namespace mismatches
+- Label selectors that silently match nothing
+
+---
+
+### 5. Multi-Tool Orchestration
+**Scenario**: "Find all applications using the old Gitea URL and help me migrate them."
+
+**What Claude Does**:
+- Uses the ArgoCD MCP server to list all applications
+- Uses the OpenShift (OCP) MCP server to examine each app's manifests
+- Uses the Gitea MCP server to search repo contents for the old URL
+- Proposes a migration plan with git operations
+- Can execute the migration if approved
+
+**Key Value**: Coordinates across multiple systems (ArgoCD, Kubernetes, Git) in a single workflow, where a human would need to context-switch between tools.
+
+---
+
+### 6. GitOps Workflow Automation
+**Scenario**: "Create a custom SCC for GPU workloads and apply it through GitOps."
+
+**What Claude Does**:
+- Analyzes requirements (hostPath for /dev/dri, but no privilege escalation)
+- Creates an SCC manifest with appropriate constraints
+- Generates a ClusterRoleBinding for the service account
+- Commits both to the okd-platform repo
+- ArgoCD picks up the changes and applies them
+- Validates that the pod starts with the correct SCC
+
+**Key Value**: A full GitOps workflow from requirements to validation. Everything is declarative and version-controlled.
+
+---
+
+### 7. Root Cause Analysis
+**Scenario**: "My pod won't start. Help me debug it."
+
+**What Claude Does**:
+- Retrieves pod status and events
+- Identifies SCC admission errors
+- Examines the deployment manifest
+- Traces which SCCs are available to the service account
+- Finds the specific constraint violation
+- Proposes the minimal fix (not just "use privileged")
+
+**Key Value**: Systematic debugging that follows the admission chain. Explains *why* something failed, not just *what* failed.
+
+**Real Example**: Diagnosed that Plex required `allowPrivilegeEscalation: true` because of s6-overlay's setuid behavior, even though its hostPath access was already working.
+
+---
+
+### 8. Documentation & Knowledge Capture
+**Scenario**: Throughout any complex task.
+
+**What Claude Does**:
+- Suggests creating documentation as issues are discovered
+- Proposes README updates with workarounds
+- Generates example manifests with inline comments
+- Creates decision records (why we chose X over Y)
+- Documents failure modes for future reference
+
+**Key Value**: Operational knowledge is captured in git, not lost in someone's head. Future engineers (or future you) benefit.
+
+---
+
+## Demo Structure
+
+Each scenario should demonstrate:
+1. **Natural language input** - No YAML required from the user
+2. **Autonomous tool use** - Claude picks the right tools
+3. **Iterative problem solving** - When Plan A fails, try Plan B
+4. **GitOps-first approach** - Everything through version control
+5. **Explanation of reasoning** - Not just "do this," but "here's why"
+
+## Technical Foundation
+
+**MCP Servers Used**:
+- `openshift-mcp-server` - Kubernetes/OpenShift API operations
+- `gitea-mcp-server` - Git repository operations
+- `argocd-mcp-server` - ArgoCD application management
+- `minio-mcp-server` - Object storage operations
+
+**Key Capabilities**:
+- Direct API access (no kubectl wrapper scripts)
+- Multi-step workflows with validation
+- Failure recovery and alternative approaches
+- Context retention across long conversations
+- Integration with existing GitOps workflows
+
+## Notes for Demo Day
+
+- **Start simple**: The cluster health check is impressive but approachable
+- **Build complexity**: Show multi-tool orchestration after the basics
+- **Highlight autonomy**: The agentic scenarios (NFS testing) are the most impressive
+- **Show failure handling**: Claude debugging its own mistakes is powerful
+- **Emphasize GitOps**: Everything is declarative and auditable
+
+## Future Scenarios to Develop
+
+- **Disaster recovery**: "My cluster is down, help me restore from backup"
+- **Capacity planning**: "Will my cluster handle 10x traffic?"
+- **Security audit**: "Find all workloads running as root"
+- **Cost optimization**: "Which pods are using the most resources?"
+- **Compliance checking**: "Do all our apps meet PSS restricted standards?"
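The "Security audit" scenario listed in this file ("Find all workloads running as root") could start from a read-only check over exported pod specs. The Python below is a minimal sketch and is not part of any of the MCP servers listed. It assumes the pod list was exported with `oc get pods -A -o json`, and `root_candidates`, `_effective`, and `Finding` are hypothetical names. It applies the Kubernetes rule that a container-level `securityContext` field overrides the same field set at pod level. A container is flagged when its effective `runAsUser` is 0, or when no UID is set and `runAsNonRoot` is not enforced. In the second case the image's `USER` decides, which the pod spec alone cannot show.

```python
from typing import Any, Dict, List, Optional, Tuple

# A finding is (namespace, pod name, container name, reason).
Finding = Tuple[str, str, str, str]


def _effective(container_ctx: Dict[str, Any], pod_ctx: Dict[str, Any], key: str) -> Optional[Any]:
    """Return a securityContext value; a container-level setting overrides the pod-level one."""
    if container_ctx.get(key) is not None:
        return container_ctx[key]
    return pod_ctx.get(key)


def root_candidates(pod_list: Dict[str, Any]) -> List[Finding]:
    """Hypothetical audit helper: flag containers that run, or may run, as UID 0."""
    findings: List[Finding] = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        pod_ctx = spec.get("securityContext") or {}
        # Init containers run in the same pod and can also run as root.
        containers = (spec.get("initContainers") or []) + (spec.get("containers") or [])
        for container in containers:
            ctx = container.get("securityContext") or {}
            uid = _effective(ctx, pod_ctx, "runAsUser")
            non_root = _effective(ctx, pod_ctx, "runAsNonRoot")
            if uid == 0:
                reason = "runAsUser is 0"
            elif uid is None and non_root is not True:
                # Nothing in the spec pins the UID, so the image's USER decides.
                reason = "no runAsUser and runAsNonRoot not enforced; image USER decides"
            else:
                continue
            findings.append(
                (meta.get("namespace", ""), meta.get("name", ""), container.get("name", ""), reason)
            )
    return findings
```

For example, export with `oc get pods -A -o json > pods.json`, then call `root_candidates(json.load(open("pods.json")))`. On OpenShift, SCCs with a `MustRunAsRange` UID strategy (such as `restricted-v2`) typically fill in `runAsUser` at admission. Findings of the second kind therefore tend to point at pods admitted under looser SCCs such as `anyuid`.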