# Ask Annie — ST Best Practices Session Ingestion

Ingestion pipeline for Axway MFT User Group "Ask Annie" Q&A sessions on Vimeo.

## What it does

1. Downloads audio from a Vimeo URL via yt-dlp
2. Transcribes with Whisper (timestamped segments)
3. Slices transcript into per-chapter chunks using a chapters JSON file
4. Optionally extracts frames from demo-heavy chapters for vision annotation
5. Outputs `chunks.json` ready for ingestion into knowledge-mcp

## Usage

```bash
python3 ingest.py \
  --url 'https://vimeo.com/1020102626' \
  --chapters chapters/1020102626.json \
  --out ./out \
  --whisper-model medium
```

Add `--frames` to also extract video frames for demo chapters (requires video download).

## Dependencies

```bash
brew install yt-dlp ffmpeg
pip install openai-whisper
```

## Repo structure

```
ingest.py          # Main pipeline script
chapters/.json     # Chapter list per session
out//              # Output (gitignored)
  audio.mp3
  transcript.json
  chunks.json
  frames/
```

## Adding a new session

1. Create `chapters/.json` with timestamp + title + summary per chapter
2. Run `ingest.py --url --chapters chapters/.json`
3. Review `out//chunks.json`
4. Ingest chunks into knowledge-mcp notebook `securetransport-md`
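
## Example: chapter slicing sketch

The slicing step (step 3 under "What it does") could look roughly like the sketch below. It is not the code in `ingest.py`. It assumes each transcript segment carries `start`, `end`, and `text` keys, as Whisper's segment output does, and that each chapter entry has a `start` time in seconds plus `title` and `summary`. The chapter key names and the function name `slice_into_chapters` are hypothetical; the real chapters file may use a different schema.

```python
from bisect import bisect_right


def slice_into_chapters(segments, chapters):
    """Group transcript segments under the chapter that contains their start time.

    segments: list of dicts with "start", "end", "text" (Whisper-style segments).
    chapters: list of dicts with "start" (seconds), "title", "summary".
              This schema is an assumption, not the actual chapters file format.
    """
    chapters = sorted(chapters, key=lambda c: c["start"])
    starts = [c["start"] for c in chapters]
    chunks = [
        {
            "title": c["title"],
            "summary": c.get("summary", ""),
            "start": c["start"],
            "text": [],
        }
        for c in chapters
    ]

    for seg in segments:
        # Index of the last chapter that starts at or before this segment.
        idx = bisect_right(starts, seg["start"]) - 1
        if idx < 0:
            continue  # Segment precedes the first chapter; skipped in this sketch.
        chunks[idx]["text"].append(seg["text"].strip())

    # Join the collected segment texts into one string per chapter.
    for chunk in chunks:
        chunk["text"] = " ".join(chunk["text"])
    return chunks


if __name__ == "__main__":
    demo_segments = [
        {"start": 0.0, "end": 4.0, "text": " Welcome everyone."},
        {"start": 4.0, "end": 9.0, "text": " Today we cover upgrades."},
        {"start": 65.0, "end": 70.0, "text": " First question."},
    ]
    demo_chapters = [
        {"start": 0, "title": "Intro", "summary": "Welcome"},
        {"start": 60, "title": "Q&A", "summary": "Questions"},
    ]
    for chunk in slice_into_chapters(demo_segments, demo_chapters):
        print(chunk["title"], "->", chunk["text"])
```

Assigning a segment by its start time keeps the logic simple, at the cost of a segment that straddles a chapter boundary landing wholly in the earlier chapter.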