Ask Annie — ST Best Practices Session Ingestion
Ingestion pipeline for Axway MFT User Group "Ask Annie" Q&A sessions on Vimeo.
What it does
- Downloads audio from a Vimeo URL via yt-dlp
- Transcribes with Whisper (timestamped segments)
- Slices transcript into per-chapter chunks using a chapters JSON file
- Optionally extracts frames from demo-heavy chapters for vision annotation
- Outputs chunks.json ready for ingestion into knowledge-mcp
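Below is a minimal sketch of the chapter-slicing step. It assumes the transcript is the "segments" list that Whisper's transcribe() returns (dicts with start, end and text, in seconds). It also assumes chapter start times have already been converted to seconds. Chapter, slice_segments and the chunk field names are hypothetical and may not match what ingest.py actually does.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    start: float  # chapter start, in seconds (hypothetical field names)
    title: str
    summary: str


def slice_segments(segments, chapters, video_end):
    """Group Whisper-style segments into one text chunk per chapter.

    segments:  list of dicts with "start", "end", "text" (seconds)
    chapters:  list of Chapter, sorted by start time
    video_end: end of the recording in seconds; closes the last chapter
    """
    chunks = []
    for i, ch in enumerate(chapters):
        # A chapter runs until the next chapter starts, or the video ends.
        end = chapters[i + 1].start if i + 1 < len(chapters) else video_end
        # Assign each segment to the chapter that contains its start time.
        texts = [s["text"].strip() for s in segments if ch.start <= s["start"] < end]
        chunks.append({
            "title": ch.title,
            "summary": ch.summary,
            "start": ch.start,
            "end": end,
            "text": " ".join(texts),
        })
    return chunks


if __name__ == "__main__":
    segs = [
        {"start": 0.0, "end": 4.0, "text": " Welcome."},
        {"start": 65.0, "end": 70.0, "text": " Demo time."},
    ]
    chs = [Chapter(0, "Intro", "Session overview"), Chapter(60, "Demo", "Live demo")]
    for chunk in slice_segments(segs, chs, video_end=120.0):
        print(chunk["title"], "->", chunk["text"])
```

Segments are assigned by their start time. A sentence that straddles a chapter boundary therefore lands in the earlier chapter, and anything spoken before the first chapter's timestamp is dropped.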
Usage
python3 ingest.py \
--url 'https://vimeo.com/1020102626' \
--chapters chapters/1020102626.json \
--out ./out \
--whisper-model medium
Add --frames to also extract video frames for demo chapters (requires video download).
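Frame extraction for a demo chapter can be done with a single ffmpeg call over the chapter's time range. This is a hypothetical sketch of what --frames might run, not the script's confirmed behavior. The video file path, the 10-second sampling interval and the frame naming scheme are all assumptions. It relies on ffmpeg's -ss and -t input options and its fps video filter.

```python
import subprocess
from pathlib import Path


def frame_extract_cmd(video, start, end, out_dir, every_s=10):
    """Build an ffmpeg command that saves one PNG every `every_s` seconds
    between `start` and `end` (chapter bounds, in seconds)."""
    # Prefix file names with the chapter start so frames from several
    # chapters can share one frames/ directory (hypothetical naming).
    pattern = str(Path(out_dir) / f"{int(start):06d}_%04d.png")
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-ss", str(start),           # seek to the chapter start before decoding
        "-t", str(end - start),      # read only the chapter's duration
        "-i", str(video),
        "-vf", f"fps=1/{every_s}",   # sample one frame per every_s seconds
        pattern,
    ]


def extract_frames(video, start, end, out_dir, every_s=10):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(frame_extract_cmd(video, start, end, out_dir, every_s), check=True)
```

Sampling at a fixed interval keeps the frame count predictable for vision annotation. Scene-change detection would catch slide transitions more precisely but yields an uneven number of frames.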
Dependencies
brew install yt-dlp ffmpeg
pip install openai-whisper
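The following hypothetical helper (not part of ingest.py) confirms the dependencies are in place before you run the pipeline. It checks for the yt-dlp and ffmpeg executables on PATH and for the whisper module, which is the import name of the openai-whisper package.

```python
import importlib.util
import shutil


def missing_dependencies():
    """Return the names of required tools or packages that are not installed."""
    missing = [tool for tool in ("yt-dlp", "ffmpeg") if shutil.which(tool) is None]
    # pip installs openai-whisper, but it is imported as `whisper`.
    if importlib.util.find_spec("whisper") is None:
        missing.append("openai-whisper")
    return missing


if __name__ == "__main__":
    gaps = missing_dependencies()
    print("missing: " + ", ".join(gaps) if gaps else "all dependencies found")
```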
Repo structure
ingest.py                  # Main pipeline script
chapters/<video_id>.json   # Chapter list per session
out/<video_id>/            # Output (gitignored)
  audio.mp3
  transcript.json
  chunks.json
  frames/
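The per-session output layout can be derived from the Vimeo URL alone. This is a minimal sketch that assumes URLs of the form https://vimeo.com/<numeric id> or player.vimeo.com/video/<id>. vimeo_id and output_paths are hypothetical helpers, and ingest.py may name things differently.

```python
import re
from pathlib import Path


def vimeo_id(url):
    """Extract the numeric video ID from a Vimeo URL (assumed formats only)."""
    m = re.search(r"vimeo\.com/(?:video/)?(\d+)", url)
    if m is None:
        raise ValueError(f"no Vimeo video ID in {url!r}")
    return m.group(1)


def output_paths(out_root, video_id):
    """Map each artifact in out/<video_id>/ to its path."""
    base = Path(out_root) / video_id
    return {
        "dir": base,
        "audio": base / "audio.mp3",
        "transcript": base / "transcript.json",
        "chunks": base / "chunks.json",
        "frames": base / "frames",
    }
```

For example, output_paths("./out", vimeo_id("https://vimeo.com/1020102626"))["chunks"] points at out/1020102626/chunks.json.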
Adding a new session
- Create chapters/<video_id>.json with a timestamp, title and summary for each chapter
- Run ingest.py --url <vimeo_url> --chapters chapters/<video_id>.json
- Review out/<video_id>/chunks.json
- Ingest the chunks into the knowledge-mcp notebook securetransport-md
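Before running ingest.py on a new session, you can sanity-check a chapters file with a sketch like this one. The list-of-objects layout, the timestamp, title and summary key names, and the MM:SS / HH:MM:SS timestamp format are assumptions drawn from the step above, not a confirmed schema. The function names are hypothetical.

```python
import json

# Hypothetical chapters/<video_id>.json layout, e.g.:
# [
#   {"timestamp": "00:00", "title": "Intro", "summary": "..."},
#   {"timestamp": "04:35", "title": "Routing rules", "summary": "..."}
# ]
REQUIRED_KEYS = {"timestamp", "title", "summary"}


def parse_timestamp(ts):
    """Convert "MM:SS" or "HH:MM:SS" into whole seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def validate_chapters(raw):
    """Check required keys, add a numeric "start", and require
    chronological order so chapter boundaries are well defined."""
    chapters = []
    for entry in raw:
        missing = REQUIRED_KEYS.difference(entry)
        if missing:
            raise ValueError(f"chapter {entry!r} is missing {sorted(missing)}")
        chapters.append({**entry, "start": parse_timestamp(entry["timestamp"])})
    starts = [c["start"] for c in chapters]
    if starts != sorted(starts):
        raise ValueError("chapters must be listed in chronological order")
    return chapters


def load_chapters(path):
    """Read and validate a chapters JSON file."""
    with open(path, encoding="utf-8") as f:
        return validate_chapters(json.load(f))
```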