# Ask Annie — ST Best Practices Session Ingestion
Ingestion pipeline for Axway MFT User Group "Ask Annie" Q&A sessions on Vimeo.
## What it does
1. Downloads audio from a Vimeo URL via yt-dlp
2. Transcribes with Whisper (timestamped segments)
3. Slices transcript into per-chapter chunks using a chapters JSON file
4. Optionally extracts frames from demo-heavy chapters for vision annotation
5. Outputs `chunks.json` ready for ingestion into knowledge-mcp
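
Step 3 can be sketched as follows. This is a minimal illustration, not the code in `ingest.py`: it assumes each chapter has already been reduced to a hypothetical `start` (seconds) and `title`, and relies only on the `start` and `text` fields that Whisper segments carry.

```python
from bisect import bisect_right


def slice_by_chapter(segments, chapters):
    """Group transcript segments under the chapter that is active at their start time."""
    # Assumption: chapter dicts use the hypothetical keys "start" (seconds) and "title".
    chapters = sorted(chapters, key=lambda c: c["start"])
    starts = [c["start"] for c in chapters]
    chunks = [{"title": c["title"], "start": c["start"], "text": []} for c in chapters]

    for seg in segments:
        # Index of the last chapter whose start is <= the segment start.
        idx = bisect_right(starts, seg["start"]) - 1
        if idx < 0:
            continue  # segment precedes the first chapter; dropped in this sketch
        chunks[idx]["text"].append(seg["text"].strip())

    for chunk in chunks:
        chunk["text"] = " ".join(chunk["text"])
    return chunks


segments = [
    {"start": 0.0, "text": " Welcome."},
    {"start": 65.2, "text": " First question."},
    {"start": 130.0, "text": " More detail."},
]
chapters = [{"start": 0, "title": "Intro"}, {"start": 60, "title": "Q1"}]
chunks = slice_by_chapter(segments, chapters)
```

A segment that straddles a chapter boundary is assigned whole to the chapter in which it starts; splitting it would need word-level timestamps.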
## Usage
```bash
python3 ingest.py \
--url 'https://vimeo.com/1020102626' \
--chapters chapters/1020102626.json \
--out ./out \
--whisper-model medium
```
Add `--frames` to also extract video frames for demo chapters. This requires downloading the video itself, not just the audio.
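
For orientation, the flags above could be declared with `argparse` roughly like this. The flag names come from the usage example; which flags are required and what the defaults are is assumed here and may differ in the real `ingest.py`.

```python
import argparse


def build_parser():
    """Hypothetical CLI definition mirroring the flags shown in the usage example."""
    parser = argparse.ArgumentParser(prog="ingest.py")
    parser.add_argument("--url", required=True, help="Vimeo URL of the session")
    parser.add_argument("--chapters", required=True, help="Path to the chapters JSON file")
    # Assumed defaults; the actual script may require these or use other values.
    parser.add_argument("--out", default="./out", help="Output root directory")
    parser.add_argument("--whisper-model", default="medium", help="Whisper model name")
    parser.add_argument("--frames", action="store_true", help="Also extract video frames")
    return parser


args = build_parser().parse_args(
    ["--url", "https://vimeo.com/1020102626", "--chapters", "chapters/1020102626.json"]
)
```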
## Dependencies
```bash
brew install yt-dlp ffmpeg
pip install openai-whisper
```
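
A quick preflight check can confirm that these are installed before a long run. The sketch below is not part of the repo; `missing_dependencies` is a hypothetical helper that looks for the two command-line tools on `PATH` and for the `whisper` module that `openai-whisper` installs.

```python
import importlib.util
import shutil


def missing_dependencies(tools=("yt-dlp", "ffmpeg"), modules=("whisper",)):
    """Return the names of CLI tools and Python modules that cannot be found."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing dependencies:", ", ".join(problems))
```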
## Repo structure
```
ingest.py                   # Main pipeline script
chapters/<video_id>.json    # Chapter list per session
out/<video_id>/             # Output (gitignored)
  audio.mp3
  transcript.json
  chunks.json
  frames/
```
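
The layout above implies that the video ID ties a Vimeo URL to its chapter file and output directory. A sketch of that mapping, assuming plain `https://vimeo.com/<video_id>` URLs (both function names are hypothetical, not taken from `ingest.py`):

```python
import re
from pathlib import Path


def video_id_from_url(url):
    """Extract the numeric ID from a plain vimeo.com/<id> URL."""
    match = re.search(r"vimeo\.com/(\d+)", url)
    if match is None:
        raise ValueError(f"no Vimeo video ID found in {url!r}")
    return match.group(1)


def output_paths(url, out_root="./out"):
    """Build the per-session output paths listed in the repo structure."""
    base = Path(out_root) / video_id_from_url(url)
    return {
        "audio": base / "audio.mp3",
        "transcript": base / "transcript.json",
        "chunks": base / "chunks.json",
        "frames": base / "frames",
    }


paths = output_paths("https://vimeo.com/1020102626")
```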
## Adding a new session
1. Create `chapters/<video_id>.json` with a timestamp, title, and summary for each chapter
2. Run `python3 ingest.py --url <vimeo_url> --chapters chapters/<video_id>.json`
3. Review `out/<video_id>/chunks.json`
4. Ingest chunks into knowledge-mcp notebook `securetransport-md`
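
Before running the pipeline on a new session, it can help to sanity-check the chapter file. The exact schema is not documented here, so the sketch below assumes a JSON list of objects with the hypothetical keys `timestamp` (`MM:SS` or `HH:MM:SS`), `title`, and `summary`; adjust the names to match the real files.

```python
import json

REQUIRED_KEYS = ("timestamp", "title", "summary")  # assumed key names


def parse_timestamp(value):
    """Convert "MM:SS" or "HH:MM:SS" to a number of seconds."""
    seconds = 0
    for part in value.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def validate_chapters(chapters):
    """Return a list of human-readable problems; an empty list means the file looks fine."""
    problems = []
    previous = -1
    for i, chapter in enumerate(chapters):
        missing = [key for key in REQUIRED_KEYS if key not in chapter]
        if missing:
            problems.append(f"chapter {i}: missing {', '.join(missing)}")
            continue
        start = parse_timestamp(chapter["timestamp"])
        if start <= previous:
            problems.append(f"chapter {i}: timestamp is not after the previous chapter")
        previous = start
    return problems


# Illustrative content only; not taken from a real session.
example = json.loads("""
[
  {"timestamp": "00:00", "title": "Intro", "summary": "Welcome and agenda"},
  {"timestamp": "04:30", "title": "First question", "summary": "Example summary"}
]
""")
problems = validate_chapters(example)
```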