cscott/ask-annie

Conan Scott ad3c5616b2 Update README.md

2026-03-30 10:21:56 +00:00

Add chapters for session 981433083 (Haiku)

2026-03-24 10:10:30 +00:00

Batch chunks from ingest.py

2026-03-24 21:37:41 +11:00

.gitignore

Ignore output artifacts

2026-03-23 23:49:27 +00:00

ask-annie-videos-list.txt

added video list

2026-03-24 14:59:58 +11:00

batch_transcribe.py

Add batch_transcribe.py

2026-03-24 04:27:51 +00:00

chapters-1020102626.json

added ask-annie-video-list.txt

2026-03-24 14:43:47 +11:00

ingest_to_rag.py

Add RAG ingestion script

2026-03-24 03:28:59 +00:00

ingest.py

Switch ASR from Fish Audio to Deepgram Nova-3

2026-03-24 03:20:41 +00:00

README.md

Update README.md

2026-03-30 10:21:56 +00:00

requirements.txt

Add requirements.txt

2026-03-24 01:26:39 +00:00

transcribe.py

Add transcribe.py — step 1 of pipeline

2026-03-24 04:25:43 +00:00

README.md

Ask Annie — ST Best Practices Session Ingestion

Ingestion pipeline for Axway MFT User Group "Ask Annie" Q&A sessions on Vimeo.

What it does

1. Downloads audio from a Vimeo URL via yt-dlp
2. Transcribes with Whisper (timestamped segments)
3. Slices transcript into per-chapter chunks using a chapters JSON file
4. Optionally extracts frames from demo-heavy chapters for vision annotation
5. Outputs chunks.json ready for ingestion into knowledge-mcp
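
The slicing step could look roughly like the sketch below. It assumes Whisper-style segments (dicts with start, end, and text, with times in seconds) and chapters given as dicts with a start time in seconds and a title. The function name slice_into_chapters and the field names are illustrative assumptions, not taken from ingest.py.

```python
from bisect import bisect_right


def slice_into_chapters(segments, chapters):
    """Group timestamped transcript segments under the chapter they start in.

    Assumed shapes (hypothetical, not confirmed by ingest.py):
      segments: [{"start": float, "end": float, "text": str}, ...]
      chapters: [{"start": float, "title": str}, ...]
    """
    chapters = sorted(chapters, key=lambda c: c["start"])
    starts = [c["start"] for c in chapters]
    chunks = [
        {"title": c["title"], "start": c["start"], "text": []} for c in chapters
    ]
    for seg in segments:
        # Index of the last chapter whose start is <= the segment start.
        idx = bisect_right(starts, seg["start"]) - 1
        if idx < 0:
            continue  # segment precedes the first chapter; skip it
        chunks[idx]["text"].append(seg["text"].strip())
    for chunk in chunks:
        # Join the collected segment texts into one string per chapter.
        chunk["text"] = " ".join(chunk["text"])
    return chunks
```

A segment is assigned by its start time only, so a segment that straddles a chapter boundary stays with the earlier chapter in this sketch.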

Usage

python3 ingest.py \
  --url 'https://vimeo.com/1020102626' \
  --chapters chapters/1020102626.json \
  --out ./out \
  --whisper-model medium

Add --frames to also extract video frames for demo chapters. This requires downloading the video, not just the audio.
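
For orientation, a command-line interface matching the flags shown above could be declared as in this sketch. Only the flag names --url, --chapters, --out, --whisper-model, and --frames come from this README; the defaults, help strings, and which flags are required are assumptions.

```python
import argparse


def build_parser():
    """Build an argument parser mirroring the documented flags (sketch)."""
    parser = argparse.ArgumentParser(
        description="Ask Annie ingestion pipeline (illustrative sketch)"
    )
    parser.add_argument("--url", required=True, help="Vimeo URL of the session")
    parser.add_argument(
        "--chapters", required=True, help="Path to the chapters JSON file"
    )
    # The defaults below are assumptions, not confirmed by ingest.py.
    parser.add_argument("--out", default="./out", help="Output directory")
    parser.add_argument(
        "--whisper-model", default="medium", help="Whisper model name"
    )
    parser.add_argument(
        "--frames",
        action="store_true",
        help="Also extract video frames for demo chapters",
    )
    return parser
```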

Dependencies

brew install yt-dlp ffmpeg
pip install openai-whisper
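
Before a long run it can help to confirm that the command-line tools are on PATH. The following is a minimal sketch using only the standard library; the helper name missing_tools is hypothetical and not part of this repository.

```python
import shutil


def missing_tools(tools=("yt-dlp", "ffmpeg")):
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
```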

Repo structure

ingest.py                    # Main pipeline script
chapters/<video_id>.json     # Chapter list per session
out/<video_id>/              # Output (gitignored)
  audio.mp3
  transcript.json
  chunks.json
  frames/
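
The output layout above can be expressed with pathlib, as in this sketch. The helper name output_paths, and deriving the video ID from the last path component of the URL, are assumptions for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse


def output_paths(url, out_root="./out"):
    """Return the expected output locations for one session (illustrative)."""
    # Assumption: the video ID is the final path component of the Vimeo URL.
    video_id = urlparse(url).path.rstrip("/").split("/")[-1]
    base = Path(out_root) / video_id
    return {
        "audio": base / "audio.mp3",
        "transcript": base / "transcript.json",
        "chunks": base / "chunks.json",
        "frames": base / "frames",
    }
```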

Adding a new session

1. Create chapters/<video_id>.json with a timestamp, title, and summary for each chapter
2. Run ingest.py --url <vimeo_url> --chapters chapters/<video_id>.json
3. Review out/<video_id>/chunks.json
4. Ingest the chunks into the knowledge-mcp notebook securetransport-md
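
A new chapters file could be sanity-checked before running the pipeline, as sketched below. This README only says that each chapter carries a timestamp, a title, and a summary; the key names (timestamp, title, summary), the top-level JSON array, and the MM:SS or H:MM:SS timestamp format are assumptions, so adjust them to match the real files.

```python
import json


def parse_timestamp(value):
    """Convert 'MM:SS' or 'H:MM:SS' to a number of seconds."""
    parts = [int(p) for p in value.split(":")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {value!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part
    return seconds


def load_chapters(text):
    """Parse chapters JSON text and check required keys and ordering."""
    chapters = json.loads(text)
    previous = -1
    for chapter in chapters:
        # Key names are hypothetical; see the note above.
        missing = {"timestamp", "title", "summary"} - chapter.keys()
        if missing:
            raise ValueError(f"chapter missing keys: {sorted(missing)}")
        chapter["start"] = parse_timestamp(chapter["timestamp"])
        if chapter["start"] <= previous:
            raise ValueError("chapters must be in ascending timestamp order")
        previous = chapter["start"]
    return chapters
```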