docs: update README for Postgres+pgvector and add ingestion TODOs

2026-02-12 13:34:08 +11:00
parent 2a6ee399d5
commit 5076486935
1 changed file with 15 additions and 9 deletions
--- a/README.md
+++ b/README.md
@@ -1,11 +1,11 @@
 # knowledge-mcp

-A Model Context Protocol (MCP) server that provides scoped RAG workspaces ("Notebooks") backed by **Qdrant** and **TEI**.
+A Model Context Protocol (MCP) server that provides scoped RAG workspaces ("Notebooks") backed by **Postgres + pgvector** and **TEI**.

 ## Overview

 This server enables an agent to:
-1.  Create named "Notebooks" (Qdrant Collections).
+1.  Create named "Notebooks" (Postgres-backed collections).
 2.  Ingest documents (PDF, Markdown, Text) into specific notebooks.
 3.  Query specific notebooks using vector search (RAG).
 4.  Synthesize findings across a notebook.
@@ -15,26 +15,32 @@ Designed to replicate the **NotebookLM** experience: clean, focused, bounded con
 ## Stack
 *   **Language:** Python 3.11+
 *   **Framework:** `mcp` SDK
-*   **Vector DB:** Qdrant
+*   **Vector DB:** Postgres + pgvector
 *   **Embeddings:** Text Embeddings Inference (TEI) - `BAAI/bge-base-en-v1.5`

 ## Tools

-### `notebook.create`
-Creates a new isolated workspace (Qdrant Collection).
+### `create_notebook`
+Creates a new isolated workspace (Postgres-backed notebook).
 - `name`: string (e.g., "project-alpha")

-### `notebook.add_source`
+### `add_source`
 Ingests a document into the notebook.
 - `notebook`: string
- `url`: string (URL or local path)
+- `content`: string (raw text when `format` is `text`, or a local file path when `format` is `pdf_path`)
+- `source_name`: string
+- `format`: string, either `text` or `pdf_path`

-### `notebook.query`
+### `query_notebook`
 Performs a semantic search/RAG generation against the notebook.
 - `notebook`: string
 - `query`: string

 ## Configuration
 Env vars:
- `QDRANT_URL`: URL to Qdrant (e.g., `http://qdrant.openshift-gitops.svc:6333`)
+- `DATABASE_URL`: Postgres connection string (e.g., `postgresql://postgres:password@postgres.knowledge-mcp.svc:5432/knowledge`)
 - `TEI_URL`: URL to TEI (e.g., `http://text-embeddings.tei.svc.cluster.local:8080`)
+
+## TODO
+- Add a PDF → Markdown/text conversion step to improve extraction quality.
+- Add an OCR pipeline for scanned PDFs.
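
The updated Configuration and `add_source` sections can be illustrated with a minimal sketch. It is not taken from the knowledge-mcp source: `Settings`, `load_settings`, and `add_source_args` are hypothetical helpers, and treating both env vars as required and `text` as the default format are assumptions made for illustration only.

```python
import os
from dataclasses import dataclass

# Hypothetical helpers for illustration; none of these names come from knowledge-mcp.
ALLOWED_FORMATS = {"text", "pdf_path"}  # the two values listed in the README diff


@dataclass(frozen=True)
class Settings:
    database_url: str  # Postgres connection string (DATABASE_URL)
    tei_url: str       # Text Embeddings Inference endpoint (TEI_URL)


def load_settings(env=None) -> Settings:
    """Read the two documented env vars; assumes both are required."""
    env = os.environ if env is None else env
    missing = [key for key in ("DATABASE_URL", "TEI_URL") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required env vars: {', '.join(missing)}")
    return Settings(database_url=env["DATABASE_URL"], tei_url=env["TEI_URL"])


def add_source_args(notebook: str, content: str, source_name: str, fmt: str = "text") -> dict:
    """Build an argument payload shaped like the documented `add_source` parameters."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}, got {fmt!r}")
    return {
        "notebook": notebook,
        "content": content,        # raw text, or a local path when fmt == "pdf_path"
        "source_name": source_name,
        "format": fmt,
    }
```

Passing a plain dict to `load_settings` instead of relying on `os.environ` keeps the sketch easy to exercise without touching the real environment.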