docs: update README for Postgres+pgvector and add ingestion TODOs

Clawdbot
2026-02-12 13:34:08 +11:00
parent 2a6ee399d5
commit 5076486935

@@ -1,11 +1,11 @@
 # knowledge-mcp
-A Model Context Protocol (MCP) server that provides scoped RAG workspaces ("Notebooks") backed by **Qdrant** and **TEI**.
+A Model Context Protocol (MCP) server that provides scoped RAG workspaces ("Notebooks") backed by **Postgres + pgvector** and **TEI**.
 ## Overview
 This server enables an agent to:
-1. Create named "Notebooks" (Qdrant Collections).
+1. Create named "Notebooks" (Postgres-backed collections).
 2. Ingest documents (PDF, Markdown, Text) into specific notebooks.
 3. Query specific notebooks using vector search (RAG).
 4. Synthesize findings across a notebook.
@@ -15,26 +15,32 @@ Designed to replicate the **NotebookLM** experience: clean, focused, bounded con
 ## Stack
 * **Language:** Python 3.11+
 * **Framework:** `mcp` SDK
-* **Vector DB:** Qdrant
+* **Vector DB:** Postgres + pgvector
 * **Embeddings:** Text Embeddings Inference (TEI) - `BAAI/bge-base-en-v1.5`
 ## Tools
-### `notebook.create`
-Creates a new isolated workspace (Qdrant Collection).
+### `create_notebook`
+Creates a new isolated workspace (Postgres-backed notebook).
 - `name`: string (e.g., "project-alpha")
-### `notebook.add_source`
+### `add_source`
 Ingests a document into the notebook.
 - `notebook`: string
-- `url`: string (URL or local path)
+- `content`: string (raw text or local path)
+- `source_name`: string
+- `format`: `text` or `pdf_path`
-### `notebook.query`
+### `query_notebook`
 Performs a semantic search/RAG generation against the notebook.
 - `notebook`: string
 - `query`: string
 ## Configuration
 Env vars:
-- `QDRANT_URL`: URL to Qdrant (e.g., `http://qdrant.openshift-gitops.svc:6333`)
+- `DATABASE_URL`: Postgres connection string (e.g., `postgresql://postgres:password@postgres.knowledge-mcp.svc:5432/knowledge`)
 - `TEI_URL`: URL to TEI (e.g., `http://text-embeddings.tei.svc.cluster.local:8080`)
+## TODO
+- Add PDF → Markdown/text conversion step to improve extraction quality.
+- Add OCR pipeline for scanned PDFs.
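
The updated README names two environment variables (`DATABASE_URL`, `TEI_URL`) and gives `add_source` the parameters `notebook`, `content`, `source_name`, and `format`. The sketch below shows one way a client or startup script might read that configuration and assemble an `add_source` argument payload. It is not taken from the repository: `Settings`, `load_settings`, and `add_source_args` are hypothetical names, and accepting both the `postgresql` and `postgres` URL schemes is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlparse

# Values the README lists for the `format` parameter of `add_source`.
ALLOWED_FORMATS = ("text", "pdf_path")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two env vars named in the README."""
    database_url: str
    tei_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read DATABASE_URL and TEI_URL, failing early if either is unset."""
    env = os.environ if env is None else env
    missing = [name for name in ("DATABASE_URL", "TEI_URL") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required env vars: " + ", ".join(missing))

    database_url = env["DATABASE_URL"]
    # Assumption: only Postgres-style connection strings are meaningful here.
    if urlparse(database_url).scheme not in ("postgresql", "postgres"):
        raise ValueError("DATABASE_URL must be a Postgres connection string")

    # Strip a trailing slash so paths can be appended to the TEI base URL.
    return Settings(database_url=database_url, tei_url=env["TEI_URL"].rstrip("/"))


def add_source_args(notebook: str, content: str, source_name: str, format: str) -> dict:
    """Build the argument payload for `add_source`, checking `format`."""
    if format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {format!r}")
    return {
        "notebook": notebook,
        "content": content,
        "source_name": source_name,
        "format": format,
    }
```

For example, `add_source_args("project-alpha", "/data/report.pdf", "report.pdf", "pdf_path")` yields a dictionary that could be passed as the tool arguments, while an unsupported `format` value raises `ValueError` before any request is made.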