docs: update README for Postgres+pgvector and add ingestion TODOs
24
README.md
@@ -1,11 +1,11 @@
 # knowledge-mcp
 
-A Model Context Protocol (MCP) server that provides scoped RAG workspaces ("Notebooks") backed by **Qdrant** and **TEI**.
+A Model Context Protocol (MCP) server that provides scoped RAG workspaces ("Notebooks") backed by **Postgres + pgvector** and **TEI**.
 
 ## Overview
 
 This server enables an agent to:
-1. Create named "Notebooks" (Qdrant Collections).
+1. Create named "Notebooks" (Postgres-backed collections).
 2. Ingest documents (PDF, Markdown, Text) into specific notebooks.
 3. Query specific notebooks using vector search (RAG).
 4. Synthesize findings across a notebook.
@@ -15,26 +15,32 @@ Designed to replicate the **NotebookLM** experience: clean, focused, bounded con
 ## Stack
 * **Language:** Python 3.11+
 * **Framework:** `mcp` SDK
-* **Vector DB:** Qdrant
+* **Vector DB:** Postgres + pgvector
 * **Embeddings:** Text Embeddings Inference (TEI) - `BAAI/bge-base-en-v1.5`
 
 ## Tools
 
-### `notebook.create`
-Creates a new isolated workspace (Qdrant Collection).
+### `create_notebook`
+Creates a new isolated workspace (Postgres-backed notebook).
 - `name`: string (e.g., "project-alpha")
 
-### `notebook.add_source`
+### `add_source`
 Ingests a document into the notebook.
 - `notebook`: string
-- `url`: string (URL or local path)
+- `content`: string (raw text or local path)
+- `source_name`: string
+- `format`: `text` or `pdf_path`
 
-### `notebook.query`
+### `query_notebook`
 Performs a semantic search/RAG generation against the notebook.
 - `notebook`: string
 - `query`: string
 
 ## Configuration
 Env vars:
-- `QDRANT_URL`: URL to Qdrant (e.g., `http://qdrant.openshift-gitops.svc:6333`)
+- `DATABASE_URL`: Postgres connection string (e.g., `postgresql://postgres:password@postgres.knowledge-mcp.svc:5432/knowledge`)
 - `TEI_URL`: URL to TEI (e.g., `http://text-embeddings.tei.svc.cluster.local:8080`)
+
+## TODO
+- Add PDF → Markdown/text conversion step to improve extraction quality.
+- Add OCR pipeline for scanned PDFs.
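For readers checking a client or deployment against the updated README, the following is a minimal sketch of how the two documented environment variables and the new `add_source` parameters might be validated. It is not taken from the repository: `Settings`, `load_settings`, and `add_source_args` are hypothetical names, and the checks (URL scheme, allowed `format` values) are assumptions based only on the README text in this diff.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse

# The two values the updated README lists for `format`.
ALLOWED_FORMATS = {"text", "pdf_path"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the env vars named in the README."""
    database_url: str
    tei_url: str


def load_settings(env=None) -> Settings:
    """Read DATABASE_URL and TEI_URL and fail early if either looks wrong."""
    env = os.environ if env is None else env
    database_url = env.get("DATABASE_URL", "")
    tei_url = env.get("TEI_URL", "")
    # Assumption: the server expects a postgresql:// (or postgres://) URL.
    if urlparse(database_url).scheme not in ("postgresql", "postgres"):
        raise ValueError("DATABASE_URL must be a Postgres connection string")
    if urlparse(tei_url).scheme not in ("http", "https"):
        raise ValueError("TEI_URL must be an http(s) URL")
    return Settings(database_url=database_url, tei_url=tei_url)


def add_source_args(notebook: str, content: str, source_name: str, fmt: str) -> dict:
    """Build the argument dict for `add_source` as documented after this commit."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    return {
        "notebook": notebook,
        "content": content,  # raw text, or a local path when fmt == "pdf_path"
        "source_name": source_name,
        "format": fmt,
    }
```

Note that the old `url` parameter no longer appears in the README, so a caller built against the previous `notebook.add_source` shape would need to pass `content`, `source_name`, and `format` instead.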