# knowledge-mcp
A Model Context Protocol (MCP) server that provides scoped RAG workspaces ("Notebooks") backed by **Postgres + pgvector** and **TEI**.
## Overview
This server enables an agent to:
1. Create named "Notebooks" (Postgres-backed collections).
2. Ingest documents (PDF, Markdown, Text) into specific notebooks.
3. Query specific notebooks using vector search (RAG).
4. Synthesize findings across a notebook.
The design aims to replicate the **NotebookLM** experience: a clean, focused, bounded context.
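
As a rough illustration of steps 1-3 above, an MCP client could drive the workflow with JSON-RPC `tools/call` requests. The tool names and parameters come from the Tools section of this README; the `tool_call` helper and all argument values are hypothetical, and a real client would normally let the `mcp` SDK build these messages.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Create a notebook, ingest one source, then query it.
# All argument values below are illustrative only.
messages = [
    tool_call("create_notebook", {"name": "project-alpha"}),
    tool_call("add_source", {
        "notebook": "project-alpha",
        "content": "pgvector adds vector similarity search to Postgres.",
        "source_name": "notes.txt",
        "format": "text",
    }),
    tool_call("query_notebook", {
        "notebook": "project-alpha",
        "query": "What does pgvector add to Postgres?",
    }),
]
```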
## Stack
* **Language:** Python 3.11+
* **Framework:** `mcp` SDK
* **Vector DB:** Postgres + pgvector
* **Embeddings:** Text Embeddings Inference (TEI) - `BAAI/bge-base-en-v1.5`
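
For orientation, here is a minimal sketch of how text might be sent to TEI for embedding, using only the standard library. It assumes TEI's `/embed` route, which accepts a JSON body with an `inputs` field; the function names are hypothetical and this is not necessarily how the server itself calls TEI.

```python
import json
import urllib.request


def build_embed_request(tei_url: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request to TEI's /embed endpoint."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url=tei_url.rstrip("/") + "/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(tei_url: str, texts: list[str]) -> list[list[float]]:
    """Send the request; TEI responds with one vector per input text."""
    request = build_embed_request(tei_url, texts)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```

Splitting request construction from the network call keeps the first half easy to check without a running TEI instance.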
## Tools
### `create_notebook`
Creates a new isolated workspace (Postgres-backed notebook).
- `name`: string (e.g., "project-alpha")
### `add_source`
Ingests a document into the notebook.
- `notebook`: string
- `content`: string (raw text, or a local file path when `format` is `pdf_path`)
- `source_name`: string
- `format`: `text` or `pdf_path`
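
The parameters above could be validated before ingestion along these lines. This is only a sketch: the `AddSourceArgs` class is hypothetical, and the rule that `content` must end in `.pdf` when `format` is `pdf_path` is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from pathlib import Path

ALLOWED_FORMATS = ("text", "pdf_path")


@dataclass(frozen=True)
class AddSourceArgs:
    """Hypothetical container for the `add_source` parameters."""
    notebook: str
    content: str
    source_name: str
    format: str

    def __post_init__(self) -> None:
        if self.format not in ALLOWED_FORMATS:
            raise ValueError(
                f"format must be one of {ALLOWED_FORMATS}, got {self.format!r}"
            )
        if not self.notebook or not self.source_name:
            raise ValueError("notebook and source_name must be non-empty")
        # Assumption: with `pdf_path`, `content` is a local path to a .pdf file.
        if self.format == "pdf_path" and Path(self.content).suffix.lower() != ".pdf":
            raise ValueError("content must be a .pdf path when format is 'pdf_path'")
```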
### `query_notebook`
Performs semantic search and RAG generation against the notebook.
- `notebook`: string
- `query`: string
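
To show how a query might stay scoped to one notebook, the sketch below builds a pgvector nearest-neighbor query. The `chunks` table and its columns are hypothetical, not the server's actual schema; `<=>` is pgvector's cosine-distance operator, and the `%(name)s` placeholders follow the psycopg parameter style. `BAAI/bge-base-en-v1.5` produces 768-dimensional vectors, hence `vector(768)`.

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Hypothetical schema:
#   chunks(notebook text, source_name text, content text, embedding vector(768))
SEARCH_SQL = """
SELECT source_name, content, embedding <=> %(q)s::vector AS distance
FROM chunks
WHERE notebook = %(notebook)s
ORDER BY embedding <=> %(q)s::vector
LIMIT %(k)s
"""


def search_params(notebook: str, query_embedding: list[float], k: int = 5) -> dict:
    """Bind parameters for SEARCH_SQL; the notebook filter bounds the context."""
    return {"notebook": notebook, "q": to_vector_literal(query_embedding), "k": k}
```

The `WHERE notebook = ...` filter is what would keep results bounded to a single workspace in this sketch.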
## Configuration
Environment variables:
- `DATABASE_URL`: Postgres connection string (e.g., `postgresql://postgres:password@postgres.knowledge-mcp.svc:5432/knowledge`)
- `TEI_URL`: URL to TEI (e.g., `http://text-embeddings.tei.svc.cluster.local:8080`)
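
A minimal sketch of reading these two variables at startup, failing fast when either is missing. The `Settings` class and `load_settings` function are hypothetical names, not part of the server.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    tei_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read DATABASE_URL and TEI_URL, raising if either is unset or empty."""
    missing = [key for key in ("DATABASE_URL", "TEI_URL") if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        # Strip a trailing slash so route paths can be appended safely.
        tei_url=env["TEI_URL"].rstrip("/"),
    )
```

Passing the environment in as a mapping makes the loader easy to exercise with a plain dictionary.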
## TODO
- Add PDF → Markdown/text conversion step to improve extraction quality.
- Add OCR pipeline for scanned PDFs.