Offline Notebook LM
Offline-first RAG assistant combining local vector search and LLM orchestration for document querying.
Python · Electron · React · FastAPI · ChromaDB · sentence-transformers
View on GitHub ↗
Problem Statement
Cloud-dependent AI assistants can't be used in air-gapped or privacy-sensitive environments. The challenge: build an offline-first RAG application that supports local document ingestion, embedding generation, and LLM-powered retrieval without any cloud dependency.
Technical Approach
- Agentic retrieval workflow in which a routing agent classifies query intent and invokes either summary-level or chunk-level search across a two-stage pipeline (see the routing sketch after this list)
- Intelligent LLM backend selection that benchmarks model performance on domain-specific queries, with knowledge distillation used to compress larger-model outputs into efficient Phi-3 and Mistral inference pipelines (a selection sketch also follows below)
- Multi-format document ingestion supporting 7+ file types, with adaptive chunking strategies, sentence-transformers embeddings, and ChromaDB vector storage (ingestion sketch after the stack note below)
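A minimal sketch of the two-stage routing idea, assuming illustrative names throughout (classify_intent, the doc_summaries/doc_chunks collections, and the all-MiniLM-L6-v2 embedding model are guesses, not the project's actual API):

```python
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")    # assumed embedding model
client = chromadb.PersistentClient(path="./vectors")  # assumed storage path
summaries = client.get_or_create_collection("doc_summaries")
chunks = client.get_or_create_collection("doc_chunks")

BROAD_CUES = ("summarize", "overview", "main points", "what is this about")

def classify_intent(query: str) -> str:
    # Stage 1: route broad questions to summary-level search, specific ones
    # to chunk-level search. A real router could prompt the local LLM;
    # keyword cues keep this sketch short.
    q = query.lower()
    return "summary" if any(cue in q for cue in BROAD_CUES) else "chunk"

def retrieve(query: str, k: int = 5) -> list[str]:
    # Stage 2: embed the query once, then search only the collection the
    # router chose.
    vec = embedder.encode(query).tolist()
    coll = summaries if classify_intent(query) == "summary" else chunks
    hits = coll.query(query_embeddings=[vec], n_results=k)
    return hits["documents"][0]
```

Keeping summaries and chunks in separate collections lets the router trade breadth for precision without re-embedding the query.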
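The backend-selection bullet can be read as a small benchmark loop. A hedged sketch, assuming each backend is exposed as a prompt-to-text callable and that a score_answer grader (returning 0..1) exists; both are hypothetical stand-ins, not the project's API:

```python
import time

def pick_backend(generate_fns: dict, eval_queries: list[str],
                 score_answer, min_score: float = 0.7) -> str:
    # generate_fns maps a backend name (e.g. "phi-3", "mistral") to a
    # callable that takes a prompt and returns generated text.
    results = {}
    for name, generate in generate_fns.items():
        start = time.perf_counter()
        scores = [score_answer(q, generate(q)) for q in eval_queries]
        elapsed = time.perf_counter() - start
        results[name] = (sum(scores) / len(scores), elapsed)
    # Keep backends that clear the quality bar, then pick the fastest one.
    viable = {n: t for n, (s, t) in results.items() if s >= min_score}
    return min(viable, key=viable.get)
```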
Built with Electron + React + FastAPI + Python, backed by a ChromaDB vector store exposed through a PGVector-compatible interface.
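And a sketch of the ingestion path under stated assumptions (the plain-text loader, chunk size, and overlap are guesses; the project's adaptive chunking varies these per format):

```python
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
store = chromadb.PersistentClient(path="./vectors")
chunks = store.get_or_create_collection("doc_chunks")

def load_text(path: Path) -> str:
    # The project supports 7+ formats (PDF, DOCX, ...); plain text keeps
    # this sketch short.
    return path.read_text(errors="ignore")

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    # Fixed-window chunking with overlap, a stand-in for the adaptive strategy.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(path: str) -> None:
    # Chunk, embed, and upsert into the same collection the retrieval side queries.
    pieces = chunk_text(load_text(Path(path)))
    chunks.upsert(
        ids=[f"{path}:{i}" for i in range(len(pieces))],
        documents=pieces,
        embeddings=embedder.encode(pieces).tolist(),
        metadatas=[{"source": path, "chunk": i} for i in range(len(pieces))],
    )
```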
Results
- 2-3x faster query times and lower memory overhead than a naive retrieval baseline
- ~366 chunks/second ingestion throughput with consistent retrieval accuracy
- Fully offline — no cloud dependency required for any feature