Offline Notebook LM
Ask questions about your own documents and get answers, entirely offline. No cloud, no data leaving your machine.
Python · Electron · React · FastAPI · ChromaDB · sentence-transformers
Why I Built This
Most AI assistants need an internet connection and send your data to someone else's servers. That's a dealbreaker for anyone working with sensitive documents or in air-gapped environments. I wanted something where you could drop in your files, ask questions, and get answers grounded in your own documents, without anything leaving your machine.
How It Works
- A routing agent classifies each query and decides whether to search at the summary level or chunk level across a 2-stage retrieval pipeline
- LLM backend selection benchmarks candidate models on domain-specific queries, and knowledge distillation transfers the outputs of larger models into smaller Phi-3 and Mistral models for efficient local inference
- Document ingestion supports 7+ file types with adaptive chunking, sentence-transformers embeddings, and ChromaDB vector storage
Built with Electron + React on the frontend, FastAPI + Python on the backend, backed by ChromaDB.
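To make the routing and 2-stage retrieval idea concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the `Doc` class, the `BROAD_CUES` keyword list, and the bag-of-words `embed` function are all hypothetical stand-ins (the real pipeline uses a routing agent, sentence-transformers embeddings, and ChromaDB). It only illustrates the shape of the flow: shortlist documents by summary first, then search chunks inside the shortlist when the query is specific.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Doc:
    name: str
    summary: str
    chunks: list[str] = field(default_factory=list)


# Hypothetical cue words suggesting a broad, overview-style question.
BROAD_CUES = {"summarize", "summary", "overview", "about", "main", "overall"}


def route(query: str) -> str:
    """Return "summary" for broad questions and "chunk" for specific ones."""
    return "summary" if BROAD_CUES & set(embed(query)) else "chunk"


def retrieve(
    query: str, docs: list[Doc], top_docs: int = 1, top_chunks: int = 2
) -> list[str]:
    q = embed(query)
    # Stage 1: shortlist documents by similarity to their summaries.
    shortlist = sorted(
        docs, key=lambda d: cosine(q, embed(d.summary)), reverse=True
    )[:top_docs]
    if route(query) == "summary":
        return [d.summary for d in shortlist]
    # Stage 2: rank chunks, but only within the shortlisted documents.
    candidates = [c for d in shortlist for c in d.chunks]
    return sorted(candidates, key=lambda c: cosine(q, embed(c)), reverse=True)[
        :top_chunks
    ]
```

Restricting stage 2 to the shortlisted documents is what keeps the chunk search small; a naive approach would score every chunk in the collection for every query.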
Results
- Queries run 2-3x faster than with naive retrieval, with lower memory overhead
- ~366 chunks/second ingestion throughput
- Everything runs locally. No cloud dependency for any feature.
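A throughput figure like chunks/second can be measured with a small timing harness. The sketch below is an assumption about how such a measurement might look, not the project's benchmark: `chunk_text` is a simple sentence-packing chunker standing in for the adaptive chunker, and `measure_throughput` times chunking only, so it leaves out embedding and vector-store writes that a full ingestion run would include.

```python
import re
import time


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def measure_throughput(
    documents: list[str], max_chars: int = 200
) -> tuple[int, float]:
    """Return (total chunks, chunks per second) for one pass over documents."""
    start = time.perf_counter()
    total = sum(len(chunk_text(doc, max_chars)) for doc in documents)
    elapsed = time.perf_counter() - start
    return total, (total / elapsed if elapsed > 0 else float("inf"))
```

Calling `measure_throughput(docs)` on a list of document strings returns the chunk count and the rate; the rate depends heavily on hardware and document size, so numbers from this harness are not comparable to the figure above.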