
Offline Notebook LM

Ask questions about your own documents and get answers, entirely offline. No cloud, no data leaving your machine.

Python, Electron, React, FastAPI, ChromaDB, sentence-transformers

Why I Built This

Most AI assistants need an internet connection and send your data to someone else's servers. That's a dealbreaker for anyone working with sensitive documents or in air-gapped environments. I wanted something where you could drop in your files, ask questions, and get answers grounded in your own documents — without anything leaving your machine.

How It Works

  • A routing agent classifies each query and decides whether to search at the summary level or the chunk level of a two-stage retrieval pipeline
  • LLM backend selection benchmarks candidate models on domain-specific queries, and knowledge distillation compresses the outputs of larger models into efficient Phi-3 and Mistral inference
  • Document ingestion supports 7+ file types with adaptive chunking, sentence-transformers embeddings, and ChromaDB vector storage
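The routing and two-stage retrieval idea can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `embed` function is a toy bag-of-words stand-in for a sentence-transformers embedding, the keyword-based `route` function is a hypothetical stand-in for the routing agent, and the in-memory `docs` list stands in for ChromaDB. All names here are assumptions.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical cue words; a real router would likely use a classifier or an LLM.
SUMMARY_CUES = {"summarize", "summary", "overview", "gist", "themes"}


def route(query):
    """Classify a query as 'summary' (broad) or 'chunk' (specific)."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return "summary" if words & SUMMARY_CUES else "chunk"


def retrieve(query, docs, top_docs=1, top_chunks=2):
    """Two-stage retrieval over docs shaped like {'summary': str, 'chunks': [str]}."""
    q = embed(query)
    # Stage 1: rank documents by how well their summary matches the query.
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d["summary"])), reverse=True)
    shortlisted = ranked[:top_docs]
    if route(query) == "summary":
        return [d["summary"] for d in shortlisted]
    # Stage 2: rank only the chunks that belong to the shortlisted documents.
    chunks = [c for d in shortlisted for c in d["chunks"]]
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_chunks]
```

Searching summaries first keeps the second stage small: only chunks from the shortlisted documents are scored, which is one plausible reason a pipeline like this would beat a flat search over every chunk.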

Built with Electron + React on the frontend, FastAPI + Python on the backend, backed by ChromaDB.
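For the ingestion step, one way to read "adaptive chunking" is size-aware sentence packing: sentences are grouped into a chunk until a character budget is reached, and any single sentence that exceeds the budget is hard-split. The source does not say which strategy the project uses, so the function name `adaptive_chunks` and the `max_chars` parameter below are assumptions made for illustration.

```python
import re


def adaptive_chunks(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        # A sentence longer than the budget is cut into fixed-size pieces.
        pieces = [sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate  # still fits: keep growing the chunk
            else:
                chunks.append(current)  # budget exceeded: close the chunk
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be embedded and written to the vector store. Keeping sentences intact where possible tends to give the embedding model more coherent input than fixed-width slicing.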

Results

  • Queries run 2-3x faster than with naive retrieval, with lower memory overhead
  • ~366 chunks/second ingestion throughput
  • Everything runs locally. No cloud dependency for any feature.
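A throughput figure in chunks per second can be obtained by timing an ingestion call and dividing the chunk count by the elapsed time. The source does not describe how its number was measured, so the helper below is only a sketch; `chunks_per_second`, the `ingest` callable, and the injectable `clock` parameter are hypothetical names.

```python
import time


def chunks_per_second(ingest, chunks, clock=time.perf_counter):
    """Time one call to ingest(chunks) and return the throughput."""
    start = clock()
    ingest(chunks)
    elapsed = clock() - start
    # Guard against a zero reading on very fast calls or coarse clocks.
    return len(chunks) / elapsed if elapsed > 0 else float("inf")
```

Passing the clock as a parameter makes the helper easy to check with a fake timer. For a stable number, it would be reasonable to average over several runs on a representative document set.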