archit@portfolio:~$ ls ./rag-from-scratch/
RAG From Scratch
Full retrieval-augmented generation pipeline built from first principles.
No LangChain. No LlamaIndex. Every algorithm implemented by hand.
archit@portfolio:~$ cat tech_stack.json
// ingestion.py → chunker.py → embeddings.py → vectorstore.py
// bm25.py → retriever.py → reranker.py → generator.py
// 87 tests · 9 components · every algorithm from scratch
archit@portfolio:~$ cat features.txt
→ recursive text chunking with configurable separators & overlap
→ sentence embeddings via all-MiniLM-L6-v2 · mean pooling + L2 norm
→ NumPy vector store · cosine sim = dot product on normalized vectors
→ Okapi BM25 from scratch · Robertson-Spärck Jones IDF · k1=1.5 b=0.75
→ hybrid retrieval · Reciprocal Rank Fusion (k=60)
→ cross-encoder reranking · ms-marco-MiniLM · raw logit scoring
→ raw HTTP generation · any OpenAI-compatible API · source attribution
→ evaluation suite · Precision@k · Recall@k · MRR · faithfulness
→ Streamlit UI · upload · index · ask · view chunks + answer
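
A minimal sketch of the sparse and fusion steps listed above: Okapi BM25 scoring with k1=1.5 and b=0.75, followed by Reciprocal Rank Fusion with k=60. This is an illustration, not the project's actual bm25.py or retriever.py. The function names, the whitespace tokenization, the sample documents, and the dense ranking are all hypothetical, and the smoothed "+ 1" IDF variant is an assumption chosen to keep scores non-negative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = 1.0 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Assumption: smoothed IDF variant; the "+ 1.0" keeps it non-negative.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


def rrf(rankings, k=60):
    """Fuse ranked id lists: each id earns 1 / (k + rank) per list, ranks start at 1."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical corpus, tokenized by whitespace for illustration only.
docs = [
    "the cat sat on the mat".split(),
    "dogs chase the cat".split(),
    "vector stores hold embeddings".split(),
]
sparse = bm25_scores("cat mat".split(), docs)
sparse_ranking = sorted(range(len(docs)), key=lambda i: -sparse[i])
dense_ranking = [1, 2, 0]  # made-up stand-in for a vector-store ranking
fused = rrf([sparse_ranking, dense_ranking])
```

RRF uses only rank positions, so the BM25 scores and the cosine similarities never need to be put on a common scale before fusing.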
archit@portfolio:~$ █