RAG From Scratch — Archit Konde

archit@portfolio: ~/rag-from-scratch

archit@portfolio:~$ ls ./rag-from-scratch/

RAG From Scratch

Full retrieval-augmented generation pipeline built from first principles.
No LangChain. No LlamaIndex. Every algorithm implemented by hand.

archit@portfolio:~$ cat tech_stack.json

python transformers numpy streamlit bm25 cosine-similarity cross-encoder pytest

// ingestion.py → chunker.py → embeddings.py → vectorstore.py
// bm25.py → retriever.py → reranker.py → generator.py
// 87 tests · 9 components · every algorithm from scratch

archit@portfolio:~$ cat features.txt

→ recursive text chunking with configurable separators & overlap

→ sentence embeddings via all-MiniLM-L6-v2 · mean pooling + L2 norm

→ NumPy vector store · cosine sim = dot product on normalized vectors

→ Okapi BM25 from scratch · Robertson-Spärck Jones IDF · k1=1.5 b=0.75

→ hybrid retrieval · Reciprocal Rank Fusion (k=60)

→ cross-encoder reranking · ms-marco-MiniLM · raw logit scoring

→ raw HTTP generation · any OpenAI-compatible API · source attribution

→ evaluation suite · Precision@k · Recall@k · MRR · faithfulness

→ Streamlit UI · upload · index · ask · view chunks + answer
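
The retrieval features above (BM25 with k1=1.5 and b=0.75, cosine similarity as a dot product on normalized vectors, and Reciprocal Rank Fusion with k=60) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the whitespace tokenizer, and the two-dimensional toy "embeddings" are all assumptions made for the example, and the IDF shown is the classic form without smoothing.

```python
import math
from collections import Counter

import numpy as np

K1, B = 1.5, 0.75  # BM25 parameters from the feature list
RRF_K = 60         # RRF constant from the feature list


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (the real tokenizer is not shown).
    return text.lower().split()


def bm25_scores(query, docs):
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Classic Robertson-Spärck Jones IDF; it turns negative for terms
            # that occur in more than half of the documents.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = K1 * (1 - B + B * len(tokens) / avgdl)
            score += idf * tf[term] * (K1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cosine_scores(query_vec, doc_matrix):
    """Cosine similarity as a plain dot product; inputs must be L2-normalized."""
    return doc_matrix @ query_vec


def ranking(scores):
    """Document indices sorted by descending score (ties keep index order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings, k=RRF_K):
    """Reciprocal Rank Fusion: fused(d) = sum over rankings of 1 / (k + rank(d))."""
    fused = Counter()
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = [
    "bm25 ranks documents by term frequency",
    "cosine similarity compares embedding vectors",
    "reciprocal rank fusion merges ranked lists",
]
# Hypothetical 2-d embeddings standing in for real sentence embeddings.
doc_vecs = l2_normalize(np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]))
query_vec = l2_normalize(np.array([0.6, 0.8]))

sparse = ranking(bm25_scores("rank fusion", docs))   # [2, 0, 1]
dense = ranking(cosine_scores(query_vec, doc_vecs))  # [1, 2, 0]
print(rrf([sparse, dense]))                          # [2, 1, 0]
```

Because RRF only looks at ranks, the BM25 scores and cosine similarities never need to be put on a common scale before fusing, which is the usual reason for choosing it over weighted score sums.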

archit@portfolio:~$ █

→ launch live demo
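
The retrieval metrics named in features.txt (Precision@k, Recall@k, MRR) follow standard definitions; the sketch below shows one common way to compute them. The function names and chunk ids are hypothetical, and the project's own evaluation suite may differ in detail, for example in how it handles fewer than k results.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved ids that are relevant (divides by k)."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def mean_reciprocal_rank(runs):
    """Mean of 1 / rank of the first relevant hit; a query with no hit adds 0.

    runs: list of (retrieved_ids, relevant_id_set) pairs, one per query.
    """
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


# Hypothetical chunk ids for two queries.
q1 = (["c3", "c1", "c7", "c2"], {"c1", "c2"})
q2 = (["c5", "c9"], {"c5"})

print(precision_at_k(*q1, k=3))        # 1 relevant hit in the top 3
print(recall_at_k(*q1, k=3))           # 0.5
print(mean_reciprocal_rank([q1, q2]))  # 0.75
```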