



RAG-BASED AI CHAT APP | 2026
NextJS Doc Helper
NextJS Doc Helper is an AI assistant that answers questions about Next.js by querying pre-scraped, embedded documentation stored in a vector database. It combines retrieval-augmented generation (RAG) with a ReAct loop, so the agent retrieves context only when a question needs it. Sources are shown alongside each answer in the chat, and the interface is built with Streamlit.
TECH STACK
Language
Python
AI
OpenAI
Framework
LangChain
VectorDB
Pinecone
Search
Tavily
UI
Streamlit
Deployment
Streamlit Community Cloud
FEATURES
Chat interface for querying Next.js documentation
RAG + ReAct loop for context-aware, on-demand retrieval
Pre-scraped Next.js docs embedded and stored in Pinecone vector database
Source attribution displayed per AI response
Configurable hyperparameters for data scraping and storage
CHALLENGES & SOLUTIONS
Scraping and embedding large volumes of documentation efficiently
Used asyncio to run scraping and embedding tasks concurrently, which cut ingestion time considerably compared with sequential processing. Chunk size and overlap are exposed as configurable hyperparameters, giving fine-grained control over how documentation is split and stored in Pinecone.
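A minimal sketch of this ingestion pattern, using only the standard library. It is not the project's actual code: fetch_page is a hypothetical stand-in for a real HTTP request, the example.com URLs are placeholders, and max_concurrency is an assumed extra setting that caps how many pages are in flight at once. Chunking is done by character count here purely for illustration.

```python
import asyncio


def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for an HTTP fetch; sleeps to simulate network latency."""
    await asyncio.sleep(0.01)
    return f"Documentation text for {url}. " * 3


async def ingest_page(url: str, semaphore: asyncio.Semaphore,
                      chunk_size: int, overlap: int) -> list[str]:
    # The semaphore limits how many fetches run at the same time.
    async with semaphore:
        text = await fetch_page(url)
    return chunk_text(text, chunk_size, overlap)


async def ingest_all(urls: list[str], chunk_size: int = 40, overlap: int = 10,
                     max_concurrency: int = 5) -> dict[str, list[str]]:
    semaphore = asyncio.Semaphore(max_concurrency)
    # gather schedules every page at once and returns results in input order.
    results = await asyncio.gather(
        *(ingest_page(url, semaphore, chunk_size, overlap) for url in urls)
    )
    return dict(zip(urls, results))


if __name__ == "__main__":
    pages = [f"https://example.com/docs/page-{i}" for i in range(3)]
    for url, chunks in asyncio.run(ingest_all(pages)).items():
        print(url, len(chunks), "chunks")
```

Because the waiting happens inside the event loop, total time is governed by the slowest batch of pages instead of the sum of all fetches. In a real pipeline the embedding and upsert calls would sit where chunk_text is called.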
Avoiding unnecessary retrieval on every agent query
Implemented a RAG + ReAct loop where retrieval is a tool at the agent's disposal rather than a fixed step in every query. The agent reasons about whether fetching from the vector store is necessary before doing so, reducing redundant lookups and keeping responses efficient.
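The control flow can be sketched without any framework. In the sketch below, stub_policy is a hypothetical stand-in for the LLM's reasoning step and retrieve_docs is a placeholder for the vector-store lookup; both names, and the keyword check inside the policy, are assumptions made for illustration. The point it shows is that retrieval is just one entry in a tool table that the loop calls only when the policy asks for it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    action: str    # name of a tool to call, or "answer" to finish
    argument: str  # tool input, or the final answer text


def run_agent(question: str,
              policy: Callable[[str, list[str]], Step],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 3) -> tuple[str, list[str]]:
    """Reason, act, observe; repeat until the policy answers or steps run out."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = policy(question, observations)  # reason: choose the next action
        if step.action == "answer":
            return step.argument, observations
        # act: call the chosen tool, then record what it returned (observe)
        observations.append(tools[step.action](step.argument))
    return "Stopped: step limit reached.", observations


def stub_policy(question: str, observations: list[str]) -> Step:
    """Hypothetical stand-in for the LLM: retrieve once for docs questions."""
    if observations:
        return Step("answer", f"Based on the docs: {observations[-1]}")
    if "next.js" in question.lower():
        return Step("retrieve", question)
    return Step("answer", "Hello! Ask me a question about the documentation.")


def retrieve_docs(query: str) -> str:
    """Placeholder for a vector-store lookup."""
    return f"(retrieved excerpt for: {query})"


if __name__ == "__main__":
    tools = {"retrieve": retrieve_docs}
    print(run_agent("Hi there", stub_policy, tools))
    print(run_agent("How does routing work in Next.js?", stub_policy, tools))
```

A greeting finishes in one step with no tool call, while a documentation question triggers exactly one retrieval before the answer. The max_steps cap keeps a misbehaving policy from looping forever.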
Surfacing source attribution from LangChain's response format to the frontend
Declared the retrieval tool with LangChain's response_format="content_and_artifact", which splits the tool's output into the text passed to the model and an artifact holding the retrieved documents. The source metadata is extracted from the artifact and passed to the Streamlit frontend for display alongside each response.
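The shape of that split can be illustrated in plain Python. This sketch does not import LangChain: RetrievedDoc, format_tool_output and unique_sources are hypothetical names, and the example.com URLs are placeholders. It mirrors the (content, artifact) tuple that a tool declared with response_format="content_and_artifact" returns, where LangChain places the second element on the resulting ToolMessage's artifact attribute.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedDoc:
    """Minimal stand-in for a retrieved document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_tool_output(docs: list[RetrievedDoc]) -> tuple[str, list[RetrievedDoc]]:
    """Return (content, artifact): text for the model, raw documents for the app."""
    content = "\n\n".join(
        f"Source: {doc.metadata.get('source', 'unknown')}\n{doc.page_content}"
        for doc in docs
    )
    return content, docs


def unique_sources(artifact: list[RetrievedDoc]) -> list[str]:
    """Collect source URLs from the artifact, keeping order and dropping repeats."""
    sources: list[str] = []
    for doc in artifact:
        source = doc.metadata.get("source")
        if source and source not in sources:
            sources.append(source)
    return sources


if __name__ == "__main__":
    docs = [
        RetrievedDoc("First excerpt.", {"source": "https://example.com/docs/a"}),
        RetrievedDoc("Second excerpt.", {"source": "https://example.com/docs/a"}),
        RetrievedDoc("Third excerpt.", {"source": "https://example.com/docs/b"}),
    ]
    content, artifact = format_tool_output(docs)
    print(content)
    print(unique_sources(artifact))
```

Keeping the documents in the artifact means the frontend can read structured metadata directly instead of parsing source links back out of the model's prose.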
Configuring and tuning vector storage for accurate retrieval
Tuned Pinecone retrieval by adjusting chunk size, overlap, and the number of returned documents (top-k) to balance retrieval precision against context window usage, ensuring the agent receives relevant excerpts without being overwhelmed by noise.
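The top-k trade-off can be seen in a small in-memory sketch. It is an illustration only: Pinecone performs the similarity search server-side, the cosine metric is an assumption (the write-up does not state which metric the index uses), and the two-dimensional vectors and chunk names are toy values.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector: list[float], index: dict[str, list[float]],
          k: int) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query_vector, vector))
              for chunk_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def context_size(chunk_ids: list[str], chunk_texts: dict[str, str]) -> int:
    """Total characters the selected chunks would add to the prompt."""
    return sum(len(chunk_texts[chunk_id]) for chunk_id in chunk_ids)


if __name__ == "__main__":
    index = {"routing": [1.0, 0.0], "caching": [0.0, 1.0], "both": [1.0, 1.0]}
    texts = {"routing": "r" * 500, "caching": "c" * 500, "both": "b" * 500}
    query = [1.0, 0.1]
    for k in (1, 2, 3):
        ids = [chunk_id for chunk_id, _ in top_k(query, index, k)]
        print(k, ids, context_size(ids, texts))
```

Raising k adds lower-scoring chunks and grows the prompt linearly with chunk size, which is why the two settings are tuned together.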