RAG-BASED AI CHAT APP | 2026

NextJS Doc Helper

RAG Documentation Helper is an AI assistant built to answer user questions about Next.js by querying pre-scraped, embedded documentation stored in a vector database. It uses a RAG + ReAct loop, giving the agent the flexibility to retrieve context only when needed. Source attribution is surfaced directly in the chat, and the interface is built with Streamlit.

Python

LangChain

Streamlit

Tavily

OpenAI API

Pinecone

OVERVIEW

RAG Documentation Helper is an AI-powered chat assistant for answering questions about Next.js. Instead of relying only on the model's general training knowledge, it queries pre-scraped Next.js documentation that has been embedded and stored in a Pinecone vector database, so answers are grounded in the documentation itself. The agent runs a RAG + ReAct loop in which it decides, per query, whether retrieval is needed rather than fetching context every time. Each response shows its source attribution directly in the chat, so users can see where the information comes from. The interface is built with Streamlit.

IMPLEMENTATION

TECH STACK

Language

Python

OpenAI

Framework

LangChain

VectorDB

Pinecone

Tavily

Streamlit

Deployment

Streamlit Community Cloud

FEATURES

Chat interface for querying Next.js documentation

RAG + ReAct loop for context-aware, on-demand retrieval

Pre-scraped Next.js docs embedded and stored in Pinecone vector database

Source attribution displayed per AI response

Configurable hyperparameters for data scraping and storage

CHALLENGES & SOLUTIONS

Scraping and embedding large volumes of documentation efficiently

Used asyncio to run scraping and embedding tasks concurrently, reducing ingestion time compared with processing pages one after another. Chunk size and overlap were exposed as configurable parameters, giving fine-grained control over how documentation is split before it is stored in Pinecone.
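As a rough illustration of the approach described above, the sketch below runs page ingestion concurrently with asyncio and splits text using a configurable chunk size and overlap. It is not the project's actual code: fetch_page and embed_chunk are hypothetical stand-ins for the real scraper and embedding call, the example.com URLs are placeholders, and the default values are assumptions chosen only for demonstration.

```python
import asyncio


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for an HTTP fetch; sleeps to simulate latency."""
    await asyncio.sleep(0.01)
    return f"Placeholder documentation text for {url}. " * 20


async def embed_chunk(chunk: str) -> list[float]:
    """Hypothetical stand-in for an embedding API call."""
    await asyncio.sleep(0)
    return [float(len(chunk))]


async def ingest_page(url: str, semaphore: asyncio.Semaphore,
                      chunk_size: int, overlap: int) -> list[dict]:
    # The semaphore caps how many pages are fetched at the same time.
    async with semaphore:
        text = await fetch_page(url)
    chunks = chunk_text(text, chunk_size, overlap)
    vectors = await asyncio.gather(*(embed_chunk(c) for c in chunks))
    return [
        {"source": url, "text": c, "vector": v}
        for c, v in zip(chunks, vectors)
    ]


async def ingest_all(urls: list[str], max_concurrency: int = 5,
                     chunk_size: int = 1000, overlap: int = 200) -> list[dict]:
    semaphore = asyncio.Semaphore(max_concurrency)
    pages = await asyncio.gather(
        *(ingest_page(u, semaphore, chunk_size, overlap) for u in urls)
    )
    # Flatten the per-page lists into one list of records ready for upsert.
    return [record for page in pages for record in page]
```

Calling asyncio.run(ingest_all(urls, max_concurrency=5)) would return one record per chunk. Because the work is dominated by waiting on the network, overlapping the waits is where the time saving comes from; the concurrency cap is there to stay within rate limits.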

Avoiding unnecessary retrieval on every agent query

Implemented a RAG + ReAct loop where retrieval is a tool at the agent's disposal rather than a fixed step in every query. The agent reasons about whether fetching from the vector store is necessary before doing so, reducing redundant lookups and keeping responses efficient.
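The control flow of such a loop can be sketched without any framework. In the example below, stub_policy is a hypothetical stand-in for the LLM's reasoning step, retrieve_docs is a toy keyword lookup standing in for the vector search, and the DOCS entries are placeholder text. The point is only to show retrieval as an optional tool call rather than a fixed step.

```python
# Placeholder corpus; a real system would query the vector store instead.
DOCS = {
    "routing": "Placeholder excerpt about routing.",
    "caching": "Placeholder excerpt about caching.",
}


def retrieve_docs(query: str) -> str:
    """Toy retrieval tool: keyword match standing in for a vector search."""
    hits = [text for key, text in DOCS.items() if key in query.lower()]
    return "\n".join(hits) or "no matching documentation found"


def stub_policy(question: str, observations: list[str]) -> dict:
    """Stand-in for the model's reasoning: choose the next action."""
    needs_docs = any(key in question.lower() for key in DOCS)
    if needs_docs and not observations:
        # Reason: the question is about the docs and nothing was fetched yet.
        return {"action": "retrieve_docs", "input": question}
    context = observations[-1] if observations else "no retrieval needed"
    return {"action": "final_answer", "input": f"Answer based on: {context}"}


def run_agent(question: str, policy=stub_policy, tools=None, max_steps: int = 3):
    """Alternate between reasoning and acting until a final answer is produced."""
    tools = tools or {"retrieve_docs": retrieve_docs}
    observations: list[str] = []
    trace: list[str] = []
    for _ in range(max_steps):
        step = policy(question, observations)
        trace.append(step["action"])
        if step["action"] == "final_answer":
            return step["input"], trace
        # Act: call the chosen tool and record what it returned.
        observations.append(tools[step["action"]](step["input"]))
    return "Stopped: step limit reached", trace
```

With this stub, run_agent("How does routing work?") calls the retrieval tool once before answering, while run_agent("Hello there") answers directly without touching the store. The returned trace makes that difference visible.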

Surfacing source attribution from LangChain's response format to the frontend

Used LangChain's response_format="content_and_artifact" tool option so the retrieval tool returns two things: the text content passed to the model and an artifact carrying the source metadata. The artifact portion, containing the retrieved document sources, was extracted and passed to the Streamlit frontend for display alongside each response.
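The content/artifact split can be illustrated in plain Python. In LangChain, a tool declared with response_format="content_and_artifact" returns a two-tuple of this shape; the sketch below only imitates that shape. The Doc class is a simplified stand-in for LangChain's Document, and retrieve_with_sources and unique_sources are hypothetical helper names, not the project's code.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Simplified stand-in for a retrieved document with metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def retrieve_with_sources(query: str, store: list[Doc]) -> tuple[str, list[Doc]]:
    """Return (content, artifact): text for the model, raw docs for the UI."""
    words = query.lower().split()
    docs = [d for d in store if any(w in d.page_content.lower() for w in words)]
    content = "\n\n".join(
        f"Source: {d.metadata.get('source')}\n{d.page_content}" for d in docs
    )
    return content, docs


def unique_sources(artifact: list[Doc]) -> list[str]:
    """Collect source URLs in first-seen order, dropping duplicates."""
    seen = dict.fromkeys(
        d.metadata["source"] for d in artifact if "source" in d.metadata
    )
    return list(seen)
```

The first element is what the model reads; the second never needs to enter the prompt and can be handed to the frontend as is. The list from unique_sources could then be rendered beneath the corresponding chat message.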

Configuring and tuning vector storage for accurate retrieval

Tuned Pinecone retrieval by adjusting chunk size, overlap, and the number of returned documents (top-k) to balance retrieval precision against context window usage, ensuring the agent receives relevant excerpts without being overwhelmed by noise.
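The trade-off behind top-k can be made concrete with a small, self-contained sketch. It ranks chunks by cosine similarity in memory, which only approximates what a vector database does, and measures how much text each setting of k would add to the prompt. The function names and the tiny two-dimensional vectors are assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]],
          k: int) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query as (score, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def context_chars(results: list[tuple[float, str]]) -> int:
    """Total characters the retrieved chunks would add to the prompt."""
    return sum(len(text) for _, text in results)
```

Raising k brings in lower-scoring chunks and grows context_chars roughly in proportion to chunk size, which is why the two settings are usually tuned together.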