Build a Mini RAG App in 2 Hours (LangChain + LangSmith + Chroma)
Hands-on RAG with LangChain, LangSmith, Chroma & OpenAI — ship a baseline in two hours.
TL;DR
🚀 Build a mini RAG app in 2 hours using LangChain, LangSmith, Chroma, and OpenAI.
📄 Workflow: Ingest PDFs → Chunk → Embed + Index → Retrieve → Generate with citations.
🎥 Watch the workshop replay.
📦 Download resources (notebook, slides, sample PDFs).
💬 Get support & Q&A in our Discord community.
Introduction
Large Language Models (LLMs) are powerful, but let’s face it — they’re not your knowledge base. Out of the box, they hallucinate facts, miss recent updates, and can’t cite sources. That’s where Retrieval-Augmented Generation (RAG) steps in: grounding model answers in your own documents for accuracy, recency, and compliance.
At ITVersity, we’ve been helping thousands of learners and professionals learn by building. As part of our 2026 roadmap, we’re running 50+ free hands-on workshops alongside structured paid cohorts in Data & AI. This article is a recap of one such workshop: Build a Mini RAG Application in 2 Hours.
👉 In this session, you’ll build a thin-slice RAG app using:
LangChain + LangSmith (for orchestration & tracing)
Chroma (vector database)
OpenAI Embeddings
ChatOpenAI (gpt-4o)
By the end, you’ll have a working app that can ingest PDFs, chunk text, embed + index in Chroma, retrieve with citations, and generate grounded answers.
Replay & Resources
🎥 Workshop Replay: YouTube
📦 Resources (notebook, slides, sample PDFs): Google Drive
💬 Support & Q&A: Join our Discord (dedicated support channel)
What You’ll Build
This workshop is all about a thin-slice RAG app — just enough to work end-to-end without the heavy scaffolding. In ~2 hours, you’ll go from raw PDFs to a chatbot that can answer grounded questions with citations.
The flow: ingest PDFs → chunk → embed + index → retrieve → generate grounded answers with citations.
You’ll use the sample Toyota brochures provided in the resources folder:
Toyota Camry Specifications
Toyota Corolla Specifications
Toyota Prius Specifications
Toyota bZ4X Specifications
The app will:
Load PDFs with PyPDFLoader
Chunk text into 300-token windows with 50-token overlap
Embed + index chunks in Chroma using OpenAIEmbeddings
Retrieve top chunks with flexible parameters (k, MMR)
Generate an answer with ChatOpenAI (gpt-4o) that cites sources (filename + page)
Optionally run a batch Q&A loop to validate multiple queries
This is your first success path: a working RAG baseline you can extend with reranking, hybrid search, dashboards, and deployment.
⚡ Pro tip: replace the sample Toyota PDFs with your own documents and you’ll instantly adapt this workflow to any domain — HR policies, financial reports, product manuals, or research papers.
Using the Resources
To make this workshop as practical as possible, we’ve packaged everything you need: a Colab-ready notebook, the slides, and a sample dataset of Toyota brochures. Here’s how to get started:
📦 Download Resources: Google Drive Folder
1. Notebook
Building_Mini_RAG_App_using_LangChain_and_LangSmithv2.ipynb
Open the notebook directly in Google Colab (fastest way).
If running locally: clone/download → pip install -r requirements from the first cell.
Follow the cells in order:
Install packages (langchain, chroma, pypdf, etc.)
Set environment variables (OpenAI key, optional LangSmith keys); see the sketch after this list.
Upload PDFs to /content/data/ if using Colab.
Run each step (ingestion → chunking → embeddings → retrieval → generation).
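A minimal sketch of that environment-variable cell, assuming the standard OPENAI_API_KEY name (plus the optional LangSmith variables covered later); adapt it to the notebook’s own cell if it differs:
import os
from getpass import getpass

# Required for embeddings and ChatOpenAI calls.
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")

# Optional: enable LangSmith tracing (see the Observability section below).
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass("LangSmith API key: ")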
2. Sample PDFs (/data/ folder)
Four Toyota brochures are included: Camry, Corolla, Prius, bZ4X.
Use them as-is to test the pipeline.
To customize: swap in your own PDFs, keep them in the /data/ directory, and rerun the ingestion step.
3. Slides
Workshop deck: step-by-step flow of the mini RAG app.
Intro deck: overview of ITVersity, cohorts, and workshop agenda.
Best used alongside the notebook to understand the reasoning behind each code cell.
⚡ Pro tip: Start with the provided Toyota PDFs to verify everything runs end-to-end. Once successful, replace them with your own documents — HR policies, research papers, manuals — to see RAG in action on data that matters to you.
Step-by-Step Walkthrough
Now that you have the notebook and sample PDFs, let’s go through the core building blocks of the mini RAG app. Each step below is already wired in the notebook — you just need to run the cells in order.
1. Ingest PDFs
We start by loading the Toyota brochures (or any PDFs you drop in the data/ folder).
from langchain_community.document_loaders import PyPDFLoader
pdf_paths = [
"data/Toyota_Camry_Specifications.pdf",
"data/Toyota_Corolla_Specifications.pdf",
"data/Toyota_Prius_Specifications.pdf",
"data/Toyota_bZ4X_Specifications.pdf",
]
docs = []
for p in pdf_paths:
    loader = PyPDFLoader(p)
    docs.extend(loader.load())  # per-page docs with metadata
Each PDF is split into page-level documents, with metadata like filename and page number preserved.
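As a quick sanity check (not part of the walkthrough above, just a convenience), you can print the metadata of the first page to confirm that the source and page fields the citation step relies on are present:
# Confirm filename and page number survive ingestion; citations depend on them.
print(len(docs), "page-level documents loaded")
print(docs[0].metadata)  # expect keys like 'source' and 'page'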
2. Chunk Documents
To make retrieval effective, we split documents into overlapping chunks.
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50
)
chunks = splitter.split_documents(docs)
⚡ Why it matters: Smaller chunks → more precise retrieval; larger chunks → more context continuity. Start with 300/50 and adjust based on your docs.
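If you want to see how those settings change granularity before re-indexing, a quick comparison (a sketch, not part of the workshop notebook) helps:
# Compare chunk counts at a few candidate settings before committing to one.
for size, overlap in [(200, 40), (300, 50), (500, 80)]:
    s = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=size, chunk_overlap=overlap
    )
    print(f"{size}/{overlap} -> {len(s.split_documents(docs))} chunks")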
3. Embed & Index with Chroma
We generate embeddings and store them in a vector database for fast similarity search.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
emb = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, emb, collection_name="toyota_specs")
This creates a Chroma collection with embeddings for every chunk.
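Before wiring the full chain, it’s worth sanity-checking the index with a direct similarity search (a sketch; the query string is just an example):
# Quick check that the index returns sensible chunks for a sample query.
hits = vectorstore.similarity_search("bZ4X battery capacity", k=2)
for h in hits:
    print(h.metadata.get("source"), "p.", h.metadata.get("page"))
    print(h.page_content[:120], "...")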
4. Retrieve Relevant Chunks
At query time, we fetch the top-k most relevant chunks.
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
# try different settings
# retriever = vectorstore.as_retriever(
#     search_type="mmr",
#     search_kwargs={"k": 8, "fetch_k": 20}
# )
k=5 returns top 5 results.
search_type="mmr" increases diversity of retrieved chunks.
5. Generate Grounded Answers with Citations
Now we ask the LLM to answer using retrieved context only, and enforce citations.
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.schema.runnable import RunnablePassthrough
llm = ChatOpenAI(model="gpt-4o", temperature=0)
prompt = PromptTemplate.from_template(
    "Answer the user question using ONLY the provided context.\n"
    "Cite sources inline like [source: filename p. page].\n"
    "If the answer is not in the context, say 'I don't know.'\n\n"
    "Question: {question}\n\nContext:\n{context}"
)
def format_context(docs):
    def src(md):
        fname = md.get("source", "").split("/")[-1]
        page = md.get("page", "NA")
        return f"[source: {fname} p. {page}]"
    return "\n\n".join(f"{d.page_content}\n{src(d.metadata)}" for d in docs)
rag_chain = (
    {"context": retriever | (lambda ds: format_context(ds)), "question": RunnablePassthrough()}
    | prompt
    | llm
)
answer = rag_chain.invoke("What is the battery capacity of the Toyota bZ4X?")
print(answer.content)
Result: a grounded, source-cited answer.
6. Batch Sanity Check
Run a small set of test questions to confirm quality.
questions = [
    "List two safety features of the Toyota Camry.",
    "What is the hybrid system output of the Prius?",
    "Compare the wheelbase of Corolla and Camry.",
]
for q in questions:
    print("Q:", q)
    print(rag_chain.invoke(q).content, "\n---\n")
Batch testing helps catch weak retrieval or missing context early.
⚡ Pro tip: Swap in your own documents once you see the Toyota example working — you’ll instantly have a domain-specific RAG app.
Observability & Tuning
Building a thin-slice RAG app is just the start. To make it useful beyond a demo, you need observability and the ability to tune key parameters.
1. Enable Tracing with LangSmith
The notebook is pre-configured for LangSmith so you can track every step. Just set these environment variables:
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGSMITH_API_KEY="ls-..." # from your LangSmith account
export LANGSMITH_PROJECT="mini-rag-workshop"
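If you’re working in Colab rather than a shell, the same variables can be set from a notebook cell (a minimal sketch using the names above):
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGSMITH_API_KEY"] = "ls-..."          # from your LangSmith account
os.environ["LANGSMITH_PROJECT"] = "mini-rag-workshop"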
With tracing on, you’ll see:
Which chunks were retrieved
How prompts were built
Cost/latency of each call
This makes it easier to debug and optimize.
2. Tuning Retrieval Parameters
A few knobs make a big difference:
chunk_size & chunk_overlap
Small chunks = more precise, less context.
Large chunks = more coverage, but risk of noise.
Start with 300/50 (as in this workshop).
k (number of retrieved chunks)
Default: 5.
Too low → incomplete answers.
Too high → model overwhelmed with irrelevant context.
Search type (similarity vs mmr)
similarity: top semantic matches.
mmr: promotes diversity, avoids redundancy.
3. Success Criteria (from the workshop)
Every answer cites sources ([source: file p. #]).
“I don’t know” returned when context is missing.
Batch Q&A produces sensible, grounded results.
Latency stays acceptable (under a few seconds per query).
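A rough way to check the latency criterion (a sketch; wall-clock time here includes both retrieval and generation):
import time

start = time.perf_counter()
rag_chain.invoke("What is the battery capacity of the Toyota bZ4X?")
print(f"{time.perf_counter() - start:.1f}s per query")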
4. What to Try Next
Experiment with different embedding models for trade-offs between speed and accuracy.
Add BM25 (keyword) retrieval alongside vectors for hybrid search (see the sketch after this list).
Run a small golden eval set (10–20 hand-labeled Q&A) whenever you tweak retrieval or chunking.
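For the hybrid-search idea, a minimal sketch using langchain’s BM25Retriever and EnsembleRetriever (assumes the rank_bm25 package is installed; the weights are just a starting point to tune):
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

# Keyword retriever over the same chunks used for the vector index.
bm25 = BM25Retriever.from_documents(chunks)
bm25.k = 5

# Blend keyword and vector results; tune the weights on your own queries.
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25, vectorstore.as_retriever(search_kwargs={"k": 5})],
    weights=[0.4, 0.6],
)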
⚡ Pro tip: Don’t change multiple knobs at once. Tune k, chunk size, or retriever type one at a time—and re-run your test questions.
Common Pitfalls (and Fixes)
Even with a working baseline, it’s easy to fall into traps that make your RAG app unreliable. Here are the biggest ones we saw in the workshop — and how to fix them.
1. Vector-Only Tunnel Vision
Problem: Semantic search alone struggles with IDs, acronyms, or exact matches.
Fix: Add a keyword/BM25 retriever and combine it with vector search (hybrid retrieval).
2. Bad Chunking Choices
Problem: Chunks that are too small lose context; chunks that are too large bring in noise.
Fix: Start with 300 tokens + 50 overlap. Adjust by doc type (smaller for structured PDFs, larger for prose).
3. No Citations in Answers
Problem: The model answers confidently but with no source references.
Fix: Use a prompt template that enforces citation formatting, and include metadata (filename + page) in retrieved chunks.
4. Demo-Only Success
Problem: Works on one doc, fails on others.
Fix: Build a tiny golden test set (10–20 Q&A pairs). Run it whenever you tweak retrieval, chunking, or prompts.
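A minimal version of that golden-set check might look like this (the questions and expected keywords are illustrative, not from the workshop):
# Tiny "golden set": expected keywords per question, re-run after any change.
golden = [
    ("What is the battery capacity of the Toyota bZ4X?", ["kwh"]),
    ("List two safety features of the Toyota Camry.", ["safety"]),
]

for q, expected in golden:
    answer = rag_chain.invoke(q).content.lower()
    ok = all(term in answer for term in expected)
    print("PASS" if ok else "FAIL", "-", q)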
5. “Garbage In, Garbage Out”
Problem: Poor ingestion (e.g., messy OCR, missing metadata) leads to irrelevant chunks.
Fix: Clean input docs where possible, enrich metadata (titles, sections), and verify samples during ingestion.
6. Cost & Latency Spikes
Problem: Context windows get too big or repeated embeddings waste tokens.
Fix: Cache embeddings, batch operations, tune k, and use a retrieval-aware prompt that avoids dumping irrelevant text.
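One way to cut repeated embedding cost is langchain’s CacheBackedEmbeddings helper (a sketch; the cache path and namespace are arbitrary choices):
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings

# Cache embeddings on disk so re-indexing the same chunks doesn't re-bill tokens.
store = LocalFileStore("./emb_cache")
cached_emb = CacheBackedEmbeddings.from_bytes_store(
    OpenAIEmbeddings(), store, namespace="openai"
)
# Build the Chroma index with `cached_emb` in place of `emb`.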
⚡ Pro tip: Most issues show up early if you trace with LangSmith and batch test with multiple questions. Make this a habit before you scale up.
How This Workshop Maps to the RAG/Agent Engineer Cohort
What you just built is a thin slice — enough to see RAG working end-to-end. In our RAG/Agent Engineer Cohort, we take this baseline and scale it into production-grade applications.
Weeks 1–3: Foundations (what you did here)
Thin-slice RAG: ingest → embed → index → retrieve → generate
Robust ingestion (OCR, semantic chunking, metadata enrichment)
Hybrid retrieval and parameter tuning (k, chunk size, MMR)
Early evaluation (precision@k, hit rate, golden test set)
Weeks 4–5: Orchestration & Reranking
Build multi-step pipelines with LangGraph
Introduce rerankers and ablation testing
Freeze sensible defaults for cost vs. quality
Weeks 6–7: Agents & Tools
Add tools like search APIs and calculators
Design multi-agent workflows with retries and error handling
Week 8: Evaluation Dashboards
Track groundedness, faithfulness, precision@k
Visualize latency, cost, and cache metrics
Integrate with tracing hooks
Week 9: Packaging & Deployment
Containerize apps with Docker
Deploy on cloud (GCP, OCI, or AWS)
CI/CD pipelines, TLS, backups
Week 10: Operations & Performance
Admin toggles, batching, re-rank switches
Load testing (p50/p95)
Vector DB tuning (IVF/HNSW)
Cohort Outcomes:
Two capstone projects: a production-ready RAG app + an agentic workflow
Portfolio repos you can showcase to employers/clients
Hands-on skills to move beyond demos and build real-world AI apps
📋 Apply / Join the waitlist: Google Form
Wrap-Up & Next Steps
In just a couple of hours, you’ve built a mini RAG app that can:
Ingest PDFs
Chunk and embed text
Store and retrieve chunks with Chroma
Generate grounded answers with citations using GPT-4o
That thin slice alone is enough to adapt to your own documents and domains — HR policies, financial reports, product manuals, research papers, and more.
But this is only the start. With more advanced techniques like hybrid retrieval, reranking, LangGraph orchestration, evaluation dashboards, and deployment pipelines, you can take RAG from a workshop demo to a production-ready system.
That’s exactly what we cover in the RAG/Agent Engineer Cohort.
📌 Key Links
🎥 Replay of the workshop: YouTube
📦 Resources (notebook, slides, sample PDFs): Google Drive
💬 Get support & updates: Discord Community • Free Workshops Support Channel
📋 Apply / Join the Cohort waitlist: Google Form
Final Thought
RAG is one of the most practical applications of LLMs today — bridging the gap between raw model power and your organization’s knowledge. If you’ve followed along, you now have the foundations to build on.
👉 Join our Discord for support, share what you’ve built, and be part of the growing ITVersity community. And if you’re ready to go deeper, apply for the upcoming cohort.