RAG Deep Dive
RAG is the #1 AI pattern in production. You'll go far beyond the basics — learn multiple chunking strategies, debug retrieval failures, handle the #1 cause of hallucination (bad retrieval), and build a Q&A system that cites its sources. This is the day that separates AI engineers from tutorial followers.
Use this at work tomorrow
Build a Q&A bot over your team's internal docs — Confluence, Notion, or README files.
Learning Objectives
1. Master the RAG pipeline: chunk → embed → store → retrieve → generate
2. Compare chunking strategies: fixed-size, recursive, semantic, by-heading
3. Debug RAG failures: bad retrieval, context overflow, hallucination grounding
4. Add source citations with [1], [2] notation for trustworthy answers
5. Ship a document Q&A system that answers from YOUR data
Ship It: Document Q&A system
By the end of this day, you'll build and deploy a document Q&A system. This isn't a toy — it's a real project for your portfolio.
I can build a RAG pipeline that chunks documents, embeds them, retrieves relevant context, and generates grounded answers with citations.
How does RAG reduce LLM hallucinations?
RAG = Query Your Own Data with AI
RAG stands for Retrieval-Augmented Generation. Think of it as: query your database, but instead of rendering the data directly in a template, you pass it to an LLM as context to generate a natural language answer. It's the #1 AI pattern in production because it grounds LLM responses in YOUR data, dramatically reducing hallucination.
What does the 'R' in RAG do?
The RAG Pipeline: Chunk → Embed → Store → Retrieve → Generate
The RAG pipeline is a 5-step data flow. (1) Chunk: split documents into manageable pieces. (2) Embed: convert chunks to vectors. (3) Store: save vectors in a vector database. (4) Retrieve: find chunks similar to the user's query. (5) Generate: pass retrieved chunks as context to an LLM. Each step has trade-offs — chunk size affects retrieval quality, embedding model affects accuracy, and the generation prompt affects answer quality.
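The five steps can be sketched end to end with a toy in-memory index. This is a teaching sketch, not production code: the letter-frequency "embedding" and the plain array are stand-ins for a real embedding model and vector database, and `docText` is a made-up sample document.

```typescript
type Chunk = { id: number; text: string; vector: number[] };

// 1. Chunk: naive fixed-size split (better strategies below)
function chunkDoc(doc: string, size = 80): string[] {
  const out: string[] = [];
  for (let i = 0; i < doc.length; i += size) out.push(doc.slice(i, i + size));
  return out;
}

// 2. Embed: letter-frequency vector as a stand-in embedding;
//    a real pipeline calls an embedding model here
function fakeEmbed(text: string): number[] {
  const v = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const code = ch.charCodeAt(0) - 97;
    if (code >= 0 && code < 26) v[code]++;
  }
  return v;
}

// Cosine similarity: the same metric a vector DB uses
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na * nb) || 1);
}

// 3. Store: an array stands in for the vector DB (indexing time)
const docText =
  "React hooks let you use state in function components. " +
  "The useEffect hook runs after every render by default.";
const index: Chunk[] = chunkDoc(docText).map((text, id) => ({
  id, text, vector: fakeEmbed(text),
}));

// 4. Retrieve: top-k chunks by similarity to the query (query time)
function retrieve(query: string, k = 1): Chunk[] {
  const q = fakeEmbed(query);
  return [...index]
    .sort((a, b) => cosine(q, b.vector) - cosine(q, a.vector))
    .slice(0, k);
}

// 5. Generate: retrieved chunks become the LLM's context
//    (in production: pass retrieve(query) into your generateText call)
```

Notice that steps 1-3 run once at indexing time, while steps 4-5 run on every query — that split is why retrieval quality and generation quality must be monitored separately.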
In the RAG pipeline, which step happens at query time (not during indexing)?
You split a 10,000-word document every 500 characters. What's the biggest problem?
Chunking Strategies: Fixed-Size Is Just the Beginning
Fixed-size (500 chars) is the simplest but often worst strategy — it splits mid-sentence. Recursive splitting follows document structure (paragraphs → sentences → words). Semantic chunking groups by meaning. Heading-based chunking follows document hierarchy. For code: chunk by function/class. The right strategy depends on your data. Bad chunking is the #1 cause of bad RAG.
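Heading-based chunking is worth seeing concretely. A minimal sketch for markdown docs — each section becomes one chunk that carries its heading, so the retriever gets the document's own structure for free:

```typescript
// Heading-based chunking: split a markdown document at its headings,
// keeping each heading attached to the body it introduces.
function chunkByHeading(markdown: string): { heading: string; body: string }[] {
  const chunks: { heading: string; body: string }[] = [];
  let heading = "";
  let body: string[] = [];
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line)) {
      // New section starts: flush the previous one
      if (heading || body.length) {
        chunks.push({ heading, body: body.join("\n").trim() });
      }
      heading = line.replace(/^#+\s*/, "");
      body = [];
    } else {
      body.push(line);
    }
  }
  if (heading || body.length) chunks.push({ heading, body: body.join("\n").trim() });
  return chunks;
}
```

In practice you'd also split oversized sections (a 5,000-char section still needs a recursive pass), but keeping the heading with its body is the core idea.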
You're chunking API documentation. What's the best strategy?
Your RAG app gives bad answers. What should you debug FIRST?
RAG Failure Modes: What Goes Wrong in Production
Garbage retrieval → hallucinated answers. If the retriever pulls irrelevant chunks, the LLM will still generate a confident answer from nonsense context. Other failure modes: context overflow (too many chunks), lost-in-the-middle (LLMs ignore middle chunks), stale data (embeddings from old docs), and adversarial queries that retrieve unrelated content. Debugging RAG means debugging retrieval first.
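One cheap defense against garbage retrieval: check the similarity scores before generating. A minimal sketch, assuming your vector DB returns chunks with a similarity score; the 0.35 threshold is an arbitrary placeholder you'd tune on your own queries.

```typescript
type Scored = { content: string; score: number };

// If even the best chunk scores below the threshold, refuse to answer
// instead of letting the LLM improvise from irrelevant context.
function selectContext(scoredChunks: Scored[], minScore = 0.35): Scored[] | null {
  const relevant = scoredChunks.filter((c) => c.score >= minScore);
  return relevant.length > 0 ? relevant : null; // null → answer "I don't know"
}
```

Returning null here is the system-level version of the "say I don't know" prompt — it catches the cooking-recipes-for-React-hooks case before the LLM ever sees the bad context.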
Your RAG system retrieves chunks about 'cooking recipes' for a question about 'React hooks'. What happens?
The Full Evolution
Watch one function evolve through every concept you just learned.
Production Gotchas
Chunk overlap prevents losing context at boundaries (50-100 char overlap is standard). Always include document metadata (source, date, section) in your chunks — you'll need it for citations and filtering. Monitor retrieval quality separately from generation quality — if retrieval is bad, no prompt can fix generation. Re-rank after retrieval for better quality (reorders by actual relevance, not just vector similarity).
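The overlap and metadata gotchas combine naturally in one chunker. A sketch of fixed-size chunking with a 75-char overlap and per-chunk metadata (the `MetaChunk` shape is an assumption — match it to whatever your vector DB stores):

```typescript
type MetaChunk = { content: string; source: string; position: number };

// Overlapping fixed-size chunks: a sentence that straddles a boundary
// survives intact in at least one chunk. Metadata travels with each
// chunk so citations and filtering work later.
function chunkWithOverlap(
  text: string,
  source: string,
  size = 500,
  overlap = 75
): MetaChunk[] {
  const chunks: MetaChunk[] = [];
  const step = size - overlap; // advance less than `size` to overlap
  for (let i = 0; i < text.length; i += step) {
    chunks.push({ content: text.slice(i, i + size), source, position: i });
    if (i + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

The `position` field also lets you deep-link a citation back to the exact spot in the source document.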
Code Comparison
Data Query: SQL + Template vs RAG
Traditional data display vs RAG-powered answers
// Traditional: query DB, render template
const docs = await db.query(
  "SELECT * FROM docs WHERE topic = $1",
  [userQuestion]
);
return docs.map(doc => ({
  title: doc.title,
  snippet: doc.content.slice(0, 200),
  link: doc.url,
}));
// Returns: list of links
// User must read & synthesize themselves

// RAG: retrieve context, generate answer
// 1. Embed user's question
const { embedding } = await embed({
  model: openai.embedding(
    "text-embedding-3-small"
  ),
  value: userQuestion,
});
// 2. Retrieve relevant chunks
const chunks = await vectorDB.query({
  vector: embedding, topK: 5,
});
// 3. Generate grounded answer
const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  system: `Answer based ONLY on the context.
If the answer isn't there, say "I don't know."`,
  prompt: `Context:
${chunks.map(c => c.content).join("\n\n")}
Question: ${userQuestion}`,
});
// Returns: synthesized answer with context

KEY DIFFERENCES
- Traditional: user searches → reads → synthesizes answer manually
- RAG: user asks → system retrieves → LLM synthesizes → user gets answer
- RAG pipeline: Chunk → Embed → Store → Retrieve → Generate
- The 'only answer from context' prompt prevents hallucination
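The citation half of the challenge comes down to prompt construction: number each chunk and tell the model to cite those numbers. A sketch, assuming chunks carry a `source` field; the exact wording is one workable phrasing, not the only one.

```typescript
type SourceChunk = { content: string; source: string };

// Label each chunk [1], [2], ... so the model can cite them inline.
function buildCitationPrompt(question: string, chunks: SourceChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source})\n${c.content}`)
    .join("\n\n");
  return [
    "Answer ONLY from the numbered context below.",
    "Cite sources inline as [1], [2] after each claim.",
    'If the answer is not in the context, say "I don\'t know."',
    "",
    `Context:\n${context}`,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Because you assigned the numbers yourself, you can map each `[n]` in the answer back to its chunk's `source` and render real links in the UI.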
Chunking: Fixed vs Recursive
Why chunking strategy matters for RAG quality
// Fixed-size: simple but crude
function chunkFixed(text: string, size = 500) {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}
// Problem: "The React useEffect hook
// runs after every-"
// [CHUNK BOUNDARY]
// "-render by default."
// Splits mid-sentence! Context lost.

// Recursive: follows document structure
function chunkRecursive(
  text: string,
  maxSize = 500
) {
  // Split by paragraphs first
  const paragraphs = text.split("\n\n");
  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    if ((current + para).length > maxSize) {
      if (current) chunks.push(current.trim());
      current = para;
    } else {
      current += "\n\n" + para;
    }
  }
  if (current) chunks.push(current.trim());
  return chunks;
}
// Respects paragraph boundaries!
// Each chunk is a complete thought.

KEY DIFFERENCES
- Fixed-size is easy to implement but splits mid-sentence
- Recursive follows document structure (paragraphs → sentences)
- Bad chunking = bad retrieval = hallucinated answers
- Always test your chunking on real docs — look at the boundaries
Bridge Map: Database queries → Retrieval + AI generation
Click any bridge to see the translation
Hands-On Challenges
Build, experiment, and get AI-powered feedback on your code.
Document Q&A System
Build and deploy a RAG-powered document Q&A system that lets users upload documents, ask questions in natural language, and get accurate answers with source citations. This is the #1 AI pattern in production — build it for real.
Acceptance Criteria
- Accept document uploads (text, markdown, or PDF) and chunk them intelligently
- Generate and store embeddings for all document chunks
- Retrieve the most relevant chunks for a user's question using vector similarity
- Generate answers grounded in the retrieved context with [1], [2] source citations
- Handle 'I don't know' gracefully when the answer isn't in the documents
- Support multiple documents with source attribution
- Deploy to a public URL (Vercel, Netlify, etc.)
Build Roadmap
Create a new Next.js app with TypeScript and Tailwind CSS. Set up the project with a document upload page and API routes for processing and querying.
npx create-next-app@latest doc-qa --typescript --tailwind --app
Plan three API routes: /api/upload, /api/embed, /api/ask
Deploy Tip
Push to GitHub and import into Vercel. For the demo, pre-load a few sample documents so reviewers can try it immediately without uploading. Set your OPENAI_API_KEY in Vercel environment variables.
Sign in to submit your deployed project.
I can build a RAG pipeline that chunks documents, embeds them, retrieves relevant context, and generates grounded answers with citations.