Embeddings & Vector Search
SQL LIKE matches characters. Vector search matches meaning. You'll learn how embeddings turn text into numbers, build similarity search from scratch, and understand when to use Pinecone, pgvector, or Chroma. You'll ship a search engine that finds 'cozy sneakers' when users type 'comfortable shoes'.
Use this at work tomorrow
Replace a keyword-based search in your app with semantic search — users will find what they mean, not just what they type.
Learning Objectives
- Understand embeddings as feature vectors (1536 dimensions → meaning)
- Implement cosine similarity from scratch — then use it at scale
- Generate embeddings with OpenAI's embedding API
- Know when to pick Pinecone vs pgvector vs Chroma vs in-memory
- Build a semantic search engine with 100+ real documents
Ship It: Semantic search engine
By the end of this day, you'll build and deploy a semantic search engine. This isn't a toy — it's a real project for your portfolio.
I can explain what embeddings are, compute cosine similarity, and build a semantic search system that matches meaning instead of keywords.
How does semantic search find 'cozy sneakers' when you search 'comfortable shoes'?
From Text Search to Semantic Search
SQL LIKE queries match exact strings — 'comfortable shoes' will never match 'cozy sneakers'. Semantic search matches meaning. The magic is embeddings — turning text into vectors (arrays of 1536 numbers) where similar meanings are close together in vector space. It's like GPS coordinates for meaning.
What does an embedding convert text into?
What do the 1536 numbers in an embedding vector represent?
Embeddings Are Feature Vectors You Already Understand
You've worked with feature flags, analytics dimensions, or coordinate systems. Embeddings are the same concept at scale — each of the 1536 dimensions captures a semantic feature learned by the model. 'King' and 'Queen' are close in the royalty dimension but differ on the gender dimension. You don't pick the features — the model learns them.
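The "cosine similarity from scratch" objective fits in a few lines: take the dot product of two vectors and divide by the product of their magnitudes. The toy 3-dimensional vectors below are invented for illustration (real embeddings have 1536 dimensions); only the formula itself is load-bearing:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// 1 = same direction, 0 = unrelated, -1 = opposite.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

function magnitude(a: number[]): number {
  return Math.sqrt(dot(a, a));
}

function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (magnitude(a) * magnitude(b));
}

// Hypothetical 3-d "embeddings" for illustration only:
const comfortableShoes = [0.9, 0.8, 0.1];
const cozySneakers = [0.85, 0.75, 0.2];
const taxSoftware = [0.05, 0.1, 0.95];

console.log(cosineSimilarity(comfortableShoes, cozySneakers)); // high (near 1)
console.log(cosineSimilarity(comfortableShoes, taxSoftware));  // low
```

The model's learned features mean the real vectors for "comfortable shoes" and "cozy sneakers" end up close for the same reason these toy ones do: they point in nearly the same direction.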
Who decides what each dimension in an embedding means?
When should you use a dedicated vector database like Pinecone instead of in-memory arrays?
Real Vector Databases: When to Use What
For prototyping: in-memory arrays with cosine similarity (what we do today). For production with <100K docs: pgvector (add vectors to your existing Postgres). For production at scale: Pinecone (managed, fast), Weaviate (open-source, hybrid search), or Chroma (lightweight, Python-first). The interface is always the same: store vectors, query by similarity, get top-K results.
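That shared interface (store vectors, query by similarity, get top-K) is worth seeing concretely. Here is a minimal in-memory sketch; the class and method names are hypothetical, not any particular library's API:

```typescript
type Doc = { id: string; vector: number[]; text: string };
type Match = { id: string; text: string; score: number };

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, ai, i) => s + ai * b[i], 0);
  const mag = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (mag(a) * mag(b));
}

// The three operations every vector store exposes:
// upsert vectors, query by similarity, return top-K.
class InMemoryVectorStore {
  private docs: Doc[] = [];

  upsert(doc: Doc): void {
    this.docs = this.docs.filter((d) => d.id !== doc.id); // replace on same id
    this.docs.push(doc);
  }

  query(vector: number[], topK: number): Match[] {
    return this.docs
      .map((d) => ({ id: d.id, text: d.text, score: cosine(vector, d.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}
```

Swapping in pgvector or Pinecone later mostly means replacing the internals of `query` (a SQL `ORDER BY ... LIMIT k` or an SDK call); the calling code stays the same.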
You have a Postgres database and 50K product descriptions. What's the EASIEST way to add semantic search?
The Full Evolution
Watch one function evolve through every concept you just learned.
Production Gotchas
Embedding costs are cheap (~$0.02 per 1M tokens) but re-embedding your entire dataset on model change is expensive. Always store raw text alongside vectors — you'll need to re-embed when better models drop. Dimensionality matters: text-embedding-3-small (1536d) vs text-embedding-3-large (3072d) — more dimensions = better quality but more storage/compute. Normalize your vectors for cosine similarity.
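"Normalize your vectors" means scaling each one to unit length. Once both sides are unit vectors, cosine similarity reduces to a plain dot product, which is cheaper at query time. A sketch (the specific numbers are illustrative):

```typescript
// Scale a vector to length 1 (a unit vector).
function normalize(v: number[]): number[] {
  const mag = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / mag);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// For unit vectors, dot(a, b) IS the cosine similarity:
const a = normalize([3, 4]); // → [0.6, 0.8]
const b = normalize([4, 3]);
const similarity = dot(a, b); // no magnitude division needed
```

OpenAI's embedding endpoints are documented as returning already-normalized vectors, so this mainly matters when you mix embedding sources or roll your own.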
Code Comparison
Search: SQL LIKE vs Semantic
Traditional text search vs AI-powered semantic search
// Traditional search — matches characters
const results = await db.query(
`SELECT * FROM products
WHERE name ILIKE $1
OR description ILIKE $1
ORDER BY relevance
LIMIT 10`,
[`%${searchQuery}%`]
);
// "comfortable shoes"
// won't match "cozy sneakers" ✗
// won't match "comfy footwear" ✗
// won't match "easy-to-wear kicks" ✗

// Semantic search — matches meaning
import { embed } from "ai";
import { openai } from "@ai-sdk/openai";
// 1. Embed the query (one API call per search)
const { embedding } = await embed({
model: openai.embedding(
"text-embedding-3-small"
),
value: searchQuery,
});
// 2. Find similar vectors (top-K)
const results = await vectorDB.query({
vector: embedding,
topK: 10,
});
// "comfortable shoes" MATCHES:
// "cozy sneakers" ✓ (0.91 similarity)
// "comfy footwear" ✓ (0.89 similarity)
// "easy-to-wear kicks" ✓ (0.84 similarity)KEY DIFFERENCES
- SQL LIKE matches characters — embeddings match meaning
- Embeddings turn text into vectors (arrays of 1536 numbers)
- Similar meanings = close vectors (cosine similarity near 1; unrelated text near 0)
- No need to predict all synonyms — the model handles it
Bridge Map: SQL LIKE / full-text search → Vector similarity search
Hands-On Challenges
Build, experiment, and get AI-powered feedback on your code.
Semantic Product Search Engine
Build and deploy a semantic search engine that lets users search a product catalog by meaning, not keywords. When someone searches 'comfortable shoes', they should find 'cozy sneakers'. This is the same search technology powering modern e-commerce.
Acceptance Criteria
- Index 50+ products with text descriptions and metadata (price, category, etc.)
- Generate embeddings using the OpenAI embedding API (text-embedding-3-small)
- Implement cosine similarity search that returns top-K results ranked by relevance
- Show similarity scores visually (progress bars or percentage badges)
- Filter out low-quality matches below a threshold (e.g., score < 0.5)
- Handle edge cases: empty queries, no results found, API failures
- Deploy to a public URL (Vercel, Netlify, etc.)
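The threshold and edge-case criteria above can live in two small pure functions. The names and the 0.5 cutoff come straight from the acceptance criteria; everything else is a sketch you'd wire into your API route:

```typescript
type Match = { id: string; text: string; score: number };

// Drop weak matches and keep the rest ranked.
// 0.5 is the cutoff from the acceptance criteria; tune it per dataset.
function filterMatches(matches: Match[], threshold = 0.5): Match[] {
  return matches
    .filter((m) => m.score >= threshold)
    .sort((a, b) => b.score - a.score);
}

// Guard the API route before spending an embedding call.
// Returns the cleaned query, or null (caller responds 400).
function validateQuery(query: string | undefined): string | null {
  const trimmed = query?.trim() ?? "";
  return trimmed.length > 0 ? trimmed : null;
}
```

A filtered-but-empty result set is your "no results found" state; API failures from the embedding call still need their own try/catch around the network request.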
Build Roadmap
Create a new Next.js app with TypeScript and Tailwind CSS. Set up your project structure with a products data file and a search API route.
npx create-next-app@latest semantic-search --typescript --tailwind --app
Create a /data/products.ts file for your product catalog
Deploy Tip
Push to GitHub and import into Vercel. Pre-compute your embeddings at build time or on first request, then cache them. Set your OPENAI_API_KEY in Vercel environment variables.
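"Pre-compute on first request, then cache" can be as simple as a module-level promise, so even concurrent requests share a single embedding pass. A sketch; `embedAll` is a hypothetical helper standing in for your batched OpenAI call:

```typescript
type Product = { id: string; description: string };
type Indexed = { id: string; embedding: number[] };

// Module-level cache: computed once per server instance,
// shared by every request (including concurrent ones,
// because they all await the same promise).
let indexPromise: Promise<Indexed[]> | null = null;

function getIndex(
  products: Product[],
  embedAll: (texts: string[]) => Promise<number[][]>
): Promise<Indexed[]> {
  indexPromise ??= embedAll(products.map((p) => p.description)).then(
    (vectors) => products.map((p, i) => ({ id: p.id, embedding: vectors[i] }))
  );
  return indexPromise;
}
```

On serverless platforms the cache lives only as long as the instance does, which is why pre-computing at build time is the sturdier option for a static catalog.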