1. Project Planning
1.1 How to Choose a Capstone Project
Your capstone project is the culmination of 14 weeks of learning. It should demonstrate your ability to design, build, and deploy an AI-powered application. Here is how to choose wisely:
Selection Criteria
- Solves a real problem: Choose something you or others would actually use. Portfolio projects that solve real pain points stand out.
- Demonstrates breadth: Incorporate multiple concepts from the course (RAG, agents, evaluation, deployment).
- Achievable scope: You have 2 weeks. Better to ship a polished MVP than have an incomplete ambitious project.
- Showcases depth: Go beyond "call an API." Show that you understand the engineering behind the solution.
- Has a clear demo: The project should be demo-able in 5 minutes to a technical audience.
1.2 Scope Assessment Framework
Rate your project on these dimensions to ensure it is appropriately scoped:
| Dimension | Small (1-2 days) | Medium (3-5 days) | Large (1-2 weeks) |
|---|---|---|---|
| LLM Integration | Single API call | Chain of calls, tool use | Multi-agent, RAG, eval |
| Data Pipeline | Single file input | Multiple sources, embeddings | ETL, vector DB, caching |
| Frontend | Streamlit only | Polished Streamlit/Gradio | Custom React/Next.js |
| Backend | Script-based | FastAPI with basic routes | Full API, auth, queues |
| Evaluation | Manual testing | Basic metrics | Automated eval pipeline |
| Deployment | Local only | Single platform deploy | CI/CD, monitoring |
Target: Medium to Large scope. Aim for at least "Medium" in every dimension and "Large" in 2-3 dimensions.
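As a quick sanity check, the rubric above can be encoded as a self-assessment helper. This is a rough sketch; the dimension names and thresholds simply mirror the table and target statement:

```python
# Map each dimension rating to a scope score: 1 = Small, 2 = Medium, 3 = Large.
SCOPE = {"Small": 1, "Medium": 2, "Large": 3}

def assess_scope(ratings: dict[str, str]) -> str:
    """Check project ratings against the target: at least Medium everywhere,
    Large in 2-3 dimensions."""
    scores = [SCOPE[r] for r in ratings.values()]
    num_large = sum(1 for s in scores if s == 3)
    if min(scores) < 2:
        return "under-scoped: raise every dimension to at least Medium"
    if num_large > 3:
        return "over-scoped: too many Large dimensions for 2 weeks"
    if 2 <= num_large <= 3:
        return "on target"
    return "solid, but consider going Large in 2-3 dimensions"

print(assess_scope({
    "LLM Integration": "Large", "Data Pipeline": "Medium", "Frontend": "Medium",
    "Backend": "Medium", "Evaluation": "Large", "Deployment": "Medium",
}))  # → on target
```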
1.3 Timeline Planning (2-Week Build)
Recommended Timeline
Week 1: Foundation
==================
Day 1-2: Architecture design + project setup
- Define the problem and user stories
- Draw the architecture diagram
- Set up repo, virtual env, dependencies
- Choose and configure LLM provider(s)
Day 3-4: Core AI pipeline
- Implement the main AI/LLM functionality
- Build data ingestion (if RAG)
- Set up vector store (if needed)
- Get the core "happy path" working end-to-end
Day 5: API layer
- Build FastAPI endpoints
- Request/response models
- Error handling
- Basic tests
Week 2: Polish and Ship
=======================
Day 6-7: Frontend and UX
- Build the user interface
- Connect to API
- Handle loading states, errors
- Make it look professional
Day 8-9: Evaluation and hardening
- Set up evaluation pipeline
- Test edge cases
- Add caching, rate limiting
- Fix bugs, improve prompts
Day 10: Deployment and documentation
- Dockerize the application
- Deploy to cloud
- Write README
- Record demo video
& User Stories"] --> Arch["Design
Architecture"] Arch --> Setup["Project Setup
& Dependencies"] Setup --> Core["Build Core
AI Pipeline"] Core --> API["Build API
Layer"] API --> UI["Frontend
& UX"] UI --> Eval["Evaluation
& Hardening"] Eval --> Deploy["Deploy
& Document"] Deploy -->|"Iterate"| Eval style Define fill:#4CAF50,stroke:#333,color:#fff style Arch fill:#66BB6A,stroke:#333,color:#fff style Core fill:#2196F3,stroke:#333,color:#fff style API fill:#42A5F5,stroke:#333,color:#fff style UI fill:#FF9800,stroke:#333,color:#fff style Eval fill:#EF5350,stroke:#333,color:#fff style Deploy fill:#9C27B0,stroke:#333,color:#fff
1.4 Technology Stack Selection Guide
Decision Matrix
"I want to build quickly"
-> Streamlit + OpenAI API + Chroma
"I want production-grade"
-> FastAPI + LangGraph + Qdrant + Next.js
"I want to use open-source models"
-> FastAPI + Ollama/vLLM + pgvector + Gradio
"I want multi-agent"
-> LangGraph or CrewAI + FastAPI
"I want to process documents"
-> LangChain doc loaders + Qdrant + OpenAI
2. Project Ideas (Detailed)
Below are 10 detailed project ideas, each with architecture, tech stack, and learning outcomes. Choose one or combine elements from multiple ideas.
Project 1: AI-Powered Document Q&A System
Difficulty: Medium-Hard
Description
Build a production-grade RAG system that lets users upload documents (PDFs, Word, web pages), ask questions, and get accurate answers with source citations. Includes evaluation pipeline and admin dashboard.
Architecture
User uploads documents
|
v
+------------------+ +------------------+
| Document | --> | Chunking & |
| Ingestion | | Embedding |
| (PDF, DOCX, URL) | | Pipeline |
+------------------+ +------------------+
|
v
+------------------+
| Vector Store |
| (Qdrant) |
+------------------+
^
| retrieve
User asks question |
| |
v |
+------------------+ +------------------+
| Query Pipeline | -> | Reranker |
| (embedding + | | (cross-encoder) |
| search) | +------------------+
+------------------+ |
v
+------------------+
| LLM Generation |
| (with citations) |
+------------------+
|
v
Answer + Sources
Tech Stack
- Backend: FastAPI, Python 3.11+
- LLM: OpenAI GPT-4o / Anthropic Claude
- Embeddings: OpenAI text-embedding-3-small or sentence-transformers
- Vector Store: Qdrant (or Chroma for simplicity)
- Reranker: Cohere Rerank or cross-encoder
- Document Processing: PyMuPDF, python-docx, BeautifulSoup
- Frontend: Streamlit or Gradio
- Evaluation: RAGAS, custom metrics
Key Components to Build
- Document ingestion pipeline (multiple formats)
- Smart chunking (semantic, by section headers)
- Hybrid search (vector + keyword)
- Reranking for better precision
- Answer generation with source citations
- Conversation memory (multi-turn)
- Evaluation pipeline (faithfulness, relevancy, answer correctness)
- Admin panel: upload docs, view analytics, manage collections
What You Will Learn
RAG engineering, chunking strategies, embedding models, vector search, reranking, prompt engineering for grounded generation, evaluation with RAGAS, production API design.
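One of the key components above, hybrid search, is typically implemented by running vector and keyword retrieval separately and fusing the two ranked lists. A minimal reciprocal-rank-fusion sketch (the document IDs are illustrative):

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists into one.

    Each document scores sum(1 / (k + rank)) across the lists it appears in;
    k=60 is the constant commonly used for RRF.
    """
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # from embedding search
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # from BM25 / keyword search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) float to the top, which is why fusion tends to beat either retriever alone.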
Project 2: Intelligent Customer Support Bot
Difficulty: Hard
Description
Build a multi-agent customer support system that can answer product questions (RAG), perform actions (tool calling), escalate to humans, and learn from feedback.
Architecture
Customer Message
|
v
+------------------+
| Router Agent | Classifies intent: FAQ, action, complaint, escalate
+------------------+
|
+-- FAQ --> RAG Agent (knowledge base search)
|
+-- Action --> Tool Agent (check order, update account, etc.)
|
+-- Complaint --> Empathy Agent (acknowledge, offer resolution)
|
+-- Complex --> Escalation (notify human agent)
|
v
+------------------+
| Response | Synthesize final response, maintain tone
| Synthesizer |
+------------------+
|
v
Customer Response + Internal Logging
Tech Stack
- Agent Framework: LangGraph (for stateful multi-agent orchestration)
- LLM: GPT-4o-mini (fast, cheap) + GPT-4o (complex cases)
- RAG: Qdrant + OpenAI embeddings
- Tools: Mock order system, CRM API
- Backend: FastAPI with WebSocket support
- Frontend: Streamlit chat or custom React chat widget
- Logging: LangSmith or custom logging
What You Will Learn
Multi-agent orchestration, tool calling, conversation management, escalation patterns, RAG for customer support, feedback loops, production agent deployment.
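The router pattern in the architecture above reduces to classifying the message and dispatching to a handler. A sketch with a stubbed keyword classifier (a real router would be an LLM call returning a structured intent label):

```python
def classify_intent(message: str) -> str:
    """Stub classifier. In practice this is an LLM call with structured output."""
    lowered = message.lower()
    if any(w in lowered for w in ("order", "refund", "cancel")):
        return "action"
    if any(w in lowered for w in ("angry", "terrible", "complaint")):
        return "complaint"
    return "faq"

HANDLERS = {
    "faq": lambda m: f"[RAG agent] searching knowledge base for: {m}",
    "action": lambda m: f"[Tool agent] executing request: {m}",
    "complaint": lambda m: f"[Empathy agent] acknowledging: {m}",
}

def route(message: str) -> str:
    intent = classify_intent(message)
    # Unknown intents escalate to a human, mirroring the diagram above.
    handler = HANDLERS.get(intent, lambda m: f"[Escalation] human needed for: {m}")
    return handler(message)

print(route("Where is my order #123?"))  # → [Tool agent] executing request: ...
```

In LangGraph, the same dispatch becomes a conditional edge from the router node to the specialist nodes.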
Project 3: Code Review Assistant
Difficulty: Medium
Description
An AI assistant that analyzes code diffs/pull requests, identifies potential bugs, suggests improvements, checks for security issues, and provides educational explanations.
Architecture
GitHub PR / Code Diff
|
v
+------------------+
| Code Parser | Parse diff, identify changed files and context
+------------------+
|
v
+-------------------------------------------+
| Parallel Analysis Agents |
| |
| [Bug Detector] [Security] [Style] [Perf] |
+-------------------------------------------+
|
v
+------------------+
| Review | Compile findings, prioritize, format
| Synthesizer |
+------------------+
|
v
Formatted Code Review (comments on specific lines)
Tech Stack
- LLM: Claude (excellent at code) or GPT-4o
- Code parsing: tree-sitter, unidiff
- GitHub integration: PyGithub or GitHub API
- Backend: FastAPI
- Frontend: GitHub App or Streamlit
What You Will Learn
Code analysis with LLMs, structured output for code review, GitHub API integration, parallel LLM calls, prompt engineering for technical tasks.
Project 4: AI Content Pipeline
Difficulty: Medium-Hard
Description
A multi-agent content creation pipeline: Research a topic, create an outline, write the content, edit for quality, generate images, and prepare for publishing. Each step uses a specialized agent.
Architecture
Topic Input: "Write a blog post about quantum computing for beginners"
|
v
[Research Agent] -> Search web, gather sources, extract key points
|
v
[Outline Agent] -> Create structured outline from research
|
v
[Writer Agent] -> Write full content following outline
|
v
[Editor Agent] -> Check grammar, flow, accuracy, suggest edits
|
v
[Image Agent] -> Generate illustrations with Stable Diffusion
|
v
[Publisher Agent] -> Format as HTML/Markdown, prepare for CMS
|
v
Final Content Package (text + images + metadata)
Tech Stack
- Agent Framework: LangGraph or CrewAI
- LLMs: GPT-4o (writing), Claude (editing)
- Web Search: Tavily API or SerpAPI
- Image Generation: DALL-E 3 or Stable Diffusion API
- Frontend: Streamlit with progress tracking
What You Will Learn
Multi-agent workflows, sequential pipeline orchestration, web search integration, content quality evaluation, image generation APIs, human-in-the-loop editing.
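At its simplest, the sequential hand-off above is function composition over a shared state dict, where each stage reads the previous stage's output and adds its own. A sketch with stub stages (each would be an LLM or API call in practice):

```python
def research(topic: str) -> dict:
    # Stub: a real agent would call a web-search API and extract key points.
    return {"topic": topic, "sources": ["source_1", "source_2"]}

def outline(state: dict) -> dict:
    state["outline"] = [f"Intro to {state['topic']}", "Key concepts", "Conclusion"]
    return state

def write(state: dict) -> dict:
    state["draft"] = " ".join(f"[section: {h}]" for h in state["outline"])
    return state

def run_pipeline(topic: str, stages) -> dict:
    state = stages[0](topic)
    for stage in stages[1:]:
        state = stage(state)   # each agent reads and extends the shared state
    return state

result = run_pipeline("quantum computing", [research, outline, write])
print(result["draft"])
```

Frameworks like LangGraph formalize exactly this: nodes are the stage functions and the state dict is the graph state, with the added benefits of retries, branching, and checkpointing.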
Project 5: Multimodal Search Engine
Difficulty: Hard
Description
A search engine that supports text, image, and video search using CLIP embeddings. Users can search by typing text, uploading images, or combining both. Supports cross-modal retrieval (find images matching text, find text matching images).
Tech Stack
- Embeddings: CLIP (OpenAI clip-vit-base-patch32 or SigLIP)
- Vector Store: Qdrant (supports multiple vector fields)
- Backend: FastAPI
- Video Processing: OpenCV, frame extraction
- Frontend: Streamlit or Next.js
What You Will Learn
Multimodal embeddings, CLIP architecture, cross-modal retrieval, vector database optimization, building search UIs, video processing.
Project 6: AI Tutor
Difficulty: Medium-Hard
Description
A personalized AI tutoring system that adapts to the student's level, tracks knowledge gaps, generates practice problems, explains concepts with analogies, and provides Socratic-style guidance rather than direct answers.
Tech Stack
- LLM: GPT-4o with structured prompts for pedagogy
- Knowledge Tracking: Simple skill graph in PostgreSQL
- RAG: Subject-specific knowledge base
- Frontend: Streamlit with interactive elements
What You Will Learn
Prompt engineering for educational contexts, knowledge graph construction, adaptive systems, long conversation management, pedagogical AI design.
Project 7: Automated Data Analysis Agent
Difficulty: Medium
Description
Upload a CSV/Excel file and get automated analysis: statistical summaries, visualizations, correlations, anomalies, and a narrative report. The agent writes and executes Python code to analyze the data.
Tech Stack
- LLM: GPT-4o (code generation + analysis)
- Code Execution: Sandboxed Python (E2B or Docker)
- Visualization: matplotlib, seaborn (generated by agent)
- Frontend: Streamlit with file upload and chart display
What You Will Learn
Code generation with LLMs, sandboxed execution, data analysis automation, chart generation, report writing, tool calling for data tasks.
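The core mechanic, executing model-generated code in isolation, can be sketched with a subprocess and a timeout. Note this is NOT a real security boundary; untrusted code needs E2B or a locked-down Docker container, as the tech stack above suggests:

```python
import subprocess
import sys

def run_generated_code(code: str, timeout_s: int = 10) -> dict:
    """Execute a generated Python snippet in a separate interpreter process.

    A separate process gives crash isolation and a hard timeout, but no
    real sandboxing -- use a proper sandbox for untrusted code.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return {"stdout": result.stdout, "stderr": result.stderr,
                "ok": result.returncode == 0}
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": "timed out", "ok": False}

snippet = "import statistics; print(statistics.mean([1, 2, 3, 4]))"
print(run_generated_code(snippet))
```

The agent loop then becomes: generate code, run it, feed `stdout`/`stderr` back to the LLM, and retry on failure.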
Project 8: AI Meeting Assistant
Difficulty: Medium
Description
Processes meeting recordings or transcripts to produce structured summaries, action items, decisions, and follow-up reminders. Can answer questions about past meetings.
Tech Stack
- Transcription: Whisper (local) or AssemblyAI
- LLM: GPT-4o-mini (summaries) + GPT-4o (complex analysis)
- RAG: Store past meeting data for Q&A
- Frontend: Streamlit
What You Will Learn
Audio processing, speech-to-text, long document summarization, structured extraction, RAG over temporal data, calendar/task integration.
Project 9: Legal Document Analyzer
Difficulty: Hard
Description
Analyze legal contracts and documents: extract key clauses, highlight risks, compare document versions, and answer questions about legal terms. Includes citation to specific sections.
Tech Stack
- Document Processing: PyMuPDF, unstructured.io
- LLM: Claude (strong at long document analysis)
- RAG: Section-aware chunking + Qdrant
- Frontend: Streamlit with PDF viewer
What You Will Learn
Legal document processing, section-aware parsing, comparative analysis, risk assessment prompting, long-context strategies, citation generation.
Project 10: AI-Powered Monitoring Dashboard
Difficulty: Hard
Description
Build a monitoring and observability platform for LLM applications. Track latency, cost, token usage, error rates, prompt/response quality, and detect anomalies.
Tech Stack
- Data Collection: OpenTelemetry, custom middleware
- Storage: PostgreSQL + TimescaleDB
- LLM Evaluation: Automated quality scoring
- Alerting: Custom rules + anomaly detection
- Frontend: Streamlit dashboards or Grafana
What You Will Learn
LLM observability, production monitoring, cost tracking, quality metrics, anomaly detection, dashboard design, operational AI engineering.
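At its core, cost tracking is token counts multiplied by per-model prices. A minimal sketch (the prices below are illustrative placeholders, not current rates; always check your provider's pricing page):

```python
# Illustrative per-1M-token prices in USD -- placeholders, not real rates.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one LLM call from its token usage."""
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# Aggregate across logged calls -- the basis for per-user / per-day dashboards.
calls = [("gpt-4o-mini", 1200, 300), ("gpt-4o", 800, 500)]
total = sum(estimate_cost(m, pt, ct) for m, pt, ct in calls)
print(f"total: ${total:.6f}")
```

In the real dashboard, the token counts come from the `usage` field each provider returns with its responses, stored per request by the collection middleware.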
3. Architecture Patterns for AI Projects
Standard AI application architecture:

User
  |
  v
Frontend (Streamlit / React)
  |
  v
API Server (FastAPI)
  +-- LLM Service (OpenAI / Anthropic)
  +-- RAG Pipeline --> Vector Store (Qdrant / Chroma)
  |               \-> Embedding Service
  +-- Cache Layer (Redis)
  +-- Database (PostgreSQL)
  +-- Evaluation Pipeline

Build-evaluate-ship loop:

Build --> Test & Evaluate --> Review Results
  +-- Pass --> Ship to Production --> Monitor & Collect Feedback
  |                                    +-- New issues --> back to Build
  +-- Fail --> Fix Issues & Improve Prompts --> back to Build
3.1 Monolithic vs Microservices for AI
For Your Capstone: Start Monolithic
For a 2-week project, a well-structured monolith is the right choice. Here is why and how:
Recommended Structure (Monolithic):
===================================
project/
+-- app/
| +-- __init__.py
| +-- main.py # FastAPI app, routes
| +-- config.py # Settings and environment variables
| +-- models/
| | +-- schemas.py # Pydantic models for API
| | +-- database.py # DB models (if needed)
| +-- services/
| | +-- llm.py # LLM client wrapper
| | +-- rag.py # RAG pipeline
| | +-- embeddings.py # Embedding service
| | +-- agents.py # Agent logic
| +-- utils/
| +-- prompts.py # Prompt templates
| +-- chunking.py # Text chunking utilities
+-- tests/
| +-- test_rag.py
| +-- test_api.py
+-- evaluation/
| +-- eval_pipeline.py # Evaluation scripts
| +-- test_cases.json # Test Q&A pairs
+-- frontend/
| +-- app.py # Streamlit app
+-- scripts/
| +-- ingest_documents.py # Data ingestion scripts
+-- Dockerfile
+-- docker-compose.yml
+-- requirements.txt
+-- .env.example
+-- README.md
3.2 API Design for AI Services
from fastapi import FastAPI, UploadFile, File, HTTPException, BackgroundTasks
from pydantic import BaseModel, Field
from typing import Optional
import uuid
from datetime import datetime
app = FastAPI(title="AI Document Q&A", version="1.0.0")
# ====================
# Request/Response Models
# ====================
class QuestionRequest(BaseModel):
"""Request model for asking a question."""
question: str = Field(..., min_length=1, max_length=2000,
description="The question to ask about the documents")
collection_id: str = Field(default="default",
description="Which document collection to search")
max_sources: int = Field(default=5, ge=1, le=20,
description="Maximum number of source passages to retrieve")
model: str = Field(default="gpt-4o-mini",
description="LLM model to use for answer generation")
class Source(BaseModel):
"""A source passage used to generate the answer."""
document_name: str
page_number: Optional[int] = None
chunk_text: str
relevance_score: float
class AnswerResponse(BaseModel):
"""Response model for a question answer."""
answer: str
sources: list[Source]
model_used: str
latency_ms: float
token_usage: dict
class DocumentUploadResponse(BaseModel):
"""Response after uploading a document."""
document_id: str
filename: str
num_chunks: int
status: str
message: str
class HealthResponse(BaseModel):
"""Health check response."""
status: str
version: str
models_available: list[str]
vector_store_status: str
# ====================
# API Endpoints
# ====================
@app.get("/health", response_model=HealthResponse)
async def health_check():
"""Check the health of all services."""
return HealthResponse(
status="healthy",
version="1.0.0",
models_available=["gpt-4o-mini", "gpt-4o", "claude-sonnet"],
vector_store_status="connected",
)
@app.post("/documents/upload", response_model=DocumentUploadResponse)
async def upload_document(
file: UploadFile = File(...),
collection_id: str = "default",
background_tasks: BackgroundTasks = None,
):
"""
Upload a document for indexing.
Supports PDF, DOCX, TXT, and Markdown files.
Processing happens in the background.
"""
allowed_extensions = {".pdf", ".docx", ".txt", ".md"}
ext = "." + file.filename.split(".")[-1].lower()
if ext not in allowed_extensions:
raise HTTPException(
status_code=400,
detail=f"Unsupported file type: {ext}. Supported: {allowed_extensions}"
)
doc_id = str(uuid.uuid4())
# Save file and process in background
content = await file.read()
# In a real implementation:
# background_tasks.add_task(process_document, doc_id, content, ext, collection_id)
return DocumentUploadResponse(
document_id=doc_id,
filename=file.filename,
num_chunks=0, # Updated after processing
status="processing",
message="Document is being processed. Use the status endpoint to check progress.",
)
@app.post("/ask", response_model=AnswerResponse)
async def ask_question(request: QuestionRequest):
"""
Ask a question about the uploaded documents.
Uses RAG to retrieve relevant passages and generate an answer.
"""
import time
start_time = time.time()
# In a real implementation:
# 1. Embed the question
# 2. Search the vector store
# 3. Rerank results
# 4. Generate answer with LLM
# 5. Extract citations
# Placeholder response
latency = (time.time() - start_time) * 1000
return AnswerResponse(
answer="This is a placeholder answer. Implement the RAG pipeline.",
sources=[],
model_used=request.model,
latency_ms=latency,
token_usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
)
@app.get("/documents/{collection_id}")
async def list_documents(collection_id: str = "default"):
"""List all documents in a collection."""
# Return list of documents with metadata
return {"collection_id": collection_id, "documents": []}
3.3 Caching Strategies for LLM Responses
import hashlib
import json
from functools import wraps
from typing import Optional
# ====================
# Simple In-Memory Cache
# ====================
class LLMCache:
"""
Cache for LLM responses to avoid redundant API calls.
Strategies:
1. Exact match: Cache based on exact prompt hash
2. Semantic cache: Cache based on embedding similarity (more advanced)
"""
def __init__(self, max_size: int = 1000):
self.cache: dict[str, dict] = {}
self.max_size = max_size
self.hits = 0
self.misses = 0
def _make_key(self, prompt: str, model: str, **kwargs) -> str:
"""Create a cache key from prompt and parameters."""
key_data = {
"prompt": prompt,
"model": model,
"temperature": kwargs.get("temperature", 0),
"max_tokens": kwargs.get("max_tokens"),
}
key_string = json.dumps(key_data, sort_keys=True)
return hashlib.sha256(key_string.encode()).hexdigest()
def get(self, prompt: str, model: str, **kwargs) -> Optional[str]:
"""Try to get a cached response."""
key = self._make_key(prompt, model, **kwargs)
if key in self.cache:
self.hits += 1
return self.cache[key]["response"]
self.misses += 1
return None
def set(self, prompt: str, model: str, response: str, **kwargs):
"""Cache a response."""
if len(self.cache) >= self.max_size:
# Evict oldest entry
oldest_key = next(iter(self.cache))
del self.cache[oldest_key]
key = self._make_key(prompt, model, **kwargs)
self.cache[key] = {
"response": response,
"model": model,
}
@property
def hit_rate(self) -> float:
total = self.hits + self.misses
return self.hits / total if total > 0 else 0
# Usage with a decorator
llm_cache = LLMCache()
def cached_llm_call(func):
"""Decorator to cache LLM calls."""
@wraps(func)
def wrapper(prompt: str, model: str = "gpt-4o-mini", **kwargs):
# Only cache deterministic calls (temperature=0)
if kwargs.get("temperature", 0) == 0:
cached = llm_cache.get(prompt, model, **kwargs)
if cached is not None:
return cached
response = func(prompt, model, **kwargs)
if kwargs.get("temperature", 0) == 0:
llm_cache.set(prompt, model, response, **kwargs)
return response
return wrapper
@cached_llm_call
def call_llm(prompt: str, model: str = "gpt-4o-mini", **kwargs) -> str:
"""Call the LLM (with caching)."""
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
**kwargs,
)
return response.choices[0].message.content
3.4 Queue-Based Processing for Async AI Tasks
import asyncio
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Any
import uuid
import time
class TaskStatus(str, Enum):
PENDING = "pending"
PROCESSING = "processing"
COMPLETED = "completed"
FAILED = "failed"
@dataclass
class AITask:
id: str = field(default_factory=lambda: str(uuid.uuid4()))
task_type: str = ""
input_data: dict = field(default_factory=dict)
status: TaskStatus = TaskStatus.PENDING
result: Any = None
    error: str | None = None
created_at: float = field(default_factory=time.time)
    completed_at: float | None = None
class AITaskQueue:
"""
Simple async task queue for AI processing.
For production, use Celery + Redis or similar.
This demonstrates the pattern.
"""
def __init__(self, max_concurrent: int = 5):
self.queue: deque[AITask] = deque()
self.tasks: dict[str, AITask] = {}
self.max_concurrent = max_concurrent
self.semaphore = asyncio.Semaphore(max_concurrent)
def submit(self, task_type: str, input_data: dict) -> str:
"""Submit a task and return its ID."""
task = AITask(task_type=task_type, input_data=input_data)
self.queue.append(task)
self.tasks[task.id] = task
return task.id
def get_status(self, task_id: str) -> dict:
"""Get the status of a task."""
task = self.tasks.get(task_id)
if not task:
return {"error": "Task not found"}
return {
"id": task.id,
"status": task.status,
"result": task.result,
"error": task.error,
}
async def process_task(self, task: AITask):
"""Process a single task (override for your specific logic)."""
async with self.semaphore:
task.status = TaskStatus.PROCESSING
try:
# Route to appropriate handler
if task.task_type == "document_ingestion":
result = await self._ingest_document(task.input_data)
elif task.task_type == "question_answer":
result = await self._answer_question(task.input_data)
else:
raise ValueError(f"Unknown task type: {task.task_type}")
task.result = result
task.status = TaskStatus.COMPLETED
except Exception as e:
task.error = str(e)
task.status = TaskStatus.FAILED
finally:
task.completed_at = time.time()
async def _ingest_document(self, data: dict) -> dict:
"""Process document ingestion."""
# Simulate processing
await asyncio.sleep(2)
return {"num_chunks": 42, "status": "indexed"}
async def _answer_question(self, data: dict) -> dict:
"""Process a question."""
await asyncio.sleep(1)
return {"answer": "Processed answer", "sources": []}
async def run_worker(self):
"""Background worker that processes tasks from the queue."""
while True:
if self.queue:
task = self.queue.popleft()
asyncio.create_task(self.process_task(task))
await asyncio.sleep(0.1)
# FastAPI integration
# task_queue = AITaskQueue(max_concurrent=5)
# @app.on_event("startup")
# async def startup():
# asyncio.create_task(task_queue.run_worker())
# @app.post("/tasks/submit")
# async def submit_task(task_type: str, input_data: dict):
# task_id = task_queue.submit(task_type, input_data)
# return {"task_id": task_id, "status": "pending"}
# @app.get("/tasks/{task_id}")
# async def get_task_status(task_id: str):
# return task_queue.get_status(task_id)
4. Tech Stack Recommendations (2026)
The AI Engineering Stack (March 2026)
| Layer | Recommended | Alternatives |
|---|---|---|
| Language | Python 3.12+ | TypeScript (for full-stack) |
| Backend Framework | FastAPI | Flask, Django, Hono (TS) |
| LLM APIs | OpenAI, Anthropic | Google Gemini, Mistral, Groq |
| Open-Source LLMs | Ollama (local), vLLM (serving) | llama.cpp, TGI |
| Embeddings | OpenAI text-embedding-3-small | sentence-transformers, Cohere |
| Vector Store | Qdrant | Chroma, pgvector, Pinecone, Weaviate |
| Agent Framework | LangGraph | CrewAI, Autogen, custom |
| Evaluation | promptfoo, RAGAS | DeepEval, custom |
| Observability | LangSmith, Langfuse | Helicone, custom logging |
| Frontend | Streamlit (quick), Next.js (production) | Gradio, Chainlit |
| Deployment | Docker + Railway/Render | Modal, AWS/GCP, Fly.io |
| Database | PostgreSQL | SQLite (simple), Supabase |
5. Sample Project Walkthrough: AI Document Q&A System
Let us build Project #1 (AI Document Q&A) step by step, with complete working code. This serves as a template you can adapt for your own capstone.
5.1 Project Setup
# requirements.txt
"""
fastapi==0.115.0
uvicorn==0.30.0
python-multipart==0.0.9
openai==1.50.0
anthropic==0.35.0
qdrant-client==1.11.0
PyMuPDF==1.24.0
python-docx==1.1.0
sentence-transformers==3.0.0
pydantic==2.9.0
pydantic-settings==2.5.0
streamlit==1.38.0
python-dotenv==1.0.1
"""
# .env.example
"""
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=
COLLECTION_NAME=documents
EMBEDDING_MODEL=text-embedding-3-small
LLM_MODEL=gpt-4o-mini
"""
5.2 Configuration
# app/config.py
from pydantic_settings import BaseSettings
from functools import lru_cache
class Settings(BaseSettings):
"""Application settings loaded from environment variables."""
# API Keys
openai_api_key: str = ""
anthropic_api_key: str = ""
# Vector Store
qdrant_url: str = "http://localhost:6333"
qdrant_api_key: str = ""
collection_name: str = "documents"
# Models
embedding_model: str = "text-embedding-3-small"
embedding_dimension: int = 1536
llm_model: str = "gpt-4o-mini"
# Chunking
chunk_size: int = 500
chunk_overlap: int = 50
# Retrieval
top_k: int = 10
rerank_top_k: int = 5
class Config:
env_file = ".env"
@lru_cache
def get_settings() -> Settings:
return Settings()
5.3 Document Ingestion Pipeline
# app/services/ingestion.py
import fitz # PyMuPDF
import docx
from pathlib import Path
from dataclasses import dataclass
@dataclass
class DocumentChunk:
"""A chunk of text from a document."""
text: str
metadata: dict # source, page_number, chunk_index, etc.
class DocumentProcessor:
"""Process documents into text chunks for embedding."""
def __init__(self, chunk_size: int = 500, chunk_overlap: int = 50):
self.chunk_size = chunk_size
self.chunk_overlap = chunk_overlap
def process_file(self, file_path: str, file_bytes: bytes = None) -> list[DocumentChunk]:
"""Process a file and return chunks."""
ext = Path(file_path).suffix.lower()
if ext == ".pdf":
text_by_page = self._extract_pdf(file_path, file_bytes)
elif ext == ".docx":
text_by_page = self._extract_docx(file_path, file_bytes)
elif ext in (".txt", ".md"):
text_by_page = self._extract_text(file_path, file_bytes)
else:
raise ValueError(f"Unsupported file type: {ext}")
# Chunk each page
chunks = []
for page_num, page_text in enumerate(text_by_page):
page_chunks = self._chunk_text(page_text)
for chunk_idx, chunk_text in enumerate(page_chunks):
chunks.append(DocumentChunk(
text=chunk_text,
metadata={
"source": Path(file_path).name,
"page_number": page_num + 1,
"chunk_index": chunk_idx,
"total_chunks": len(page_chunks),
}
))
return chunks
def _extract_pdf(self, file_path: str, file_bytes: bytes = None) -> list[str]:
"""Extract text from each page of a PDF."""
if file_bytes:
doc = fitz.open(stream=file_bytes, filetype="pdf")
else:
doc = fitz.open(file_path)
pages = []
for page in doc:
text = page.get_text()
if text.strip():
pages.append(text)
doc.close()
return pages
def _extract_docx(self, file_path: str, file_bytes: bytes = None) -> list[str]:
"""Extract text from a DOCX file."""
import io
if file_bytes:
doc = docx.Document(io.BytesIO(file_bytes))
else:
doc = docx.Document(file_path)
full_text = "\n".join([para.text for para in doc.paragraphs if para.text.strip()])
return [full_text] # DOCX does not have pages per se
def _extract_text(self, file_path: str, file_bytes: bytes = None) -> list[str]:
"""Extract text from a plain text file."""
if file_bytes:
text = file_bytes.decode("utf-8")
else:
with open(file_path, "r", encoding="utf-8") as f:
text = f.read()
return [text]
def _chunk_text(self, text: str) -> list[str]:
"""
Split text into overlapping chunks.
Uses sentence-aware splitting to avoid cutting mid-sentence.
"""
if len(text) <= self.chunk_size:
return [text.strip()] if text.strip() else []
# Simple sentence-aware chunking
sentences = text.replace("\n", " ").split(". ")
chunks = []
current_chunk = ""
for sentence in sentences:
sentence = sentence.strip()
if not sentence:
continue
# Add period back if it was removed by split
if not sentence.endswith("."):
sentence += "."
if len(current_chunk) + len(sentence) + 1 <= self.chunk_size:
current_chunk += (" " + sentence) if current_chunk else sentence
else:
if current_chunk:
chunks.append(current_chunk.strip())
current_chunk = sentence
if current_chunk.strip():
chunks.append(current_chunk.strip())
return chunks
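The splitter above does not apply the `chunk_overlap` setting from the config. An overlap-aware variant repeats the tail of each chunk at the start of the next, so facts straddling a boundary remain retrievable. A standalone character-level sketch (production chunkers overlap on sentences or tokens rather than raw characters):

```python
def chunk_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks where each chunk
    repeats the last `overlap` characters of the previous one."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than a full chunk so the windows overlap.
        start += chunk_size - overlap
    return chunks

parts = chunk_with_overlap("abcdefghij" * 20, chunk_size=80, overlap=20)
print(len(parts), parts[0][-20:] == parts[1][:20])  # → 4 True
```

Overlap costs extra embedding tokens and index size, which is why it is a tunable setting rather than a fixed default.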
5.4 Embedding and Vector Store
# app/services/embeddings.py
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import (
Distance, VectorParams, PointStruct, Filter,
FieldCondition, MatchValue,
)
from app.config import get_settings
import uuid
class EmbeddingService:
"""Manage embeddings and vector store."""
def __init__(self):
self.settings = get_settings()
self.openai_client = OpenAI(api_key=self.settings.openai_api_key)
self.qdrant = QdrantClient(
url=self.settings.qdrant_url,
api_key=self.settings.qdrant_api_key or None,
)
self._ensure_collection()
def _ensure_collection(self):
"""Create the vector collection if it doesn't exist."""
collections = [c.name for c in self.qdrant.get_collections().collections]
if self.settings.collection_name not in collections:
self.qdrant.create_collection(
collection_name=self.settings.collection_name,
vectors_config=VectorParams(
size=self.settings.embedding_dimension,
distance=Distance.COSINE,
),
)
print(f"Created collection: {self.settings.collection_name}")
def embed_texts(self, texts: list[str]) -> list[list[float]]:
"""Generate embeddings for a list of texts."""
response = self.openai_client.embeddings.create(
model=self.settings.embedding_model,
input=texts,
)
return [item.embedding for item in response.data]
def index_chunks(self, chunks: list, collection_id: str = "default"):
"""Index document chunks into the vector store."""
if not chunks:
return 0
texts = [chunk.text for chunk in chunks]
embeddings = self.embed_texts(texts)
points = []
for chunk, embedding in zip(chunks, embeddings):
point = PointStruct(
id=str(uuid.uuid4()),
vector=embedding,
payload={
"text": chunk.text,
"collection_id": collection_id,
**chunk.metadata,
},
)
points.append(point)
self.qdrant.upsert(
collection_name=self.settings.collection_name,
points=points,
)
return len(points)
def search(
self,
query: str,
collection_id: str = "default",
top_k: int = 10,
) -> list[dict]:
"""Search for relevant chunks given a query."""
query_embedding = self.embed_texts([query])[0]
results = self.qdrant.search(
collection_name=self.settings.collection_name,
query_vector=query_embedding,
limit=top_k,
query_filter=Filter(
must=[
FieldCondition(
key="collection_id",
match=MatchValue(value=collection_id),
)
]
),
)
return [
{
"text": hit.payload["text"],
"score": hit.score,
"source": hit.payload.get("source", "unknown"),
"page_number": hit.payload.get("page_number"),
}
for hit in results
]
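The `Distance.COSINE` setting above means Qdrant ranks hits by cosine similarity between the query embedding and each stored chunk embedding. As a quick intuition check, here is the formula as a standalone function; this is a sketch of the math, not Qdrant's actual implementation:

```python
# Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
# Vectors pointing the same direction score 1.0 regardless of magnitude.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same direction -> 1.0; orthogonal -> 0.0
print(round(cosine_similarity([1.0, 0.0], [2.0, 0.0]), 3))  # 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 5.0]), 3))  # 0.0
```

This is also why the `score` returned by `search` is directly usable as a relevance signal: higher means closer in embedding space.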
5.5 RAG Pipeline
# app/services/rag.py
from openai import OpenAI
from app.config import get_settings
from app.services.embeddings import EmbeddingService
class RAGPipeline:
"""
Full RAG pipeline: retrieve, rerank, generate with citations.
"""
def __init__(self):
self.settings = get_settings()
self.embedding_service = EmbeddingService()
self.llm_client = OpenAI(api_key=self.settings.openai_api_key)
def answer_question(
self,
question: str,
collection_id: str = "default",
        model: str | None = None,
) -> dict:
"""
Full RAG pipeline to answer a question.
Steps:
1. Retrieve relevant chunks
2. Build context from chunks
3. Generate answer with LLM
4. Extract source citations
"""
import time
start_time = time.time()
model = model or self.settings.llm_model
# Step 1: Retrieve relevant chunks
retrieved = self.embedding_service.search(
query=question,
collection_id=collection_id,
top_k=self.settings.top_k,
)
if not retrieved:
return {
"answer": "I could not find any relevant information in the uploaded documents to answer this question.",
"sources": [],
"model_used": model,
"latency_ms": (time.time() - start_time) * 1000,
"token_usage": {},
}
# Step 2: Build context
context = self._build_context(retrieved)
# Step 3: Generate answer
system_prompt = """You are a helpful assistant that answers questions based on the provided context.
Rules:
1. Only answer based on the provided context. Do not use external knowledge.
2. If the context doesn't contain enough information, say so clearly.
3. Cite your sources using [Source: filename, Page X] format.
4. Be concise but thorough.
5. If multiple sources support your answer, cite all of them."""
user_prompt = f"""Context:
{context}
Question: {question}
Answer the question based only on the context above. Cite your sources."""
response = self.llm_client.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
temperature=0,
max_tokens=1024,
)
answer = response.choices[0].message.content
latency_ms = (time.time() - start_time) * 1000
# Step 4: Format sources
sources = [
{
"document_name": r["source"],
"page_number": r["page_number"],
"chunk_text": r["text"][:200] + "...",
"relevance_score": r["score"],
}
for r in retrieved[:5] # Top 5 sources
]
return {
"answer": answer,
"sources": sources,
"model_used": model,
"latency_ms": latency_ms,
"token_usage": {
"prompt_tokens": response.usage.prompt_tokens,
"completion_tokens": response.usage.completion_tokens,
"total_tokens": response.usage.total_tokens,
},
}
def _build_context(self, chunks: list[dict], max_context_length: int = 4000) -> str:
"""Build a context string from retrieved chunks."""
context_parts = []
total_length = 0
for i, chunk in enumerate(chunks):
source_info = f"[Source {i+1}: {chunk['source']}"
if chunk.get("page_number"):
source_info += f", Page {chunk['page_number']}"
source_info += "]"
part = f"{source_info}\n{chunk['text']}\n"
if total_length + len(part) > max_context_length:
break
context_parts.append(part)
total_length += len(part)
return "\n".join(context_parts)
5.6 Simple Web UI
# frontend/app.py
import os
import streamlit as st
import requests
# Default to localhost for local dev; docker-compose overrides this via API_URL
API_URL = os.environ.get("API_URL", "http://localhost:8000")
st.set_page_config(
page_title="AI Document Q&A",
page_icon="📚",
layout="wide",
)
st.title("AI Document Q&A System")
st.markdown("Upload documents and ask questions about their content.")
# Sidebar: Document Upload
with st.sidebar:
st.header("Upload Documents")
uploaded_file = st.file_uploader(
"Choose a document",
type=["pdf", "docx", "txt", "md"],
)
if uploaded_file and st.button("Upload & Index"):
with st.spinner("Processing document..."):
files = {"file": (uploaded_file.name, uploaded_file.getvalue())}
try:
response = requests.post(f"{API_URL}/documents/upload", files=files)
if response.status_code == 200:
result = response.json()
st.success(f"Document uploaded: {result['filename']}")
st.info(f"Chunks created: {result['num_chunks']}")
else:
st.error(f"Error: {response.text}")
except requests.ConnectionError:
st.error("Cannot connect to the API server. Is it running?")
st.divider()
st.header("Settings")
model = st.selectbox("Model", ["gpt-4o-mini", "gpt-4o"])
max_sources = st.slider("Max Sources", 1, 10, 5)
# Main: Chat Interface
st.header("Ask Questions")
# Initialize chat history
if "messages" not in st.session_state:
st.session_state.messages = []
# Display chat history
for message in st.session_state.messages:
with st.chat_message(message["role"]):
st.markdown(message["content"])
if message.get("sources"):
with st.expander(f"Sources ({len(message['sources'])})"):
for source in message["sources"]:
st.markdown(f"**{source['document_name']}** "
f"(Page {source.get('page_number', 'N/A')}, "
f"Score: {source['relevance_score']:.3f})")
st.caption(source["chunk_text"])
# Chat input
if prompt := st.chat_input("Ask a question about your documents..."):
# Display user message
st.session_state.messages.append({"role": "user", "content": prompt})
with st.chat_message("user"):
st.markdown(prompt)
# Get answer from API
with st.chat_message("assistant"):
with st.spinner("Thinking..."):
try:
response = requests.post(
f"{API_URL}/ask",
json={
"question": prompt,
"model": model,
"max_sources": max_sources,
},
)
if response.status_code == 200:
result = response.json()
st.markdown(result["answer"])
# Show sources
if result["sources"]:
with st.expander(f"Sources ({len(result['sources'])})"):
for source in result["sources"]:
st.markdown(
f"**{source['document_name']}** "
f"(Page {source.get('page_number', 'N/A')}, "
f"Score: {source['relevance_score']:.3f})"
)
st.caption(source["chunk_text"])
# Show metrics
col1, col2, col3 = st.columns(3)
col1.metric("Latency", f"{result['latency_ms']:.0f}ms")
col2.metric("Tokens", result["token_usage"].get("total_tokens", 0))
col3.metric("Model", result["model_used"])
# Save to history
st.session_state.messages.append({
"role": "assistant",
"content": result["answer"],
"sources": result["sources"],
})
else:
st.error(f"Error: {response.text}")
except requests.ConnectionError:
st.error("Cannot connect to the API server.")
5.7 Evaluation Setup
# evaluation/eval_pipeline.py
"""
Evaluation pipeline for the Document Q&A system.
Tests retrieval quality and answer accuracy.
"""
from dataclasses import dataclass
from openai import OpenAI
import time
@dataclass
class TestCase:
question: str
expected_answer: str
expected_sources: list[str] # Expected document names
@dataclass
class EvalResult:
question: str
generated_answer: str
expected_answer: str
faithfulness_score: float # Is the answer grounded in sources?
relevancy_score: float # Is the answer relevant to the question?
correctness_score: float # Is the answer factually correct?
source_recall: float # Were the right sources retrieved?
latency_ms: float
class RAGEvaluator:
"""Evaluate RAG pipeline quality using LLM-as-judge."""
def __init__(self, api_url: str = "http://localhost:8000"):
self.api_url = api_url
self.judge = OpenAI()
def evaluate_test_cases(self, test_cases: list[TestCase]) -> list[EvalResult]:
"""Run evaluation on a set of test cases."""
import requests
results = []
for tc in test_cases:
start = time.time()
# Get answer from the system
response = requests.post(
f"{self.api_url}/ask",
json={"question": tc.question},
)
latency_ms = (time.time() - start) * 1000
if response.status_code != 200:
print(f"Error for question: {tc.question}")
continue
data = response.json()
answer = data["answer"]
sources = [s["document_name"] for s in data.get("sources", [])]
# Score with LLM-as-judge
faithfulness = self._score_faithfulness(tc.question, answer, data.get("sources", []))
relevancy = self._score_relevancy(tc.question, answer)
correctness = self._score_correctness(tc.question, answer, tc.expected_answer)
source_recall = self._compute_source_recall(sources, tc.expected_sources)
result = EvalResult(
question=tc.question,
generated_answer=answer,
expected_answer=tc.expected_answer,
faithfulness_score=faithfulness,
relevancy_score=relevancy,
correctness_score=correctness,
source_recall=source_recall,
latency_ms=latency_ms,
)
results.append(result)
print(f"Q: {tc.question[:50]}... | "
f"Faith: {faithfulness:.2f} | Rel: {relevancy:.2f} | "
f"Corr: {correctness:.2f} | SrcRec: {source_recall:.2f}")
return results
def _score_faithfulness(self, question: str, answer: str, sources: list) -> float:
"""Score if the answer is faithful to (grounded in) the sources."""
source_texts = "\n".join([s.get("chunk_text", "") for s in sources])
response = self.judge.chat.completions.create(
model="gpt-4o-mini",
messages=[{
"role": "user",
"content": f"""Rate the faithfulness of the answer to the given sources on a scale of 0 to 1.
A score of 1 means the answer is fully supported by the sources.
A score of 0 means the answer contains information not in the sources.
Sources:
{source_texts}
Question: {question}
Answer: {answer}
Return ONLY a number between 0 and 1."""
}],
temperature=0,
)
try:
return float(response.choices[0].message.content.strip())
except ValueError:
return 0.0
def _score_relevancy(self, question: str, answer: str) -> float:
"""Score if the answer is relevant to the question."""
response = self.judge.chat.completions.create(
model="gpt-4o-mini",
messages=[{
"role": "user",
"content": f"""Rate the relevancy of the answer to the question on a scale of 0 to 1.
Question: {question}
Answer: {answer}
Return ONLY a number between 0 and 1."""
}],
temperature=0,
)
try:
return float(response.choices[0].message.content.strip())
except ValueError:
return 0.0
def _score_correctness(self, question: str, answer: str, expected: str) -> float:
"""Score the correctness of the answer against the expected answer."""
response = self.judge.chat.completions.create(
model="gpt-4o-mini",
messages=[{
"role": "user",
"content": f"""Compare the generated answer with the expected answer.
Rate the semantic similarity and factual correctness on a scale of 0 to 1.
Question: {question}
Generated Answer: {answer}
Expected Answer: {expected}
Return ONLY a number between 0 and 1."""
}],
temperature=0,
)
try:
return float(response.choices[0].message.content.strip())
except ValueError:
return 0.0
def _compute_source_recall(self, retrieved: list[str], expected: list[str]) -> float:
"""Compute recall of expected source documents."""
if not expected:
return 1.0
hits = sum(1 for e in expected if e in retrieved)
return hits / len(expected)
def print_summary(self, results: list[EvalResult]):
"""Print evaluation summary."""
n = len(results)
if n == 0:
print("No results to summarize.")
return
avg = lambda vals: sum(vals) / len(vals)
print("\n" + "=" * 60)
print("EVALUATION SUMMARY")
print("=" * 60)
print(f"Test cases: {n}")
print(f"Avg Faithfulness: {avg([r.faithfulness_score for r in results]):.3f}")
print(f"Avg Relevancy: {avg([r.relevancy_score for r in results]):.3f}")
print(f"Avg Correctness: {avg([r.correctness_score for r in results]):.3f}")
print(f"Avg Source Recall: {avg([r.source_recall for r in results]):.3f}")
print(f"Avg Latency: {avg([r.latency_ms for r in results]):.0f}ms")
print("=" * 60)
5.8 Deployment
# Dockerfile
FROM python:3.12-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*
# Copy requirements first so Docker layer caching skips reinstalls
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port
EXPOSE 8000
# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
# docker-compose.yml
services:
  api:
    build: .
    ports:
      - "8000:8000"
    env_file:
      - .env
    depends_on:
      - qdrant
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
  streamlit:
    build:
      context: .
      dockerfile: Dockerfile.streamlit
    ports:
      - "8501:8501"
    environment:
      - API_URL=http://api:8000

volumes:
  qdrant_data:
6. Presentation and Showcase Tips
6.1 How to Demo an AI Project
The 5-Minute Demo Structure
- Problem (30s): What problem does this solve? Why does it matter? A concrete example of the pain point.
- Solution Demo (2min): Show the happy path. Upload a document, ask a question, get an answer with citations. Make it look effortless.
- Architecture (1min): Show the high-level architecture diagram. Explain key design decisions in 2-3 sentences.
- Key Technical Detail (1min): Go deep on one interesting technical challenge you solved. This shows depth.
- Results and Next Steps (30s): Share evaluation metrics. What would you improve with more time?
6.2 Key Metrics to Highlight
- Quality metrics: Faithfulness, relevancy, correctness scores from your evaluation pipeline
- Performance metrics: Latency (p50, p95), throughput, token usage
- Cost metrics: Cost per query, monthly projected cost at scale
- Scale metrics: Number of documents indexed, concurrent user support
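If you log latency and token usage per query (the `/ask` response already returns both), these numbers fall out of a few lines of Python. The sample latencies and per-token rates below are placeholders; substitute your own logs and your model's current pricing:

```python
# Nearest-rank percentile over logged samples -- a rough sketch, fine for a demo.
def percentile(values: list[float], pct: float) -> float:
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(round(pct / 100 * (len(ordered) - 1))))
    return ordered[idx]

# Hypothetical per-query latencies from your logs, in milliseconds
latencies_ms = [220, 340, 310, 1250, 280, 300, 260, 900, 330, 270]
print(f"p50: {percentile(latencies_ms, 50):.0f}ms")
print(f"p95: {percentile(latencies_ms, 95):.0f}ms")

# Cost per query = prompt_tokens * input_rate + completion_tokens * output_rate.
# Rates below are placeholders, expressed per token (i.e. dollars-per-1M / 1M).
INPUT_RATE, OUTPUT_RATE = 0.15 / 1_000_000, 0.60 / 1_000_000
cost = 3200 * INPUT_RATE + 450 * OUTPUT_RATE
print(f"cost/query: ${cost:.6f}")
```

Reporting p95 alongside p50 matters: a single slow retrieval or long generation can be several times the median, and the audience will ask about worst-case behavior.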
6.3 Common Pitfalls to Avoid
- Do not demo without a backup. Have screenshots or a recorded video in case the live demo fails.
- Do not show error states accidentally. Test your demo flow beforehand. Use pre-loaded data.
- Do not over-scope. A polished small project beats an unfinished large project every time.
- Do not ignore evaluation. "It works when I try it" is not enough. Show systematic evaluation.
- Do not skip error handling. Show that your system gracefully handles bad inputs, API failures, and edge cases.
- Do not hardcode API keys. Use environment variables. Show that you follow security best practices.
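The last point is easy to enforce mechanically: read keys from the environment and fail fast, with a clear message, when one is missing. A minimal sketch:

```python
# Fail fast on missing secrets instead of crashing later with a cryptic 401.
import os

def require_env(name: str) -> str:
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            f"before starting the app."
        )
    return value

os.environ["OPENAI_API_KEY"] = "sk-demo"  # for illustration only
print(require_env("OPENAI_API_KEY"))  # sk-demo
```

Calling this once at startup turns a misconfigured deploy into an obvious one-line error instead of a mid-demo failure.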
Summary
Your Capstone Checklist
- Choose a project from the ideas above (or propose your own)
- Draw the architecture diagram before writing code
- Set up the project structure and dependencies
- Build the core AI pipeline first (get end-to-end working)
- Add the API layer with proper request/response models
- Build a usable frontend (Streamlit is fine)
- Set up evaluation and test with real data
- Dockerize and deploy
- Write a clear README
- Prepare your 5-minute demo
Next week in Week 16: AI Engineering Principles, we will wrap up the course with best practices, production patterns, career guidance, and a comprehensive resource list.