Stage 3: PDF Export

Goal

Generate professional PDF exports of analysis reports for sharing with stakeholders.

Key Design Decisions

  1. Mirror web view: PDF includes all sections displayed in the web report
  2. Text only: No chart images - keeps implementation simpler and PDFs lighter
  3. Server-side generation: PDF generated on backend, downloaded by client
  4. Styled template: Professional formatting with headers, sections, and branding

Scope

In Scope

  • Export single-file AnalysisRun to PDF
  • Export MultiAnalysis to PDF
  • Include: executive summary, themes, insights, custom fields
  • Basic branding (logo, colors)
  • Table of contents for longer reports

Out of Scope

  • Chart/visualization rendering
  • Custom PDF templates per organization
  • Batch export (multiple reports at once)
  • Scheduled/automated exports

Technical Approach

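Option A: WeasyPrint

  • Python library that converts HTML/CSS to PDF
  • Good for styled documents
  • No headless browser required, but it depends on system libraries such as Pango (see Lambda Considerations)
  • Works in Lambda when packaged as a layer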

Option B: ReportLab

  • Lower-level, programmatic PDF generation
  • More control but more code
  • Better for complex layouts

Option C: Puppeteer/Playwright

  • Render actual React page and print to PDF
  • Most accurate to web view
  • Requires headless Chrome (heavy for Lambda)

Recommendation: WeasyPrint for simplicity and Lambda compatibility.

API Changes

New Endpoints

GET /analyses/{id}/export?format=pdf
    - Generates PDF for a single-file analysis
    - Returns: PDF file download

GET /analyze/multi/{id}/export?format=pdf
    - Generates PDF for a multi-analysis
    - Returns: PDF file download

Response Headers

Content-Type: application/pdf
Content-Disposition: attachment; filename="Analysis_{id}.pdf"
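
The headers above could be built by a small helper. This is only a sketch: the name pdf_download_headers and the sanitising rule are assumptions, not part of the existing codebase. It replaces any character that could break the quoted filename.

```python
import re


def pdf_download_headers(filename: str) -> dict:
    """Build download headers for a PDF response (hypothetical helper)."""
    # Keep letters, digits, dot, underscore and hyphen; replace anything else
    # so the value is always safe inside the quoted filename parameter.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", filename)
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'attachment; filename="{safe_name}"',
    }
```

The resulting dict can be passed as the headers argument of the framework's response object.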

PDF Structure

Single Analysis (AnalysisRun)

┌─────────────────────────────────────────┐
│ [Logo]              Analysis Report     │
│                                         │
│ File: interview_001.mp3                 │
│ Duration: 45 min                        │
│ Generated: Jan 15, 2026                 │
├─────────────────────────────────────────┤
│ Table of Contents                       │
│ 1. Executive Summary ............... 2  │
│ 2. Key Findings ................... 3   │
│ 3. Speaker Identification ......... 4   │
│ 4. Topics & Themes ................ 5   │
│ ...                                     │
├─────────────────────────────────────────┤
│ 1. EXECUTIVE SUMMARY                    │
│                                         │
│ [executive_summary text]                │
│                                         │
├─────────────────────────────────────────┤
│ 2. KEY FINDINGS                         │
│                                         │
│ • Finding 1                             │
│ • Finding 2                             │
│ ...                                     │
├─────────────────────────────────────────┤
│ [Additional sections based on data]     │
└─────────────────────────────────────────┘

Multi-Analysis

┌─────────────────────────────────────────┐
│ [Logo]         Multi-File Analysis      │
│                                         │
│ Name: Q1 Interview Synthesis            │
│ Files: 12 interviews                    │
│ Total Duration: 8h 32m                  │
│ Generated: Jan 15, 2026                 │
├─────────────────────────────────────────┤
│ Table of Contents                       │
│ 1. Executive Synthesis ............ 2   │
│ 2. Convergent Themes .............. 3   │
│ 3. Divergent Themes ............... 5   │
│ 4. Recommendations ................ 7   │
│ 5. Custom Sections ................ 8   │
│ ...                                     │
├─────────────────────────────────────────┤
│ [Sections rendered as in web view]      │
└─────────────────────────────────────────┘

Implementation

HTML Template (Jinja2)

<!DOCTYPE html>
<html>
<head>
  <style>
    @page {
      size: letter;
      margin: 1in;
      @top-right {
        content: "Page " counter(page);
      }
    }

    body {
      font-family: 'Inter', sans-serif;
      font-size: 11pt;
      line-height: 1.5;
      color: #1a1a1a;
    }

    .header {
      display: flex;
      justify-content: space-between;
      border-bottom: 2px solid #2563eb;
      padding-bottom: 1rem;
      margin-bottom: 2rem;
    }

    .logo { height: 40px; }

    h1 {
      font-size: 24pt;
      color: #1e40af;
      margin-top: 0;
    }

    h2 {
      font-size: 14pt;
      color: #1e40af;
      border-bottom: 1px solid #e5e7eb;
      padding-bottom: 0.5rem;
      margin-top: 2rem;
      page-break-after: avoid;
    }

    .metadata {
      background: #f8fafc;
      padding: 1rem;
      border-radius: 8px;
      margin-bottom: 2rem;
    }

    .metadata dt { font-weight: 600; }
    .metadata dd { margin-left: 0; margin-bottom: 0.5rem; }

    .toc {
      background: #f8fafc;
      padding: 1.5rem;
      margin-bottom: 2rem;
    }

    .toc-item {
      display: flex;
      justify-content: space-between;
      padding: 0.25rem 0;
    }

    .section { page-break-inside: avoid; }

    ul, ol { margin-left: 1.5rem; }

    .theme-card {
      border: 1px solid #e5e7eb;
      border-radius: 8px;
      padding: 1rem;
      margin-bottom: 1rem;
      page-break-inside: avoid;
    }

    .theme-title {
      font-weight: 600;
      color: #1e40af;
    }

    .badge {
      display: inline-block;
      background: #e5e7eb;
      padding: 0.125rem 0.5rem;
      border-radius: 4px;
      font-size: 9pt;
      margin-right: 0.5rem;
    }

    .custom-section {
      margin-top: 2rem;
      padding: 1rem;
      background: #fafafa;
      border-left: 4px solid #2563eb;
    }
  </style>
</head>
<body>
  <div class="header">
    <img src="{{ logo_url }}" class="logo" alt="Logo">
    <div>
      <h1>{{ report_title }}</h1>
    </div>
  </div>

  <dl class="metadata">
    {% if analysis_type == 'multi' %}
    <dt>Analysis Name</dt>
    <dd>{{ analysis.name }}</dd>
    <dt>Files Included</dt>
    <dd>{{ analysis.file_count }} interviews</dd>
    <dt>Total Duration</dt>
    <dd>{{ format_duration(analysis.total_duration_seconds) }}</dd>
    {% else %}
    <dt>File</dt>
    <dd>{{ file.original_filename }}</dd>
    <dt>Duration</dt>
    <dd>{{ format_duration(file.duration_seconds) }}</dd>
    {% endif %}
    <dt>Generated</dt>
    <dd>{{ now.strftime('%B %d, %Y') }}</dd>
  </dl>

  <!-- Table of Contents -->
  <div class="toc">
    <h2 style="margin-top: 0;">Contents</h2>
    {% for section in sections %}
    <div class="toc-item">
      <span>{{ loop.index }}. {{ section.title }}</span>
    </div>
    {% endfor %}
  </div>

  <!-- Sections -->
  {% for section in sections %}
  <div class="section">
    <h2>{{ loop.index }}. {{ section.title }}</h2>
    {{ section.content | safe }}
  </div>
  {% endfor %}

</body>
</html>
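
The template calls format_duration, which is not defined in this document. A minimal sketch follows, assuming durations are stored in seconds and that the output should match the mockups ("45 min", "8h 32m"). The "Unknown" fallback for missing values is an assumption.

```python
def format_duration(seconds) -> str:
    """Format a duration in seconds as '45 min' or '8h 32m' (sketch)."""
    if seconds is None:
        # Assumed fallback for files without duration metadata.
        return "Unknown"
    total_minutes = int(seconds) // 60  # drop leftover seconds
    hours, minutes = divmod(total_minutes, 60)
    if hours:
        return f"{hours}h {minutes}m"
    return f"{minutes} min"
```

Passing the function into template.render(), as the service does, makes it callable from the template without registering a Jinja2 filter.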

Backend Service

# api/src/services/pdf_export.py

import json
from datetime import datetime, timezone
from html import escape

from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML

# AnalysisRun, File, MultiAnalysis, format_duration and format_key_name are
# assumed to be importable from the existing codebase.

class PDFExportService:
    def __init__(self):
        # autoescape protects metadata values; section HTML is passed through
        # with `| safe` in the template, so it is escaped here when it is built.
        self.env = Environment(
            loader=FileSystemLoader('templates/pdf'),
            autoescape=True,
        )
        self.template = self.env.get_template('report.html')

    def export_analysis(self, analysis: AnalysisRun, file: File) -> bytes:
        sections = self._build_analysis_sections(analysis)
        html = self.template.render(
            analysis_type='single',
            analysis=analysis,
            file=file,
            sections=sections,
            report_title='Analysis Report',
            logo_url=self._get_logo_url(),
            now=datetime.now(timezone.utc),
            format_duration=format_duration
        )
        return self._render_pdf(html)

    def export_multi_analysis(self, analysis: MultiAnalysis) -> bytes:
        sections = self._build_multi_sections(analysis)
        html = self.template.render(
            analysis_type='multi',
            analysis=analysis,
            sections=sections,
            report_title='Multi-File Analysis',
            logo_url=self._get_logo_url(),
            now=datetime.now(timezone.utc),
            format_duration=format_duration
        )
        return self._render_pdf(html)

    def _render_pdf(self, html: str) -> bytes:
        # write_pdf() returns the document as bytes when no target is given
        return HTML(string=html).write_pdf()

    def _build_multi_sections(self, analysis: MultiAnalysis) -> list:
        sections = []

        if analysis.executive_synthesis:
            sections.append({
                'title': 'Executive Synthesis',
                # Escape model-generated text so stray < or & cannot break the markup
                'content': f'<p>{escape(analysis.executive_synthesis)}</p>'
            })

        if analysis.convergent_themes:
            sections.append({
                'title': 'Convergent Themes',
                'content': self._render_themes(analysis.convergent_themes)
            })

        if analysis.divergent_themes:
            sections.append({
                'title': 'Divergent Themes',
                'content': self._render_themes(analysis.divergent_themes)
            })

        # ... other standard sections ...

        # Custom fields
        if analysis.custom_fields:
            for key, value in analysis.custom_fields.items():
                sections.append({
                    'title': format_key_name(key),
                    'content': self._render_custom_field(value)
                })

        return sections

    def _render_themes(self, themes: list) -> str:
        cards = []
        for theme in themes:
            title = escape(str(theme.get('theme', 'Unnamed')))
            description = escape(str(theme.get('description', '')))
            cards.append(f'''
            <div class="theme-card">
                <div class="theme-title">{title}</div>
                <p>{description}</p>
                {self._render_evidence(theme.get('evidence', []))}
            </div>
            ''')
        return ''.join(cards)

    def _render_custom_field(self, value) -> str:
        if isinstance(value, str):
            return f'<div class="custom-section"><p>{escape(value)}</p></div>'
        if isinstance(value, list):
            return self._render_custom_array(value)
        if isinstance(value, dict):
            return self._render_custom_object(value)
        # Fallback for numbers, booleans and None
        return f'<pre>{escape(json.dumps(value, indent=2))}</pre>'
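
The service refers to format_key_name, _render_custom_array and _render_custom_object without showing them. The sketch below gives one possible shape, written as standalone functions. The markup, the title-casing rule and the flat (non-nested) handling are all assumptions. Values are HTML-escaped because section content reaches the template through the safe filter.

```python
from html import escape


def format_key_name(key: str) -> str:
    """Turn a key such as 'risk_factors' into a title such as 'Risk Factors'."""
    return key.replace("_", " ").replace("-", " ").strip().title()


def render_custom_array(items: list) -> str:
    """Render a list of values as an escaped bullet list."""
    rows = "".join(f"<li>{escape(str(item))}</li>" for item in items)
    return f'<div class="custom-section"><ul>{rows}</ul></div>'


def render_custom_object(obj: dict) -> str:
    """Render a flat dict as an escaped definition list."""
    rows = "".join(
        f"<dt>{escape(format_key_name(str(k)))}</dt><dd>{escape(str(v))}</dd>"
        for k, v in obj.items()
    )
    return f'<div class="custom-section"><dl>{rows}</dl></div>'
```

Nested lists or dicts would be stringified by this sketch. A real implementation would probably recurse.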

Endpoint Implementation

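# api/src/routers/analyses.py

from uuid import UUID

from fastapi import Depends, HTTPException, Query, Response
from src.services.pdf_export import PDFExportService

# router, Session, AuthContext, get_auth, get_db, check_analysis_access,
# AnalysisRun and File are assumed to be defined elsewhere in the codebase.

pdf_service = PDFExportService()

# Declared with plain `def`: the DB queries and PDF rendering are blocking, so
# FastAPI runs the handler in its threadpool instead of stalling the event loop.
@router.get("/{analysis_id}/export")
def export_analysis(
    analysis_id: UUID,
    format: str = Query("pdf"),
    auth: AuthContext = Depends(get_auth),
    db: Session = Depends(get_db)
):
    if format != "pdf":
        raise HTTPException(400, "Only PDF format supported")

    analysis = db.query(AnalysisRun).filter(
        AnalysisRun.id == analysis_id
    ).first()

    if not analysis:
        raise HTTPException(404, "Analysis not found")

    # Check access
    check_analysis_access(auth, analysis)

    file = db.query(File).filter(File.id == analysis.file_id).first()
    if not file:
        raise HTTPException(404, "Source file not found")

    pdf_bytes = pdf_service.export_analysis(analysis, file)

    # The PDF is already fully in memory, so a plain Response is sufficient
    return Response(
        content=pdf_bytes,
        media_type="application/pdf",
        headers={
            "Content-Disposition": f'attachment; filename="Analysis_{analysis_id}.pdf"'
        }
    )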

Frontend Changes

Export Button

Add to AnalysisDetails.jsx and MultiAnalysisDetails.jsx:

<Button
  variant="outline"
  onClick={handleExportPDF}
  disabled={exporting}
>
  {exporting ? (
    <Loader2 className="h-4 w-4 mr-2 animate-spin" />
  ) : (
    <FileDown className="h-4 w-4 mr-2" />
  )}
  Export PDF
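</Button>

Click handler and state (in MultiAnalysisDetails.jsx, call /analyze/multi/{id}/export instead):

const [exporting, setExporting] = useState(false);

const handleExportPDF = async () => {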
  setExporting(true);
  try {
    const response = await fetch(
      `${API_URL}/analyses/${analysis.id}/export?format=pdf`,
      {
        headers: { 'X-API-Key': apiKey }
      }
    );

    if (!response.ok) throw new Error('Export failed');

    const blob = await response.blob();
    const url = URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = `Analysis_${analysis.id}.pdf`;
    a.click();
    URL.revokeObjectURL(url);

    toast.success('PDF downloaded');
  } catch (err) {
    toast.error('Failed to export PDF');
  } finally {
    setExporting(false);
  }
};

Lambda Considerations

WeasyPrint Layer

WeasyPrint requires system libraries. Options:

  1. Lambda Layer: Pre-built layer with WeasyPrint + dependencies

  2. Container Lambda: Use Docker image with WeasyPrint installed

  3. External Service: Call a separate PDF generation service (Modal, dedicated EC2)

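Recommendation: Start with a Lambda Layer for simplicity. If the native libraries prove hard to package, or the layer pushes the function past Lambda's 250 MB unzipped size limit, move to a dedicated Modal function.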
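
One Lambda-specific detail concerns how the PDF bytes leave the function. If the API sits behind API Gateway's Lambda proxy integration, binary bodies must be base64-encoded and flagged with isBase64Encoded. An ASGI adapter may already handle this for the FastAPI app, so the sketch below only illustrates the raw response shape. The function name and the size guard are assumptions.

```python
import base64

# Lambda caps synchronous response payloads at 6 MB. Base64 inflates the body
# by about a third, so the guard is applied to the encoded size.
MAX_LAMBDA_RESPONSE_BYTES = 6 * 1024 * 1024


def pdf_proxy_response(pdf_bytes: bytes, filename: str) -> dict:
    """Wrap PDF bytes in an API Gateway proxy-style response (hypothetical helper)."""
    body = base64.b64encode(pdf_bytes).decode("ascii")
    if len(body) > MAX_LAMBDA_RESPONSE_BYTES:
        # Larger reports would need another delivery path, e.g. object storage.
        raise ValueError("PDF too large to return inline from Lambda")
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/pdf",
            "Content-Disposition": f'attachment; filename="{filename}"',
        },
        "body": body,
        "isBase64Encoded": True,
    }
```

The guard is approximate, since headers also count toward the payload. It is still worth keeping in mind for long multi-file reports.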

Files to Create/Modify

Backend

  • api/src/services/pdf_export.py - New PDF generation service
  • api/src/templates/pdf/report.html - Jinja2 template
  • api/src/templates/pdf/styles.css - PDF styles
  • api/src/routers/analyses.py - Add export endpoints
  • api/requirements.txt - Add weasyprint

Frontend

  • client/src/pages/AnalysisDetails.jsx - Add export button
  • client/src/pages/MultiAnalysisDetails.jsx - Add export button

Demo Criteria

  1. User opens a completed MultiAnalysis report
  2. Clicks “Export PDF” button
  3. Button shows loading state
  4. PDF downloads automatically
  5. PDF opens showing:
    • Header with branding and report title
    • Metadata (files included, duration, date)
    • Table of contents
    • All sections from the web view (executive synthesis, themes, custom fields, etc.)
    • Professional formatting with consistent styles
  6. Same flow works for single-file AnalysisRun
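
Part of the demo could be backed by an automated smoke check on the exported bytes. The helper below is a sketch with a hypothetical name. It only confirms that the payload has the PDF header and end-of-file marker, not that the content or layout is correct.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check: PDF header at the start, %%EOF near the end."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]
```

In a test suite this could be asserted against the bytes returned by the export service for a fixture analysis.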