Stage 3: PDF Export
Goal
Generate professional PDF exports of analysis reports for sharing with stakeholders.
Key Design Decisions
- Mirror web view: PDF includes all sections displayed in the web report
- Text only: No chart images, which keeps the implementation simpler and the PDFs lighter
- Server-side generation: PDF generated on backend, downloaded by client
- Styled template: Professional formatting with headers, sections, and branding
Scope
In Scope
- Export single-file AnalysisRun to PDF
- Export MultiAnalysis to PDF
- Include: executive summary, themes, insights, custom fields
- Basic branding (logo, colors)
- Table of contents for longer reports
Out of Scope
- Chart/visualization rendering
- Custom PDF templates per organization
- Batch export (multiple reports at once)
- Scheduled/automated exports
Technical Approach
Option A: WeasyPrint (Recommended)
- Python library that renders HTML/CSS to PDF
- Good for styled documents
- No headless browser required, but depends on native libraries (Pango and related)
- Works in Lambda when packaged as a layer that includes those native libraries
Option B: ReportLab
- Lower-level, programmatic PDF generation
- More control but more code
- Better for complex layouts
Option C: Puppeteer/Playwright
- Render actual React page and print to PDF
- Most accurate to web view
- Requires headless Chrome (heavy for Lambda)
Recommendation: WeasyPrint for simplicity and Lambda compatibility.
API Changes
New Endpoints
GET /analyses/{id}/export?format=pdf
- Generates PDF for a single-file analysis
- Returns: PDF file download
GET /analyze/multi/{id}/export?format=pdf
- Generates PDF for a multi-analysis
- Returns: PDF file download
Response Headers
Content-Type: application/pdf
Content-Disposition: attachment; filename="Analysis_Report_{id}.pdf"
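The headers above are safe as written because `{id}` is a UUID. If a human-readable name (for example a MultiAnalysis name such as "Q1 Interview Synthesis") is ever used in the filename, RFC 6266 allows a UTF-8 `filename*` parameter alongside the ASCII `filename`. A minimal sketch; the helper `pdf_download_headers` and its `display_name` argument are hypothetical, not part of the API above.

```python
from typing import Dict, Optional
from urllib.parse import quote


def pdf_download_headers(report_id: str, display_name: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper that builds the download headers described above."""
    # ASCII filename from the spec; safe inside a quoted-string because report_id is a UUID
    disposition = f'attachment; filename="Analysis_Report_{report_id}.pdf"'
    if display_name:
        # RFC 6266 / RFC 5987: filename* carries a percent-encoded UTF-8 name
        encoded = quote(f"{display_name}.pdf", safe="")
        disposition += f"; filename*=UTF-8''{encoded}"
    return {"Content-Type": "application/pdf", "Content-Disposition": disposition}
```

Clients that understand `filename*` use it; older ones fall back to the plain `filename`.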
PDF Structure
Single Analysis (AnalysisRun)
┌─────────────────────────────────────────┐
│ [Logo] Analysis Report                  │
│                                         │
│ File: interview_001.mp3                 │
│ Duration: 45 min                        │
│ Generated: Jan 15, 2026                 │
├─────────────────────────────────────────┤
│ Table of Contents                       │
│ 1. Executive Summary .............. 2   │
│ 2. Key Findings ................... 3   │
│ 3. Speaker Identification ......... 4   │
│ 4. Topics & Themes ................ 5   │
│ ...                                     │
├─────────────────────────────────────────┤
│ 1. EXECUTIVE SUMMARY                    │
│                                         │
│ [executive_summary text]                │
│                                         │
├─────────────────────────────────────────┤
│ 2. KEY FINDINGS                         │
│                                         │
│ • Finding 1                             │
│ • Finding 2                             │
│ ...                                     │
├─────────────────────────────────────────┤
│ [Additional sections based on data]     │
└─────────────────────────────────────────┘
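`export_analysis` calls `_build_analysis_sections`, which is not shown. A minimal sketch of one way to produce the `sections` list the template expects for the single-analysis layout: a fixed, ordered table of fields, with empty fields skipped so they drop out of both the body and the table of contents. It takes a plain dict so it runs standalone; apart from `executive_summary`, the field names are hypothetical.

```python
from html import escape
from typing import Any, Callable, Dict, List, Tuple


def render_paragraph(value: Any) -> str:
    # Escape first: the template inserts section content with `| safe`
    return f"<p>{escape(str(value))}</p>"


def render_bullets(values: Any) -> str:
    items = "".join(f"<li>{escape(str(item))}</li>" for item in values)
    return f"<ul>{items}</ul>"


# (field name, section title, renderer), in the order of the layout above.
# Only executive_summary appears in this doc; the other field names are hypothetical.
SINGLE_SECTIONS: List[Tuple[str, str, Callable[[Any], str]]] = [
    ("executive_summary", "Executive Summary", render_paragraph),
    ("key_findings", "Key Findings", render_bullets),
    ("speakers", "Speaker Identification", render_bullets),
    ("topics", "Topics & Themes", render_bullets),
]


def build_analysis_sections(data: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return [{'title': ..., 'content': ...}] in display order, skipping empty fields."""
    sections = []
    for field, title, render in SINGLE_SECTIONS:
        value = data.get(field)
        if value:  # None, "" and [] yield no section, so no TOC entry either
            sections.append({"title": title, "content": render(value)})
    return sections
```

With the ORM model, `getattr(analysis, field, None)` would take the place of `data.get(field)`.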
Multi-Analysis
┌─────────────────────────────────────────┐
│ [Logo] Multi-File Analysis              │
│                                         │
│ Name: Q1 Interview Synthesis            │
│ Files: 12 interviews                    │
│ Total Duration: 8h 32m                  │
│ Generated: Jan 15, 2026                 │
├─────────────────────────────────────────┤
│ Table of Contents                       │
│ 1. Executive Synthesis ............ 2   │
│ 2. Convergent Themes .............. 3   │
│ 3. Divergent Themes ............... 5   │
│ 4. Recommendations ................ 7   │
│ 5. Custom Sections ................ 8   │
│ ...                                     │
├─────────────────────────────────────────┤
│ [Sections rendered as in web view]      │
└─────────────────────────────────────────┘
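`format_duration` is passed into the template but not defined in this section. A minimal sketch matching the two formats in the mockups ("45 min" under an hour, "8h 32m" otherwise); rounding to whole minutes and the "Unknown" fallback for a missing duration are assumptions.

```python
from typing import Optional


def format_duration(seconds: Optional[float]) -> str:
    """Hypothetical format_duration: '45 min' under an hour, '8h 32m' otherwise."""
    if seconds is None:
        return "Unknown"  # assumption: placeholder when no duration is recorded
    # Round to whole minutes; the mockups show no sub-minute precision
    total_minutes = int(round(seconds / 60))
    hours, minutes = divmod(total_minutes, 60)
    if hours == 0:
        return f"{minutes} min"
    return f"{hours}h {minutes}m"
```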
Implementation
HTML Template (Jinja2)
<!DOCTYPE html>
<html>
<head>
<style>
@page {
size: letter;
margin: 1in;
@top-right {
content: "Page " counter(page);
}
}
body {
font-family: 'Inter', sans-serif;
font-size: 11pt;
line-height: 1.5;
color: #1a1a1a;
}
.header {
display: flex;
justify-content: space-between;
border-bottom: 2px solid #2563eb;
padding-bottom: 1rem;
margin-bottom: 2rem;
}
.logo { height: 40px; }
h1 {
font-size: 24pt;
color: #1e40af;
margin-top: 0;
}
h2 {
font-size: 14pt;
color: #1e40af;
border-bottom: 1px solid #e5e7eb;
padding-bottom: 0.5rem;
margin-top: 2rem;
page-break-after: avoid;
}
.metadata {
background: #f8fafc;
padding: 1rem;
border-radius: 8px;
margin-bottom: 2rem;
}
.metadata dt { font-weight: 600; }
.metadata dd { margin-left: 0; margin-bottom: 0.5rem; }
.toc {
background: #f8fafc;
padding: 1.5rem;
margin-bottom: 2rem;
}
.toc-item {
display: flex;
justify-content: space-between;
padding: 0.25rem 0;
}
.section { page-break-inside: avoid; }
ul, ol { margin-left: 1.5rem; }
.theme-card {
border: 1px solid #e5e7eb;
border-radius: 8px;
padding: 1rem;
margin-bottom: 1rem;
page-break-inside: avoid;
}
.theme-title {
font-weight: 600;
color: #1e40af;
}
.badge {
display: inline-block;
background: #e5e7eb;
padding: 0.125rem 0.5rem;
border-radius: 4px;
font-size: 9pt;
margin-right: 0.5rem;
}
.custom-section {
margin-top: 2rem;
padding: 1rem;
background: #fafafa;
border-left: 4px solid #2563eb;
}
.toc-page {
color: inherit;
text-decoration: none;
}
/* WeasyPrint resolves target-counter() to the page number of the linked section */
.toc-page::after {
content: target-counter(attr(href), page);
}
</style>
</head>
<body>
<div class="header">
<img src="{{ logo_url }}" class="logo" alt="Logo">
<div>
<h1>{{ report_title }}</h1>
</div>
</div>
<dl class="metadata">
{% if analysis_type == 'multi' %}
<dt>Analysis Name</dt>
<dd>{{ analysis.name }}</dd>
<dt>Files Included</dt>
<dd>{{ analysis.file_count }} interviews</dd>
<dt>Total Duration</dt>
<dd>{{ format_duration(analysis.total_duration_seconds) }}</dd>
{% else %}
<dt>File</dt>
<dd>{{ file.original_filename }}</dd>
<dt>Duration</dt>
<dd>{{ format_duration(file.duration_seconds) }}</dd>
{% endif %}
<dt>Generated</dt>
<dd>{{ now.strftime('%B %d, %Y') }}</dd>
</dl>
<!-- Table of Contents: the empty link renders the target section's page number -->
<div class="toc">
<h2 style="margin-top: 0;">Contents</h2>
{% for section in sections %}
<div class="toc-item">
<span>{{ loop.index }}. {{ section.title }}</span>
<a class="toc-page" href="#section-{{ loop.index }}"></a>
</div>
{% endfor %}
</div>
<!-- Sections -->
{% for section in sections %}
<div class="section" id="section-{{ loop.index }}">
<h2>{{ loop.index }}. {{ section.title }}</h2>
{# section.content is pre-built HTML; the Python helpers must escape any user or model text in it #}
{{ section.content | safe }}
</div>
{% endfor %}
</body>
</html>
Backend Service
# api/src/services/pdf_export.py
import json
from datetime import datetime, timezone
from html import escape
from pathlib import Path

from jinja2 import Environment, FileSystemLoader, select_autoescape
from weasyprint import HTML

# Placeholder import paths: point these at the real models and formatting helpers
from src.models import AnalysisRun, File, MultiAnalysis
from src.utils.formatting import format_duration, format_key_name

# api/src/templates/pdf, resolved from this file so it does not depend on the CWD
TEMPLATE_DIR = Path(__file__).resolve().parent.parent / 'templates' / 'pdf'


class PDFExportService:
    def __init__(self):
        # Autoescape plain {{ }} output. section.content is rendered with | safe,
        # so the _render_* helpers below escape text themselves.
        self.env = Environment(
            loader=FileSystemLoader(str(TEMPLATE_DIR)),
            autoescape=select_autoescape(['html']),
        )
        self.template = self.env.get_template('report.html')

    def export_analysis(self, analysis: AnalysisRun, file: File) -> bytes:
        # _build_analysis_sections (not shown) mirrors _build_multi_sections
        # for the single-file AnalysisRun fields
        sections = self._build_analysis_sections(analysis)
        html = self.template.render(
            analysis_type='single',
            analysis=analysis,
            file=file,
            sections=sections,
            report_title='Analysis Report',
            logo_url=self._get_logo_url(),
            now=datetime.now(timezone.utc),
            format_duration=format_duration,
        )
        return self._render_pdf(html)

    def export_multi_analysis(self, analysis: MultiAnalysis) -> bytes:
        sections = self._build_multi_sections(analysis)
        html = self.template.render(
            analysis_type='multi',
            analysis=analysis,
            sections=sections,
            report_title='Multi-File Analysis',
            logo_url=self._get_logo_url(),
            now=datetime.now(timezone.utc),
            format_duration=format_duration,
        )
        return self._render_pdf(html)

    def _render_pdf(self, html: str) -> bytes:
        # With no target, write_pdf() returns the document as bytes.
        # base_url lets relative URLs (e.g. a local logo path) resolve.
        return HTML(string=html, base_url=str(TEMPLATE_DIR)).write_pdf()

    def _build_multi_sections(self, analysis: MultiAnalysis) -> list:
        sections = []
        if analysis.executive_synthesis:
            sections.append({
                'title': 'Executive Synthesis',
                'content': f'<p>{escape(analysis.executive_synthesis)}</p>',
            })
        if analysis.convergent_themes:
            sections.append({
                'title': 'Convergent Themes',
                'content': self._render_themes(analysis.convergent_themes),
            })
        if analysis.divergent_themes:
            sections.append({
                'title': 'Divergent Themes',
                'content': self._render_themes(analysis.divergent_themes),
            })
        # ... other standard sections ...
        # Custom fields
        if analysis.custom_fields:
            for key, value in analysis.custom_fields.items():
                sections.append({
                    'title': format_key_name(key),
                    'content': self._render_custom_field(value),
                })
        return sections

    def _render_themes(self, themes: list) -> str:
        cards = []
        for theme in themes:
            cards.append(
                '<div class="theme-card">'
                f'<div class="theme-title">{escape(str(theme.get("theme", "Unnamed")))}</div>'
                f'<p>{escape(str(theme.get("description", "")))}</p>'
                f'{self._render_evidence(theme.get("evidence", []))}'
                '</div>'
            )
        return ''.join(cards)

    def _render_custom_field(self, value) -> str:
        if isinstance(value, str):
            return f'<div class="custom-section"><p>{escape(value)}</p></div>'
        if isinstance(value, list):
            return self._render_custom_array(value)
        if isinstance(value, dict):
            return self._render_custom_object(value)
        return f'<pre>{escape(json.dumps(value, indent=2))}</pre>'

    # _render_evidence, _render_custom_array, _render_custom_object and _get_logo_url
    # are not shown; like the helpers above, they must escape any text they embed.
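The service relies on helpers that are not shown (`_render_custom_array`, `_render_custom_object`, `format_key_name`). A minimal sketch of one way they could work, collapsed into a single recursive function; the function names and the output markup are assumptions. Every piece of text goes through `html.escape` because the template inserts `section.content` with `| safe`.

```python
import json
from html import escape
from typing import Any


def format_key_name(key: str) -> str:
    """Hypothetical: turn a snake_case key into a title, e.g. 'pain_points' -> 'Pain Points'."""
    return key.replace("_", " ").strip().title()


def render_custom_value(value: Any) -> str:
    """Hypothetical stand-in for _render_custom_field and its array/object helpers."""
    if isinstance(value, str):
        return f"<p>{escape(value)}</p>"
    if isinstance(value, list):
        # Plain strings become bare list items; nested structures recurse
        items = "".join(
            f"<li>{escape(item) if isinstance(item, str) else render_custom_value(item)}</li>"
            for item in value
        )
        return f"<ul>{items}</ul>"
    if isinstance(value, dict):
        # Keys become definition terms; values are rendered recursively
        rows = "".join(
            f"<dt>{escape(format_key_name(str(key)))}</dt><dd>{render_custom_value(val)}</dd>"
            for key, val in value.items()
        )
        return f"<dl>{rows}</dl>"
    # Numbers, booleans and None fall back to a JSON literal, as in the original
    return f"<pre>{escape(json.dumps(value, indent=2))}</pre>"
```

Rendering nested objects as definition lists keeps arbitrary custom-field shapes readable without a template per field.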
Endpoint Implementation
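# api/src/routers/analyses.py
from uuid import UUID

from fastapi import Depends, HTTPException, Query
from fastapi.responses import Response
from sqlalchemy.orm import Session

from src.services.pdf_export import PDFExportService
# router, AuthContext, get_auth, get_db, check_analysis_access, AnalysisRun and File
# come from the existing module and project imports (not shown)

pdf_service = PDFExportService()


# Plain `def` (not `async def`): FastAPI runs sync endpoints in a threadpool, so the
# blocking DB query and CPU-bound WeasyPrint render do not stall the event loop.
@router.get("/{analysis_id}/export")
def export_analysis(
    analysis_id: UUID,
    format: str = Query("pdf"),
    auth: AuthContext = Depends(get_auth),
    db: Session = Depends(get_db),
):
    if format != "pdf":
        raise HTTPException(400, "Only PDF format supported")

    analysis = db.query(AnalysisRun).filter(AnalysisRun.id == analysis_id).first()
    if not analysis:
        raise HTTPException(404, "Analysis not found")

    # Check access before doing any rendering work
    check_analysis_access(auth, analysis)

    file = db.query(File).filter(File.id == analysis.file_id).first()
    if not file:
        raise HTTPException(404, "Source file not found")

    pdf_bytes = pdf_service.export_analysis(analysis, file)

    # The PDF is already fully in memory, so a plain Response is enough
    return Response(
        content=pdf_bytes,
        media_type="application/pdf",
        headers={
            "Content-Disposition": f'attachment; filename="Analysis_Report_{analysis_id}.pdf"'
        },
    )

# The multi-analysis endpoint (GET /analyze/multi/{id}/export) follows the same
# pattern, calling pdf_service.export_multi_analysis(analysis).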
Frontend Changes
Export Button
Add to AnalysisDetails.jsx and MultiAnalysisDetails.jsx:
<Button
variant="outline"
onClick={handleExportPDF}
disabled={exporting}
>
{exporting ? (
<Loader2 className="h-4 w-4 mr-2 animate-spin" />
) : (
<FileDown className="h-4 w-4 mr-2" />
)}
Export PDF
</Button>
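// Component state used by the button above
const [exporting, setExporting] = useState(false);

const handleExportPDF = async () => {
  setExporting(true);
  try {
    // MultiAnalysisDetails.jsx calls `${API_URL}/analyze/multi/${analysis.id}/export?format=pdf`
    const response = await fetch(
      `${API_URL}/analyses/${analysis.id}/export?format=pdf`,
      { headers: { 'X-API-Key': apiKey } }
    );
    if (!response.ok) throw new Error(`Export failed (${response.status})`);
    const blob = await response.blob();
    const url = URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = `Analysis_Report_${analysis.id}.pdf`;
    // Attach the anchor before clicking; some browsers ignore clicks on detached links
    document.body.appendChild(a);
    a.click();
    a.remove();
    // Revoke after the click has been handled so the download is not cut off
    setTimeout(() => URL.revokeObjectURL(url), 0);
    toast.success('PDF downloaded');
  } catch (err) {
    console.error(err);
    toast.error('Failed to export PDF');
  } finally {
    setExporting(false);
  }
};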
Lambda Considerations
WeasyPrint Layer
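WeasyPrint requires native system libraries (Pango and its GLib, HarfBuzz and fontconfig dependencies) that are not part of the standard Python Lambda runtime. Options:
- Lambda Layer: Pre-built layer with WeasyPrint and its native dependencies
- Container Lambda: Use a Docker image with WeasyPrint and its system packages installed
- External Service: Call a separate PDF generation service (Modal, dedicated EC2)
Recommendation: Start with a Lambda Layer for simplicity. If the layer pushes the function past Lambda's 250 MB unzipped size limit (which counts layers) or the native libraries fail to load, move PDF generation to a dedicated Modal function.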
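Whichever option is chosen, a cheap cold-start check can confirm the native libraries are loadable before WeasyPrint is imported, so a broken layer fails with an explicit list of what is missing rather than an opaque import error. A sketch, assuming Linux sonames; the list is illustrative because the exact libraries WeasyPrint loads vary by version, and `missing_native_libs` is a hypothetical helper.

```python
import ctypes
from typing import List, Sequence

# Illustrative Linux sonames for the native stack WeasyPrint opens via cffi;
# the exact set depends on the WeasyPrint version.
WEASYPRINT_SONAMES = (
    "libgobject-2.0.so.0",
    "libpango-1.0.so.0",
    "libpangoft2-1.0.so.0",
    "libharfbuzz.so.0",
    "libfontconfig.so.1",
)


def missing_native_libs(sonames: Sequence[str] = WEASYPRINT_SONAMES) -> List[str]:
    """Return the sonames the dynamic loader cannot open."""
    missing = []
    for name in sonames:
        try:
            # dlopen() searches LD_LIBRARY_PATH, which on Lambda includes /opt/lib (layer contents)
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing
```

For example, at module load in the handler, raise a `RuntimeError` naming the result of `missing_native_libs()` if it is non-empty, before `from weasyprint import HTML` runs.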
Files to Create/Modify
Backend
- api/src/services/pdf_export.py - New PDF generation service
- api/src/templates/pdf/report.html - Jinja2 template
- api/src/templates/pdf/styles.css - PDF styles
- api/src/routers/analyses.py - Add export endpoints
- api/requirements.txt - Add weasyprint
Frontend
- client/src/pages/AnalysisDetails.jsx - Add export button
- client/src/pages/MultiAnalysisDetails.jsx - Add export button
Demo Criteria
- User opens a completed MultiAnalysis report
- Clicks “Export PDF” button
- Button shows loading state
- PDF downloads automatically
- PDF opens showing:
  - Header with branding and report title
  - Metadata (files included, duration, date)
  - Table of contents
  - All sections from the web view (executive synthesis, themes, custom fields, etc.)
  - Professional formatting with consistent styles
- Same flow works for single-file AnalysisRun