Stage 2: Reference Attachments
Goal
Allow attaching PDF/text files to prompt templates as context for LLM analysis. When an analysis runs with that template, attachments are automatically included.
Key Design Decisions
- Template-level only: Attachments belong to prompt templates, not individual analyses
- Always included: When a template is used, all its attachments are sent to the LLM
- Direct to LLM: Files are sent as-is to OpenAI, whose vision-capable models accept PDFs natively as file inputs
- File types: Text (.txt, .md) and PDF (.pdf)
Use Case
A nonprofit has a “Loan Interview Analysis” template. They attach “Loan_Application_Checklist.pdf” to it. Now every interview analyzed with this template automatically gets compared against the checklist requirements.
Data Model Changes
New Table: template_attachments
CREATE TABLE template_attachments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    prompt_template_id UUID NOT NULL REFERENCES prompt_templates(id) ON DELETE CASCADE,
    organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,

    -- File info
    filename VARCHAR(255) NOT NULL,
    content_type VARCHAR(100) NOT NULL, -- 'application/pdf', 'text/plain', 'text/markdown'
    file_size_bytes INTEGER NOT NULL,
    s3_key VARCHAR(500) NOT NULL, -- attachments/{org_id}/{template_id}/{uuid}/{filename}

    -- Metadata
    description TEXT, -- Optional user description
    created_by_user_id UUID REFERENCES users(id),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT valid_content_type CHECK (
        content_type IN ('application/pdf', 'text/plain', 'text/markdown')
    )
);

CREATE INDEX idx_template_attachments_template ON template_attachments(prompt_template_id);
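The valid_content_type constraint means the upload handler must store exactly one of three MIME types, yet browsers do not always report one (a .md file often arrives with an empty type). A minimal sketch of normalizing the declared type before the insert follows. The helper name resolve_content_type and the extension fallback map are assumptions for illustration and are not part of the original design.

```python
import os
from typing import Optional

# Mirrors the valid_content_type CHECK constraint on template_attachments.
ALLOWED_CONTENT_TYPES = {"application/pdf", "text/plain", "text/markdown"}

# Hypothetical fallback map, used only when the client sends no usable MIME type.
EXTENSION_TO_CONTENT_TYPE = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".md": "text/markdown",
}


def resolve_content_type(filename: str, declared: Optional[str]) -> Optional[str]:
    """Return a content type the CHECK constraint accepts, or None to reject."""
    # Strip parameters such as "; charset=utf-8" before comparing.
    base = (declared or "").split(";", 1)[0].strip().lower()
    if base in ALLOWED_CONTENT_TYPES:
        return base
    # Fall back to the extension only when the type is missing or generic.
    if base in ("", "application/octet-stream"):
        ext = os.path.splitext(filename)[1].lower()
        return EXTENSION_TO_CONTENT_TYPE.get(ext)
    # An explicit but unsupported type (e.g. image/png) is rejected outright.
    return None
```

Returning None lets the endpoint answer with a 4xx error before the row ever reaches the database, so the CHECK constraint acts as a last line of defense rather than the primary validation.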
SQLAlchemy Model
class TemplateAttachment(Base):
    __tablename__ = "template_attachments"

    id = Column(UUID, primary_key=True, default=uuid.uuid4)
    prompt_template_id = Column(UUID, ForeignKey("prompt_templates.id", ondelete="CASCADE"), nullable=False)
    organization_id = Column(UUID, ForeignKey("organizations.id", ondelete="CASCADE"), nullable=False)
    filename = Column(String(255), nullable=False)
    content_type = Column(String(100), nullable=False)
    file_size_bytes = Column(Integer, nullable=False)
    s3_key = Column(String(500), nullable=False)
    description = Column(Text)
    created_by_user_id = Column(UUID, ForeignKey("users.id"))
    created_at = Column(DateTime, default=datetime.utcnow)

    # Relationships
    prompt_template = relationship("PromptTemplate", back_populates="attachments")
    created_by = relationship("User")
Add to PromptTemplate:
class PromptTemplate(Base):
    # ... existing fields ...
    attachments = relationship("TemplateAttachment", back_populates="prompt_template", cascade="all, delete-orphan")
API Changes
New Endpoints
POST /prompts/{id}/attachments
- Multipart form upload
- Fields: file (binary), description (optional)
- Returns: attachment object with presigned download URL
- Requires: admin role + template ownership
GET /prompts/{id}/attachments
- Lists all attachments for a template
- Returns: array of attachment objects with download URLs
DELETE /prompts/{id}/attachments/{attachment_id}
- Removes attachment (deletes from S3 too)
- Requires: admin role
GET /prompts/{id}/attachments/{attachment_id}/download
- Returns presigned S3 URL for download
Attachment Response Schema
class AttachmentResponse(BaseModel):
    id: UUID
    filename: str
    content_type: str
    file_size_bytes: int
    description: Optional[str]
    download_url: str  # Presigned S3 URL
    created_at: datetime
    created_by: Optional[UserBrief]
Modified: Analysis Execution
When running an analysis, fetch the template's attachments and include them in the LLM call.
For OpenAI with PDF support (gpt-4o, gpt-4-turbo):
import base64

async def run_analysis_with_attachments(transcript, template):
    messages = []
    # System message
    messages.append({
        "role": "system",
        "content": template.system_message
    })
    # Add attachments as user messages with file content
    for attachment in template.attachments:
        file_content = download_from_s3(attachment.s3_key)
        if attachment.content_type == "application/pdf":
            # Send the PDF as a base64 "file" content part.
            # "image_url" parts accept images only, so a PDF data URL there is rejected.
            base64_content = base64.b64encode(file_content).decode()
            messages.append({
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": f"Reference document: {attachment.filename}"
                    },
                    {
                        "type": "file",
                        "file": {
                            "filename": attachment.filename,
                            "file_data": f"data:application/pdf;base64,{base64_content}"
                        }
                    }
                ]
            })
        else:
            # Text files sent as text
            text_content = file_content.decode('utf-8')
            messages.append({
                "role": "user",
                "content": f"Reference document ({attachment.filename}):\n\n{text_content}"
            })
    # Add the transcript and user prompt
    messages.append({
        "role": "user",
        "content": render_prompt(template.user_prompt, transcript=transcript)
    })
    return await openai_client.chat.completions.create(
        model=template.model or "gpt-4o",
        messages=messages,
        response_format=template.response_format
    )
Note on PDF handling:
- OpenAI's vision-capable models can process PDFs sent as base64-encoded file inputs
- For large PDFs, consider extracting text first (PyPDF2/pdfplumber)
- Current scope excludes 50+ page PDFs
Frontend Changes
Prompt Template Editor
Add “Attachments” section to PromptTemplateEditor.jsx:
<Card>
  <CardHeader>
    <CardTitle>Reference Attachments</CardTitle>
    <CardDescription>
      Files attached here will be included as context when this template is used for analysis.
    </CardDescription>
  </CardHeader>
  <CardContent>
    {/* List existing attachments */}
    <div className="space-y-2">
      {attachments.map(att => (
        <div key={att.id} className="flex items-center justify-between p-3 border rounded">
          <div className="flex items-center gap-3">
            <FileIcon type={att.content_type} />
            <div>
              <p className="font-medium">{att.filename}</p>
              <p className="text-sm text-muted-foreground">
                {formatBytes(att.file_size_bytes)}
                {att.description && ` • ${att.description}`}
              </p>
            </div>
          </div>
          <div className="flex gap-2">
            <Button variant="ghost" size="sm" onClick={() => downloadAttachment(att)}>
              <Download className="h-4 w-4" />
            </Button>
            <Button variant="ghost" size="sm" onClick={() => deleteAttachment(att.id)}>
              <Trash2 className="h-4 w-4" />
            </Button>
          </div>
        </div>
      ))}
    </div>
    {/* Upload new */}
    <div className="mt-4">
      <input
        type="file"
        accept=".pdf,.txt,.md"
        onChange={handleFileUpload}
        className="hidden"
        ref={fileInputRef}
      />
      <Button variant="outline" onClick={() => fileInputRef.current?.click()}>
        <Paperclip className="h-4 w-4 mr-2" />
        Attach File
      </Button>
      <p className="text-xs text-muted-foreground mt-2">
        Supported: PDF, TXT, MD (max 10MB)
      </p>
    </div>
  </CardContent>
</Card>
Upload Flow
const handleFileUpload = async (e) => {
  const file = e.target.files[0];
  if (!file) return;
  // Validate. Browsers often report an empty type for .md files,
  // so fall back to the extension when no MIME type is given.
  const validTypes = ['application/pdf', 'text/plain', 'text/markdown'];
  const validExtension = /\.(pdf|txt|md)$/i.test(file.name);
  if (!validTypes.includes(file.type) && !(file.type === '' && validExtension)) {
    toast.error('Invalid file type. Please upload PDF or text files.');
    return;
  }
  if (file.size > 10 * 1024 * 1024) {
    toast.error('File too large. Maximum size is 10MB.');
    return;
  }
  // Upload
  const formData = new FormData();
  formData.append('file', file);
  const response = await api.post(`/prompts/${templateId}/attachments`, formData, {
    headers: { 'Content-Type': 'multipart/form-data' }
  });
  // Functional update avoids dropping items if two uploads finish close together
  setAttachments(prev => [...prev, response.data]);
  toast.success('Attachment uploaded');
};
S3 Storage Structure
s3://finca-audio-{stage}/
├── audio/
│   └── {org_id}/
│       └── {file_id}/
│           └── original.{ext}
├── transcripts/
│   └── ...
└── attachments/
    └── {org_id}/
        └── {template_id}/
            └── {attachment_id}/
                └── {original_filename}
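Because the last path segment is the user-supplied filename, the key should be built defensively. The sketch below shows one way to produce keys in the attachments/{org_id}/{template_id}/{attachment_id}/{original_filename} layout. The function name build_attachment_key and the sanitization policy (keep only letters, digits, dot, underscore, and hyphen) are assumptions, not something the design specifies.

```python
import re
import uuid


def build_attachment_key(org_id: uuid.UUID, template_id: uuid.UUID,
                         attachment_id: uuid.UUID, filename: str) -> str:
    """Build attachments/{org_id}/{template_id}/{attachment_id}/{filename}."""
    # Keep only the final path component so "../" cannot escape the prefix.
    name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    # Replace anything outside a conservative character set (assumed policy),
    # drop leading dots, and fall back to a placeholder if nothing is left.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".") or "file"
    return f"attachments/{org_id}/{template_id}/{attachment_id}/{name}"
```

With three 36-character UUIDs and a filename of at most 255 characters, the resulting key stays well under the 500-character s3_key column. The original filename can still be stored unchanged in the filename column for display.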
Size Limits
| Constraint | Value |
|---|---|
| Max file size | 10 MB |
| Max attachments per template | 5 |
| Allowed types | PDF, TXT, MD |
| Max total attachment size per template | 25 MB |
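The frontend only checks the per-file size, so the upload endpoint would need to enforce all three numeric limits itself. A minimal sketch of that check follows, assuming a hypothetical helper check_attachment_limits that receives the sizes of the template's existing attachments. The constant names and error strings are illustrative.

```python
from typing import Iterable, Optional

MAX_FILE_BYTES = 10 * 1024 * 1024      # 10 MB per file
MAX_ATTACHMENTS_PER_TEMPLATE = 5
MAX_TOTAL_BYTES = 25 * 1024 * 1024     # 25 MB per template


def check_attachment_limits(existing_sizes: Iterable[int], new_size: int) -> Optional[str]:
    """Return an error message if adding the file breaks a limit, else None."""
    sizes = list(existing_sizes)
    if new_size > MAX_FILE_BYTES:
        return "File too large. Maximum size is 10MB."
    if len(sizes) >= MAX_ATTACHMENTS_PER_TEMPLATE:
        return "Template already has the maximum of 5 attachments."
    if sum(sizes) + new_size > MAX_TOTAL_BYTES:
        return "Total attachment size for this template would exceed 25MB."
    return None
```

Note that five files at the 10 MB maximum would total 50 MB, so in practice the 25 MB per-template cap is the binding constraint once a template holds more than two large files.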
Files to Create/Modify
Backend
- api/src/shared/db_models.py - Add TemplateAttachment model
- api/migrations/versions/xxx_add_template_attachments.py - Migration
- api/src/routers/prompts.py - Add attachment endpoints
- api/src/shared/storage.py - Add attachment upload/download helpers
- api/src/handlers/analysis.py - Include attachments in LLM call
Frontend
- client/src/pages/PromptTemplateEditor.jsx - Add attachments section
- client/src/lib/api.js - Add multipart upload support (if not present)
Demo Criteria
- Admin opens a prompt template in the editor
- Scrolls to “Reference Attachments” section
- Clicks “Attach File”, selects “Loan_Checklist.pdf”
- File uploads, appears in list with filename and size
- Admin saves template
- Admin runs an analysis on an interview using this template
- Analysis result includes insights that reference the checklist content
- Admin can download or delete the attachment from the template editor
Future Considerations (Out of Scope)
- Large PDF support (50+ pages) - would need chunking or summarization
- Per-analysis attachments - currently template-only
- Attachment versioning - currently an attachment can only be deleted and re-uploaded
- OCR for scanned PDFs - rely on OpenAI’s native handling