Stage 2: Reference Attachments

Goal

Allow attaching PDF/text files to prompt templates as context for LLM analysis. When an analysis runs with that template, attachments are automatically included.

Key Design Decisions

  1. Template-level only: Attachments belong to prompt templates, not individual analyses
  2. Always included: When a template is used, all its attachments are sent to the LLM
  3. Direct to LLM: Files are sent to OpenAI unmodified; PDFs go as base64-encoded file inputs, which vision-capable models such as gpt-4o can read natively
  4. File types: Text (.txt, .md) and PDF (.pdf)

Use Case

A nonprofit has a “Loan Interview Analysis” template. They attach “Loan_Application_Checklist.pdf” to it. Now every interview analyzed with this template automatically gets compared against the checklist requirements.

Data Model Changes

New Table: template_attachments

CREATE TABLE template_attachments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    prompt_template_id UUID NOT NULL REFERENCES prompt_templates(id) ON DELETE CASCADE,
    organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,

    -- File info
    filename VARCHAR(255) NOT NULL,
    content_type VARCHAR(100) NOT NULL,  -- 'application/pdf', 'text/plain', 'text/markdown'
    file_size_bytes INTEGER NOT NULL,
    s3_key VARCHAR(500) NOT NULL,         -- attachments/{org_id}/{template_id}/{uuid}/{filename}

    -- Metadata
    description TEXT,                      -- Optional user description
    created_by_user_id UUID REFERENCES users(id),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT valid_content_type CHECK (
        content_type IN ('application/pdf', 'text/plain', 'text/markdown')
    )
);

CREATE INDEX idx_template_attachments_template ON template_attachments(prompt_template_id);

SQLAlchemy Model

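import uuid
from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, Text
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import relationship

# Base is assumed to be the existing declarative base in db_models.py


class TemplateAttachment(Base):
    __tablename__ = "template_attachments"

    # as_uuid=True makes the column accept and return uuid.UUID objects,
    # which is what the uuid.uuid4 default produces
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    prompt_template_id = Column(UUID(as_uuid=True), ForeignKey("prompt_templates.id", ondelete="CASCADE"), nullable=False)
    organization_id = Column(UUID(as_uuid=True), ForeignKey("organizations.id", ondelete="CASCADE"), nullable=False)

    # File info
    filename = Column(String(255), nullable=False)
    content_type = Column(String(100), nullable=False)
    file_size_bytes = Column(Integer, nullable=False)
    s3_key = Column(String(500), nullable=False)

    # Metadata
    description = Column(Text)
    created_by_user_id = Column(UUID(as_uuid=True), ForeignKey("users.id"))
    created_at = Column(DateTime, default=datetime.utcnow)

    # Relationships
    prompt_template = relationship("PromptTemplate", back_populates="attachments")
    created_by = relationship("User")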

Add to PromptTemplate:

class PromptTemplate(Base):
    # ... existing fields ...
    attachments = relationship("TemplateAttachment", back_populates="prompt_template", cascade="all, delete-orphan")

API Changes

New Endpoints

POST   /prompts/{id}/attachments
       - Multipart form upload
       - Fields: file (binary), description (optional)
       - Returns: attachment object with presigned download URL
       - Requires: admin role + template ownership

GET    /prompts/{id}/attachments
       - Lists all attachments for a template
       - Returns: array of attachment objects with download URLs

DELETE /prompts/{id}/attachments/{attachment_id}
       - Removes attachment (deletes from S3 too)
       - Requires: admin role

GET    /prompts/{id}/attachments/{attachment_id}/download
       - Returns presigned S3 URL for download

Attachment Response Schema

class AttachmentResponse(BaseModel):
    id: UUID
    filename: str
    content_type: str
    file_size_bytes: int
    description: Optional[str]
    download_url: str  # Presigned S3 URL
    created_at: datetime
    created_by: Optional[UserBrief]

Modified: Analysis Execution

When running an analysis, fetch the template's attachments and include them in the LLM call.

For OpenAI with PDF support (gpt-4o, gpt-4-turbo):

import base64


async def run_analysis_with_attachments(transcript, template):
    # download_from_s3, render_prompt and openai_client are existing project helpers
    messages = []

    # System message
    messages.append({
        "role": "system",
        "content": template.system_message
    })

    # Add each attachment as its own user message
    for attachment in template.attachments:
        file_content = download_from_s3(attachment.s3_key)

        if attachment.content_type == "application/pdf":
            # Send the PDF as a base64 "file" content part. An "image_url" part
            # only accepts image formats, so a data:application/pdf URL does not
            # work there. Verify the part shape against the SDK version in use.
            base64_content = base64.b64encode(file_content).decode("ascii")
            messages.append({
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": f"Reference document: {attachment.filename}"
                    },
                    {
                        "type": "file",
                        "file": {
                            "filename": attachment.filename,
                            "file_data": f"data:application/pdf;base64,{base64_content}"
                        }
                    }
                ]
            })
        else:
            # Text and Markdown files are inlined as plain text
            text_content = file_content.decode('utf-8')
            messages.append({
                "role": "user",
                "content": f"Reference document ({attachment.filename}):\n\n{text_content}"
            })

    # Add the transcript and user prompt last, after all reference documents
    messages.append({
        "role": "user",
        "content": render_prompt(template.user_prompt, transcript=transcript)
    })

    return await openai_client.chat.completions.create(
        model=template.model or "gpt-4o",
        messages=messages,
        response_format=template.response_format
    )

Note on PDF handling:

  • OpenAI’s vision-capable models can process PDFs sent as base64-encoded file inputs
  • For large PDFs, consider extracting text first (PyPDF2/pdfplumber)
  • Current scope excludes 50+ page PDFs
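
Because PDFs are base64-encoded before being sent, the request payload is larger than the stored file. The sketch below works out the encoded size for a given raw size; the helper name `base64_encoded_size` is hypothetical and only illustrates the arithmetic (base64 emits 4 output characters for every 3 input bytes, with padding).

```python
import base64


def base64_encoded_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes of input."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters
    return 4 * ((raw_bytes + 2) // 3)


# Sanity check against the standard library on a small input
sample = b"x" * 100
assert base64_encoded_size(len(sample)) == len(base64.b64encode(sample))

# A 10 MB PDF (10 * 1024 * 1024 bytes) grows to roughly 13.3 MB once encoded
ten_mb_encoded = base64_encoded_size(10 * 1024 * 1024)
```

In other words, attachments add about one third on top of their stored size to each analysis request, which is worth keeping in mind when choosing size limits.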

Frontend Changes

Prompt Template Editor

Add an “Attachments” section to PromptTemplateEditor.jsx:

<Card>
  <CardHeader>
    <CardTitle>Reference Attachments</CardTitle>
    <CardDescription>
      Files attached here will be included as context when this template is used for analysis.
    </CardDescription>
  </CardHeader>
  <CardContent>
    {/* List existing attachments */}
    <div className="space-y-2">
      {attachments.map(att => (
        <div key={att.id} className="flex items-center justify-between p-3 border rounded">
          <div className="flex items-center gap-3">
            <FileIcon type={att.content_type} />
            <div>
              <p className="font-medium">{att.filename}</p>
              <p className="text-sm text-muted-foreground">
                {formatBytes(att.file_size_bytes)}
                {att.description && ` • ${att.description}`}
              </p>
            </div>
          </div>
          <div className="flex gap-2">
            <Button variant="ghost" size="sm" onClick={() => downloadAttachment(att)}>
              <Download className="h-4 w-4" />
            </Button>
            <Button variant="ghost" size="sm" onClick={() => deleteAttachment(att.id)}>
              <Trash2 className="h-4 w-4" />
            </Button>
          </div>
        </div>
      ))}
    </div>

    {/* Upload new */}
    <div className="mt-4">
      <input
        type="file"
        accept=".pdf,.txt,.md"
        onChange={handleFileUpload}
        className="hidden"
        ref={fileInputRef}
      />
      <Button variant="outline" onClick={() => fileInputRef.current?.click()}>
        <Paperclip className="h-4 w-4 mr-2" />
        Attach File
      </Button>
      <p className="text-xs text-muted-foreground mt-2">
        Supported: PDF, TXT, MD (max 10MB)
      </p>
    </div>
  </CardContent>
</Card>

Upload Flow

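const handleFileUpload = async (e) => {
  const input = e.target;
  const file = input.files[0];
  if (!file) return;

  // Validate by MIME type, falling back to the extension because browsers
  // often report an empty file.type for .md files
  const validTypes = ['application/pdf', 'text/plain', 'text/markdown'];
  const validExtensions = ['.pdf', '.txt', '.md'];
  const hasValidExtension = validExtensions.some(ext => file.name.toLowerCase().endsWith(ext));
  if (!validTypes.includes(file.type) && !hasValidExtension) {
    toast.error('Invalid file type. Please upload PDF or text files.');
    return;
  }

  if (file.size > 10 * 1024 * 1024) {
    toast.error('File too large. Maximum size is 10MB.');
    return;
  }

  // Upload
  const formData = new FormData();
  formData.append('file', file);

  try {
    const response = await api.post(`/prompts/${templateId}/attachments`, formData, {
      headers: { 'Content-Type': 'multipart/form-data' }
    });

    // Functional update avoids losing items if two uploads finish close together
    setAttachments(prev => [...prev, response.data]);
    toast.success('Attachment uploaded');
  } catch (err) {
    toast.error('Upload failed. Please try again.');
  } finally {
    // Reset the input so the same file can be selected again
    input.value = '';
  }
};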

S3 Storage Structure

s3://finca-audio-{stage}/
├── audio/
│   └── {org_id}/
│       └── {file_id}/
│           └── original.{ext}
├── transcripts/
│   └── ...
└── attachments/
    └── {org_id}/
        └── {template_id}/
            └── {attachment_id}/
                └── {original_filename}
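
A minimal sketch of how the storage helper might build keys for the `attachments/` prefix shown above. The function name `build_attachment_s3_key` is hypothetical, and sanitizing the filename is an added assumption: the layout above uses the original filename, which stays available in the `filename` column either way.

```python
import re
import uuid


def build_attachment_s3_key(
    org_id: uuid.UUID,
    template_id: uuid.UUID,
    attachment_id: uuid.UUID,
    filename: str,
) -> str:
    """Build attachments/{org_id}/{template_id}/{attachment_id}/{filename}."""
    # Keep only the last path component so "../" segments cannot escape the prefix
    base_name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    # Assumption: replace anything outside a conservative character set
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", base_name).lstrip(".") or "file"
    return f"attachments/{org_id}/{template_id}/{attachment_id}/{safe_name}"


key = build_attachment_s3_key(
    uuid.UUID(int=1), uuid.UUID(int=2), uuid.UUID(int=3), "Loan Checklist.pdf"
)
```

Putting the attachment ID in the path means two uploads with the same filename never collide, and deleting a template's attachments can be done by prefix.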

Size Limits

Constraint                              Value
Max file size                           10 MB
Max attachments per template            5
Allowed types                           PDF, TXT, MD
Max total attachment size per template  25 MB
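
These limits should also be enforced on the server, since the frontend only checks the per-file size. A minimal sketch follows, assuming 1 MB = 1024 * 1024 bytes (as in the frontend check) and a hypothetical helper name `check_attachment_limits` that receives the sizes of the template's existing attachments.

```python
from typing import Sequence

MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024
MAX_ATTACHMENTS_PER_TEMPLATE = 5
MAX_TOTAL_SIZE_BYTES = 25 * 1024 * 1024


def check_attachment_limits(existing_sizes: Sequence[int], new_size: int) -> None:
    """Raise ValueError if adding a file of new_size bytes would break a limit."""
    if new_size > MAX_FILE_SIZE_BYTES:
        raise ValueError("File too large. Maximum size is 10 MB.")
    if len(existing_sizes) >= MAX_ATTACHMENTS_PER_TEMPLATE:
        raise ValueError("Template already has the maximum of 5 attachments.")
    if sum(existing_sizes) + new_size > MAX_TOTAL_SIZE_BYTES:
        raise ValueError("Total attachment size per template would exceed 25 MB.")
```

The upload endpoint could call this with the `file_size_bytes` values already stored for the template and translate the `ValueError` into a 4xx response.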

Files to Create/Modify

Backend

  • api/src/shared/db_models.py - Add TemplateAttachment model
  • api/migrations/versions/xxx_add_template_attachments.py - Migration
  • api/src/routers/prompts.py - Add attachment endpoints
  • api/src/shared/storage.py - Add attachment upload/download helpers
  • api/src/handlers/analysis.py - Include attachments in LLM call

Frontend

  • client/src/pages/PromptTemplateEditor.jsx - Add attachments section
  • client/src/lib/api.js - Add multipart upload support (if not present)

Demo Criteria

  1. Admin opens a prompt template in the editor
  2. Scrolls to “Reference Attachments” section
  3. Clicks “Attach File”, selects “Loan_Checklist.pdf”
  4. File uploads, appears in list with filename and size
  5. Admin saves template
  6. Admin runs an analysis on an interview using this template
  7. Analysis result includes insights that reference the checklist content
  8. Admin can download or delete the attachment from the template editor

Future Considerations (Out of Scope)

  • Large PDF support (50+ pages) - would need chunking or summarization
  • Per-analysis attachments - currently template-only
  • Attachment versioning - currently a file can only be replaced by deleting it and uploading a new one
  • OCR for scanned PDFs - rely on OpenAI’s native handling