Week 1 — Milestone 1: Base Agent Infrastructure

Goal: Build the orchestration engine that drives thesis → plan → investigate → score → action.

Exit Criteria: Orchestration loop runs end-to-end with mocked tool responses. State machine is tested. Thesis and plan generators produce structured output. Persona configs load and constrain behavior.

Tasks

1.1 — Orchestration Loop / State Machine

Assignee: Dev A | Effort: 2 days | Depends on: —

Description: Build the core state machine that drives the agent loop. States: INIT → THESIS → PLAN → INVESTIGATE → SCORE → ACTION → DONE, plus a terminal ERROR state. Each transition calls the appropriate generator/engine. The loop retries the current state on transient errors, enforces a configurable max-iteration guard, and supports early termination.
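
A minimal sketch of the enum and loop, assuming illustrative names (TransientError, _execute, the retry bookkeeping) that the subtasks below don't pin down:

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class TransientError(Exception):
    """Recoverable failure; the current state is retried."""


class AgentState(str, Enum):
    INIT = "INIT"
    THESIS = "THESIS"
    PLAN = "PLAN"
    INVESTIGATE = "INVESTIGATE"
    SCORE = "SCORE"
    ACTION = "ACTION"
    DONE = "DONE"
    ERROR = "ERROR"


# Valid forward transitions; anything else is rejected.
TRANSITIONS = {
    AgentState.INIT: AgentState.THESIS,
    AgentState.THESIS: AgentState.PLAN,
    AgentState.PLAN: AgentState.INVESTIGATE,
    AgentState.INVESTIGATE: AgentState.SCORE,
    AgentState.SCORE: AgentState.ACTION,
    AgentState.ACTION: AgentState.DONE,
}


class Orchestrator:
    def __init__(self, max_iterations: int = 10, max_retries: int = 3):
        self.max_iterations = max_iterations
        self.max_retries = max_retries

    def run(self, persona, context) -> AgentState:
        state, iterations = AgentState.INIT, 0
        while state not in (AgentState.DONE, AgentState.ERROR):
            if iterations >= self.max_iterations:
                return AgentState.ERROR  # max-iteration guard
            retries = 0
            while True:
                try:
                    # Fatal (non-transient) exceptions propagate and abort the run.
                    self._execute(state, persona, context)
                    break
                except TransientError:
                    retries += 1
                    if retries >= self.max_retries:
                        return AgentState.ERROR  # give up after repeated failures
            next_state = TRANSITIONS[state]
            logger.info("state transition: %s -> %s", state.value, next_state.value)
            state, iterations = next_state, iterations + 1
        return state

    def _execute(self, state, persona, context) -> None:
        # Dispatch to the generator/engine for this state
        # (thesis generator, plan generator, tool executor, ...).
        pass
```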

Subtasks:

  • Define state enum and transition rules
  • Implement Orchestrator class with run() method
  • State persistence (in-memory dict, serializable to JSON)
  • Error handling: retry on transient failures, abort on fatal
  • Max iteration guard (configurable, default 10)
  • Structured logging at each state transition
  • Unit tests: all valid transitions, error recovery, max iterations

Acceptance Criteria:

  • Orchestrator.run(persona, context) executes the full state machine
  • Each state transition is logged with timestamp and payload summary
  • Errors in any state are retried up to N times before moving to the ERROR state
  • Unit tests cover: happy path, each error path, max iteration, early termination
  • All tests pass with uv run pytest tests/unit/test_orchestrator.py

Deliverables:

  • kosong_agent/orchestration/state_machine.py
  • kosong_agent/orchestration/orchestrator.py
  • tests/unit/test_orchestrator.py

1.2 — Trade Thesis Generator

Assignee: Dev A | Effort: 1 day | Depends on: 1.1

Description: LLM-powered module that takes a persona config + market context and generates a structured trade thesis. Uses kosong's step() to call the LLM with a specific system prompt. Output is a Pydantic model with fields: claim, direction (LONG/SHORT/NEUTRAL), assumptions (list), risk factors (list), and confidence bracket (LOW/MEDIUM/HIGH).
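
A sketch of the model and the parse step, assuming Pydantic v2 (field names mirror the description; exact naming is open):

```python
from typing import Literal

from pydantic import BaseModel


class TradeThesis(BaseModel):
    claim: str
    direction: Literal["LONG", "SHORT", "NEUTRAL"]
    assumptions: list[str]
    risk_factors: list[str]
    confidence_bracket: Literal["LOW", "MEDIUM", "HIGH"]


# model_validate_json parses and validates in one call; a ValidationError
# here is what triggers the stricter-prompt retry described below.
raw = (
    '{"claim": "YES is underpriced", "direction": "LONG",'
    ' "assumptions": ["the volume spike is informed flow"],'
    ' "risk_factors": ["thin orderbook"], "confidence_bracket": "MEDIUM"}'
)
thesis = TradeThesis.model_validate_json(raw)
```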

Subtasks:

  • Define TradeThesis Pydantic model
  • Thesis generator prompt (system prompt for LLM)
  • ThesisGenerator.generate(persona, market_context) -> TradeThesis
  • Parse LLM JSON output into Pydantic model with validation
  • Fallback: retry with stricter prompt if parsing fails
  • Unit tests with mocked LLM responses (valid + malformed)

Acceptance Criteria:

  • Takes persona config + market context dict → returns TradeThesis
  • LLM prompt produces parseable JSON consistently
  • Handles malformed LLM output gracefully (retry once, then raise)
  • Unit tests cover: valid thesis, malformed response, missing fields
  • Tests pass with uv run pytest tests/unit/test_thesis.py

Deliverables:

  • kosong_agent/orchestration/thesis.py
  • kosong_agent/orchestration/prompts/thesis_prompt.py
  • tests/unit/test_thesis.py

1.3 — Investigation Plan Generator

Assignee: Dev A | Effort: 1 day | Depends on: 1.2

Description: LLM-powered module that takes a thesis and maps its assumptions to an ordered checklist of tool calls. Each step has: tool name, parameters, purpose (which assumption it validates), and priority. The plan generator knows which Nava tools are available (from the persona config’s tool allowlist).
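
A sketch of the plan models plus the tool-existence check, assuming Pydantic v2 (method and field names are illustrative):

```python
from pydantic import BaseModel


class PlanStep(BaseModel):
    tool: str         # must name a tool from the persona's allowlist
    parameters: dict  # template; some values resolve from prior step output
    purpose: str      # which thesis assumption this step validates
    priority: int


class InvestigationPlan(BaseModel):
    steps: list[PlanStep]

    def check_tools(self, available_tools: set[str]) -> None:
        """Reject plans that reference tools outside the available toolset."""
        unknown = {step.tool for step in self.steps} - available_tools
        if unknown:
            raise ValueError(f"plan references unknown tools: {sorted(unknown)}")
```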

Subtasks:

  • Define InvestigationPlan and PlanStep Pydantic models
  • Plan generator prompt (maps assumptions → tool calls)
  • PlanGenerator.generate(thesis, available_tools) -> InvestigationPlan
  • Validate that all referenced tools exist in the toolset
  • Order steps by dependency (some tools need output from others)
  • Unit tests with mocked LLM responses

Acceptance Criteria:

  • Takes TradeThesis + tool list → returns InvestigationPlan with ordered steps
  • Each step references a valid tool name from the available toolset
  • Steps include parameter templates (some params filled, some from prior step output)
  • Unit tests cover: valid plan, unknown tool reference, empty thesis
  • Tests pass with uv run pytest tests/unit/test_plan.py

Deliverables:

  • kosong_agent/orchestration/plan.py
  • kosong_agent/orchestration/prompts/plan_prompt.py
  • tests/unit/test_plan.py

1.4 — Persona Config Schema

Assignee: Dev A | Effort: 0.5 days | Depends on: —

Description: Define the Pydantic models for persona configuration. A persona has: name, description, goals, constraints, risk limits, market filters, tool allowlist, scoring factor weights, and action thresholds. Create 3 starter configs as JSON files.
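
A sketch of the schema and loader, assuming Pydantic v2; the nested field names (max_position_usd, venues, etc.) are placeholders, not decided limits:

```python
from pathlib import Path

from pydantic import BaseModel


class RiskLimits(BaseModel):
    max_position_usd: float
    max_open_positions: int


class MarketFilters(BaseModel):
    venues: list[str]
    min_volume_usd: float


class ScoringConfig(BaseModel):
    factor_weights: dict[str, float]


class PersonaConfig(BaseModel):
    name: str
    description: str
    goals: list[str]
    constraints: list[str]
    risk_limits: RiskLimits
    market_filters: MarketFilters
    tool_allowlist: list[str]
    scoring: ScoringConfig
    action_thresholds: dict[str, float]


# personas/ sits next to orchestration/ inside kosong_agent/
PERSONA_DIR = Path(__file__).parent.parent / "personas"


def load_persona(name: str) -> PersonaConfig:
    raw = (PERSONA_DIR / f"{name}.json").read_text()
    return PersonaConfig.model_validate_json(raw)
```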

Subtasks:

  • Define PersonaConfig Pydantic model with all fields
  • Define nested models: RiskLimits, MarketFilters, ScoringConfig
  • Create personas/polymarket_sports.json
  • Create personas/polymarket_multichoice.json
  • Create personas/hyperliquid_intent.json
  • Loader function: load_persona(name) -> PersonaConfig
  • Unit tests: load each persona, validate constraints

Acceptance Criteria:

  • PersonaConfig model validates all required fields
  • 3 JSON config files parse without errors
  • load_persona("polymarket_sports") returns validated config
  • Each persona specifies different tool allowlists and scoring weights
  • Tests pass with uv run pytest tests/unit/test_persona.py

Deliverables:

  • kosong_agent/orchestration/persona.py
  • kosong_agent/personas/polymarket_sports.json
  • kosong_agent/personas/polymarket_multichoice.json
  • kosong_agent/personas/hyperliquid_intent.json
  • tests/unit/test_persona.py

1.5 — Test Fixtures + Mock Data

Assignee: Dev B | Effort: 1 day | Depends on: —

Description: Create realistic mock responses for each Nava tool so all unit tests can run without API keys. Fixtures include: market discovery results, trade history, price history, orderbook snapshots, wallet history, funding rates.
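
A sketch of how conftest.py might expose these, with invented placeholder values (real fixtures should mirror actual API payload shapes):

```python
import pytest


@pytest.fixture
def polymarket_markets():
    # Shape loosely mirrors a market-discovery response; values are invented.
    return [
        {
            "question": "Will Team A beat Team B?",
            "outcomes": ["Yes", "No"],
            "outcome_prices": [0.62, 0.38],
            "volume_usd": 125_000.0,
        },
    ]


@pytest.fixture
def llm_thesis_json():
    # Valid JSON matching the TradeThesis schema, for mocked LLM calls.
    return (
        '{"claim": "YES is underpriced", "direction": "LONG",'
        ' "assumptions": ["the volume spike is informed flow"],'
        ' "risk_factors": ["thin orderbook"], "confidence_bracket": "MEDIUM"}'
    )
```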

Subtasks:

  • Mock Polymarket market discovery response (3-5 markets)
  • Mock Polymarket trade history (50+ trades, mixed buy/sell)
  • Mock Polymarket price history (100+ data points with a spike)
  • Mock Polymarket orderbook (bid/ask with depth)
  • Mock Hyperliquid market discovery (3-5 perps)
  • Mock Hyperliquid trades (50+ trades with varying sizes)
  • Mock Hyperliquid candles (OHLCV for 7 days)
  • Mock Hyperliquid funding rates (30 days)
  • Mock LLM responses for thesis and plan generators
  • Pytest fixtures in conftest.py for shared access

Acceptance Criteria:

  • All fixtures are realistic (based on actual API response shapes)
  • Fixtures are importable from tests/fixtures/
  • Mock LLM responses are valid JSON matching expected schemas
  • No test requires an API key to run in unit mode

Deliverables:

  • tests/fixtures/polymarket_mocks.py
  • tests/fixtures/hyperliquid_mocks.py
  • tests/fixtures/llm_mocks.py
  • tests/conftest.py

1.6 — Wire Orchestration End-to-End

Assignee: Dev A | Effort: 0.5 days | Depends on: 1.1, 1.2, 1.3, 1.4, 1.5

Description: Integration test that runs the full orchestration loop: load persona → generate thesis → generate plan → execute tool calls (mocked) → collect results. This validates the wiring between all M1 components before scoring/action engines exist.
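
Roughly the shape of the test, assuming hypothetical fixtures (mock_llm, mock_tool_executor) and orchestrator attributes (history, plan) from the sketches above:

```python
from kosong_agent.orchestration.orchestrator import Orchestrator
from kosong_agent.orchestration.persona import load_persona
from kosong_agent.orchestration.state_machine import AgentState


def test_full_loop_with_mocked_tools(mock_llm, mock_tool_executor):
    persona = load_persona("polymarket_sports")
    orchestrator = Orchestrator()
    orchestrator.run(persona, context={})

    # Transitions happened in order; the loop stops after INVESTIGATE
    # because no scorer exists yet in M1.
    assert orchestrator.history == [
        AgentState.INIT,
        AgentState.THESIS,
        AgentState.PLAN,
        AgentState.INVESTIGATE,
    ]
    # Executed tool calls match the plan steps, in order.
    assert [c.tool for c in mock_tool_executor.calls] == [
        s.tool for s in orchestrator.plan.steps
    ]
```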

Subtasks:

  • Integration test: persona → thesis → plan → mocked tool execution
  • Verify state transitions happen in order
  • Verify tool calls match plan steps
  • Verify results are collected and passed forward

Acceptance Criteria:

  • Integration test runs the full loop with mocked tools
  • State machine transitions through INIT → THESIS → PLAN → INVESTIGATE → (stops, no scorer yet)
  • Tool call arguments match what the plan specified
  • Test passes with uv run pytest tests/integration/test_orchestration_e2e.py

Deliverables:

  • tests/integration/test_orchestration_e2e.py

1.7 — Debug Script: Orchestration

Assignee: Dev B | Effort: 0.5 days | Depends on: 1.6

Description: CLI script that runs the orchestration loop with --dry-run against mocked tools. Prints state transitions, generated thesis, investigation plan, and tool call summaries to stdout using Rich formatting.
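
A sketch of the entry point (argparse and Rich calls are standard; the orchestrator wiring is elided):

```python
import argparse

from rich.console import Console


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Run the orchestration loop against mock fixtures."
    )
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--persona", default="polymarket_sports")
    args = parser.parse_args()

    console = Console()
    console.rule(f"[bold]Orchestration dry run: {args.persona}")
    # ... load persona, run Orchestrator with mock tools, then render
    # state transitions, thesis, and plan as Rich panels/tables.
    console.print("[green]INIT -> THESIS[/green]")


if __name__ == "__main__":
    main()
```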

Subtasks:

  • CLI with --dry-run and --persona flags
  • Rich-formatted output: state transitions, thesis, plan
  • Uses mock fixtures (no API keys needed)

Acceptance Criteria:

  • uv run python -m kosong_agent.debug.debug_orchestration --dry-run --persona polymarket_sports runs and prints readable output
  • No API keys required

Deliverables:

  • kosong_agent/debug/debug_orchestration.py