Week 1 — Milestone 1: Base Agent Infrastructure

Goal: Build the orchestration engine that drives thesis → plan → investigate → score → action.

Exit Criteria: Orchestration loop runs end-to-end with mocked tool responses. State machine is tested. Thesis and plan generators produce structured output. Persona configs load and constrain behavior.

Tasks

1.1 — Orchestration Loop / State Machine

Assignee: Dev A | Effort: 2 days | Depends on: —

Description: Build the core state machine that drives the agent loop. States: INIT → THESIS → PLAN → INVESTIGATE → SCORE → ACTION → DONE, plus a terminal ERROR state. Each transition calls the appropriate generator/engine. The loop retries the current state on transient errors, enforces a configurable max-iteration guard, and supports early termination.
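
A minimal sketch of the enum and loop, assuming illustrative names (TransientError, _execute, the retry bookkeeping) that the subtasks below don't pin down:

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class TransientError(Exception):
    """Recoverable failure; the current state is retried."""


class AgentState(str, Enum):
    INIT = "INIT"
    THESIS = "THESIS"
    PLAN = "PLAN"
    INVESTIGATE = "INVESTIGATE"
    SCORE = "SCORE"
    ACTION = "ACTION"
    DONE = "DONE"
    ERROR = "ERROR"


# Valid forward transitions; anything else is rejected.
TRANSITIONS = {
    AgentState.INIT: AgentState.THESIS,
    AgentState.THESIS: AgentState.PLAN,
    AgentState.PLAN: AgentState.INVESTIGATE,
    AgentState.INVESTIGATE: AgentState.SCORE,
    AgentState.SCORE: AgentState.ACTION,
    AgentState.ACTION: AgentState.DONE,
}


class Orchestrator:
    def __init__(self, max_iterations: int = 10, max_retries: int = 3):
        self.max_iterations = max_iterations
        self.max_retries = max_retries

    def run(self, persona, context) -> AgentState:
        state, iterations = AgentState.INIT, 0
        while state not in (AgentState.DONE, AgentState.ERROR):
            if iterations >= self.max_iterations:
                return AgentState.ERROR  # max-iteration guard
            retries = 0
            while True:
                try:
                    # Fatal (non-transient) exceptions propagate and abort the run.
                    self._execute(state, persona, context)
                    break
                except TransientError:
                    retries += 1
                    if retries >= self.max_retries:
                        return AgentState.ERROR  # give up after repeated failures
            next_state = TRANSITIONS[state]
            logger.info("state transition: %s -> %s", state.value, next_state.value)
            state, iterations = next_state, iterations + 1
        return state

    def _execute(self, state, persona, context) -> None:
        # Dispatch to the generator/engine for this state
        # (thesis generator, plan generator, tool executor, ...).
        pass
```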

Subtasks:

  • Define state enum and transition rules
  • Implement Orchestrator class with run() method
  • State persistence (in-memory dict, serializable to JSON)
  • Error handling: retry on transient failures, abort on fatal
  • Max iteration guard (configurable, default 10)
  • Structured logging at each state transition
  • Unit tests: all valid transitions, error recovery, max iterations

Acceptance Criteria:

  • Orchestrator.run(persona, context) executes the full state machine
  • Each state transition is logged with timestamp and payload summary
  • Errors in any state are retried up to N times before moving to the ERROR state
  • Unit tests cover: happy path, each error path, max iteration, early termination
  • All tests pass with uv run pytest tests/unit/test_orchestrator.py

Deliverables:

  • kosong_agent/orchestration/state_machine.py
  • kosong_agent/orchestration/orchestrator.py
  • tests/unit/test_orchestrator.py

1.2 — Trade Thesis Generator

Assignee: Dev A | Effort: 1 day | Depends on: 1.1

Description: LLM-powered module that takes a persona config + market context and generates a structured trade thesis. Uses kosong's step() to call the LLM with a specific system prompt. Output is a Pydantic model with fields: claim, direction (LONG/SHORT/NEUTRAL), assumptions (list), risk factors (list), and confidence bracket (LOW/MEDIUM/HIGH).
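
A sketch of the model and the parse step, assuming Pydantic v2 (field names mirror the description; exact naming is open):

```python
from typing import Literal

from pydantic import BaseModel


class TradeThesis(BaseModel):
    claim: str
    direction: Literal["LONG", "SHORT", "NEUTRAL"]
    assumptions: list[str]
    risk_factors: list[str]
    confidence_bracket: Literal["LOW", "MEDIUM", "HIGH"]


# model_validate_json parses and validates in one call; a ValidationError
# here is what triggers the stricter-prompt retry described below.
raw = (
    '{"claim": "YES is underpriced", "direction": "LONG",'
    ' "assumptions": ["the volume spike is informed flow"],'
    ' "risk_factors": ["thin orderbook"], "confidence_bracket": "MEDIUM"}'
)
thesis = TradeThesis.model_validate_json(raw)
```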

Subtasks:

  • Define TradeThesis Pydantic model
  • Thesis generator prompt (system prompt for LLM)
  • ThesisGenerator.generate(persona, market_context) -> TradeThesis
  • Parse LLM JSON output into Pydantic model with validation
  • Fallback: retry with stricter prompt if parsing fails
  • Unit tests with mocked LLM responses (valid + malformed)

Acceptance Criteria:

  • Takes persona config + market context dict → returns TradeThesis
  • LLM prompt produces parseable JSON consistently
  • Handles malformed LLM output gracefully (retry once, then raise)
  • Unit tests cover: valid thesis, malformed response, missing fields
  • Tests pass with uv run pytest tests/unit/test_thesis.py

Deliverables:

  • kosong_agent/orchestration/thesis.py
  • kosong_agent/orchestration/prompts/thesis_prompt.py
  • tests/unit/test_thesis.py

1.3 — Investigation Plan Generator

Assignee: Dev A | Effort: 1 day | Depends on: 1.2

Description: LLM-powered module that takes a thesis and maps its assumptions to an ordered checklist of tool calls. Each step has: tool name, parameters, purpose (which assumption it validates), and priority. The plan generator knows which Nava tools are available (from the persona config’s tool allowlist).
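
A sketch of the plan models plus the tool-existence check, assuming Pydantic v2 (method and field names are illustrative):

```python
from pydantic import BaseModel


class PlanStep(BaseModel):
    tool: str         # must name a tool from the persona's allowlist
    parameters: dict  # template; some values resolve from prior step output
    purpose: str      # which thesis assumption this step validates
    priority: int


class InvestigationPlan(BaseModel):
    steps: list[PlanStep]

    def check_tools(self, available_tools: set[str]) -> None:
        """Reject plans that reference tools outside the available toolset."""
        unknown = {step.tool for step in self.steps} - available_tools
        if unknown:
            raise ValueError(f"plan references unknown tools: {sorted(unknown)}")
```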

Subtasks:

  • Define InvestigationPlan and PlanStep Pydantic models
  • Plan generator prompt (maps assumptions → tool calls)
  • PlanGenerator.generate(thesis, available_tools) -> InvestigationPlan
  • Validate that all referenced tools exist in the toolset
  • Order steps by dependency (some tools need output from others)
  • Unit tests with mocked LLM responses

Acceptance Criteria:

  • Takes TradeThesis + tool list → returns InvestigationPlan with ordered steps
  • Each step references a valid tool name from the available toolset
  • Steps include parameter templates (some params filled, some from prior step output)
  • Unit tests cover: valid plan, unknown tool reference, empty thesis
  • Tests pass with uv run pytest tests/unit/test_plan.py

Deliverables:

  • kosong_agent/orchestration/plan.py
  • kosong_agent/orchestration/prompts/plan_prompt.py
  • tests/unit/test_plan.py

1.4 — Persona Config Schema

Assignee: Dev A | Effort: 0.5 days | Depends on: —

Description: Define the Pydantic models for persona configuration. A persona has: name, description, goals, constraints, risk limits, market filters, tool allowlist, scoring factor weights, and action thresholds. Create 3 starter configs as JSON files.
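
A sketch of the schema and loader, assuming Pydantic v2; the nested field names (max_position_usd, venues, etc.) are placeholders, not decided limits:

```python
from pathlib import Path

from pydantic import BaseModel


class RiskLimits(BaseModel):
    max_position_usd: float
    max_open_positions: int


class MarketFilters(BaseModel):
    venues: list[str]
    min_volume_usd: float


class ScoringConfig(BaseModel):
    factor_weights: dict[str, float]


class PersonaConfig(BaseModel):
    name: str
    description: str
    goals: list[str]
    constraints: list[str]
    risk_limits: RiskLimits
    market_filters: MarketFilters
    tool_allowlist: list[str]
    scoring: ScoringConfig
    action_thresholds: dict[str, float]


# personas/ sits next to orchestration/ inside kosong_agent/
PERSONA_DIR = Path(__file__).parent.parent / "personas"


def load_persona(name: str) -> PersonaConfig:
    raw = (PERSONA_DIR / f"{name}.json").read_text()
    return PersonaConfig.model_validate_json(raw)
```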

Subtasks:

  • Define PersonaConfig Pydantic model with all fields
  • Define nested models: RiskLimits, MarketFilters, ScoringConfig
  • Create personas/polymarket_sports.json
  • Create personas/polymarket_multichoice.json
  • Create personas/hyperliquid_intent.json
  • Loader function: load_persona(name) -> PersonaConfig
  • Unit tests: load each persona, validate constraints

Acceptance Criteria:

  • PersonaConfig model validates all required fields
  • 3 JSON config files parse without errors
  • load_persona("polymarket_sports") returns validated config
  • Each persona specifies different tool allowlists and scoring weights
  • Tests pass with uv run pytest tests/unit/test_persona.py

Deliverables:

  • kosong_agent/orchestration/persona.py
  • kosong_agent/personas/polymarket_sports.json
  • kosong_agent/personas/polymarket_multichoice.json
  • kosong_agent/personas/hyperliquid_intent.json
  • tests/unit/test_persona.py

1.5 — Test Fixtures + Mock Data

Assignee: Dev B | Effort: 1 day | Depends on: —

Description: Create realistic mock responses for each Nava tool so all unit tests can run without API keys. Fixtures include: market discovery results, trade history, price history, orderbook snapshots, wallet history, funding rates.
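
A sketch of how conftest.py might expose these, with invented placeholder values (real fixtures should mirror actual API payload shapes):

```python
import pytest


@pytest.fixture
def polymarket_markets():
    # Shape loosely mirrors a market-discovery response; values are invented.
    return [
        {
            "question": "Will Team A beat Team B?",
            "outcomes": ["Yes", "No"],
            "outcome_prices": [0.62, 0.38],
            "volume_usd": 125_000.0,
        },
    ]


@pytest.fixture
def llm_thesis_json():
    # Valid JSON matching the TradeThesis schema, for mocked LLM calls.
    return (
        '{"claim": "YES is underpriced", "direction": "LONG",'
        ' "assumptions": ["the volume spike is informed flow"],'
        ' "risk_factors": ["thin orderbook"], "confidence_bracket": "MEDIUM"}'
    )
```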

Subtasks:

  • Mock Polymarket market discovery response (3-5 markets)
  • Mock Polymarket trade history (50+ trades, mixed buy/sell)
  • Mock Polymarket price history (100+ data points with a spike)
  • Mock Polymarket orderbook (bid/ask with depth)
  • Mock Hyperliquid market discovery (3-5 perps)
  • Mock Hyperliquid trades (50+ trades with varying sizes)
  • Mock Hyperliquid candles (OHLCV for 7 days)
  • Mock Hyperliquid funding rates (30 days)
  • Mock LLM responses for thesis and plan generators
  • Pytest fixtures in conftest.py for shared access

Acceptance Criteria:

  • All fixtures are realistic (based on actual API response shapes)
  • Fixtures are importable from tests/fixtures/
  • Mock LLM responses are valid JSON matching expected schemas
  • No test requires an API key to run in unit mode

Deliverables:

  • tests/fixtures/polymarket_mocks.py
  • tests/fixtures/hyperliquid_mocks.py
  • tests/fixtures/llm_mocks.py
  • tests/conftest.py

1.6 — Wire Orchestration End-to-End

Assignee: Dev A | Effort: 0.5 days | Depends on: 1.1, 1.2, 1.3, 1.4, 1.5

Description: Integration test that runs the full orchestration loop: load persona → generate thesis → generate plan → execute tool calls (mocked) → collect results. This validates the wiring between all M1 components before scoring/action engines exist.
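
Roughly the shape of the test, assuming hypothetical fixtures (mock_llm, mock_tool_executor) and orchestrator attributes (history, plan) from the sketches above:

```python
from kosong_agent.orchestration.orchestrator import Orchestrator
from kosong_agent.orchestration.persona import load_persona
from kosong_agent.orchestration.state_machine import AgentState


def test_full_loop_with_mocked_tools(mock_llm, mock_tool_executor):
    persona = load_persona("polymarket_sports")
    orchestrator = Orchestrator()
    orchestrator.run(persona, context={})

    # Transitions happened in order; the loop stops after INVESTIGATE
    # because no scorer exists yet in M1.
    assert orchestrator.history == [
        AgentState.INIT,
        AgentState.THESIS,
        AgentState.PLAN,
        AgentState.INVESTIGATE,
    ]
    # Executed tool calls match the plan steps, in order.
    assert [c.tool for c in mock_tool_executor.calls] == [
        s.tool for s in orchestrator.plan.steps
    ]
```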

Subtasks:

  • Integration test: persona → thesis → plan → mocked tool execution
  • Verify state transitions happen in order
  • Verify tool calls match plan steps
  • Verify results are collected and passed forward

Acceptance Criteria:

  • Integration test runs the full loop with mocked tools
  • State machine transitions through INIT → THESIS → PLAN → INVESTIGATE → (stops, no scorer yet)
  • Tool call arguments match what the plan specified
  • Test passes with uv run pytest tests/integration/test_orchestration_e2e.py

Deliverables:

  • tests/integration/test_orchestration_e2e.py

1.7 — Debug Script: Orchestration

Assignee: Dev B | Effort: 0.5 days | Depends on: 1.6

Description: CLI script that runs the orchestration loop with --dry-run against mocked tools. Prints state transitions, generated thesis, investigation plan, and tool call summaries to stdout using Rich formatting.
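
A sketch of the entry point (argparse and Rich calls are standard; the orchestrator wiring is elided):

```python
import argparse

from rich.console import Console


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Run the orchestration loop against mock fixtures."
    )
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--persona", default="polymarket_sports")
    args = parser.parse_args()

    console = Console()
    console.rule(f"[bold]Orchestration dry run: {args.persona}")
    # ... load persona, run Orchestrator with mock tools, then render
    # state transitions, thesis, and plan as Rich panels/tables.
    console.print("[green]INIT -> THESIS[/green]")


if __name__ == "__main__":
    main()
```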

Subtasks:

  • CLI with --dry-run and --persona flags
  • Rich-formatted output: state transitions, thesis, plan
  • Uses mock fixtures (no API keys needed)

Acceptance Criteria:

  • uv run python -m kosong_agent.debug.debug_orchestration --dry-run --persona polymarket_sports runs and prints readable output
  • No API keys required

Deliverables:

  • kosong_agent/debug/debug_orchestration.py