Week 1 — Milestone 1: Base Agent Infrastructure
Goal: Build the orchestration engine that drives thesis → plan → investigate → score → action.
Exit Criteria: Orchestration loop runs end-to-end with mocked tool responses. State machine is tested. Thesis and plan generators produce structured output. Persona configs load and constrain behavior.
Tasks
1.1 — Orchestration Loop / State Machine
Assignee: Dev A Effort: 2 days Depends on: —
Description:
Build the core state machine that drives the agent loop. States: INIT → THESIS → PLAN → INVESTIGATE → SCORE → ACTION → DONE. Each transition calls the appropriate generator/engine. The loop handles errors (retry current state), max-iteration guards, and early termination.
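A minimal sketch of the intended shape, assuming the state names above (the handler wiring, `TransientError`, and defaults are placeholders, not a final design):

```python
# Sketch only — handler wiring and TransientError are assumptions.
from enum import Enum, auto


class AgentState(Enum):
    INIT = auto()
    THESIS = auto()
    PLAN = auto()
    INVESTIGATE = auto()
    SCORE = auto()
    ACTION = auto()
    DONE = auto()
    ERROR = auto()


# Linear transition table matching the loop in the description
TRANSITIONS = {
    AgentState.INIT: AgentState.THESIS,
    AgentState.THESIS: AgentState.PLAN,
    AgentState.PLAN: AgentState.INVESTIGATE,
    AgentState.INVESTIGATE: AgentState.SCORE,
    AgentState.SCORE: AgentState.ACTION,
    AgentState.ACTION: AgentState.DONE,
}


class TransientError(Exception):
    """Retryable failure (rate limit, timeout); fatal errors propagate."""


class Orchestrator:
    def __init__(self, handlers, max_iterations=10, max_retries=3):
        self.handlers = handlers  # AgentState -> callable(persona, context) -> context
        self.max_iterations = max_iterations
        self.max_retries = max_retries

    def run(self, persona, context):
        state, iterations = AgentState.INIT, 0
        while state not in (AgentState.DONE, AgentState.ERROR):
            if iterations >= self.max_iterations:
                state = AgentState.ERROR  # max-iteration guard
                break
            for _ in range(self.max_retries):
                try:
                    context = self.handlers[state](persona, context)
                    break
                except TransientError:
                    continue  # retry current state
            else:
                state = AgentState.ERROR  # retries exhausted
                break
            state = TRANSITIONS[state]
            iterations += 1
        return state, context
```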
Subtasks:
- Define state enum and transition rules
- Implement `Orchestrator` class with `run()` method
- State persistence (in-memory dict, serializable to JSON)
- Error handling: retry on transient failures, abort on fatal
- Max iteration guard (configurable, default 10)
- Structured logging at each state transition
- Unit tests: all valid transitions, error recovery, max iterations
Acceptance Criteria:
- `Orchestrator.run(persona, context)` executes the full state machine
- Each state transition is logged with timestamp and payload summary
- Errors in any state retry up to N times before moving to ERROR state
- Unit tests cover: happy path, each error path, max iteration, early termination
- All tests pass with `uv run pytest tests/unit/test_orchestrator.py`
Deliverables:
- `kosong_agent/orchestration/state_machine.py`
- `kosong_agent/orchestration/orchestrator.py`
- `tests/unit/test_orchestrator.py`
1.2 — Trade Thesis Generator
Assignee: Dev A Effort: 1 day Depends on: 1.1
Description:
LLM-powered module that takes a persona config + market context and generates a structured trade thesis. Uses kosong `step()` to call the LLM with a specific system prompt. Output is a Pydantic model: claim, direction (LONG/SHORT/NEUTRAL), assumptions (list), risk factors (list), confidence bracket (LOW/MEDIUM/HIGH).
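A sketch of the output model, assuming Pydantic v2 (field names mirror the description; `parse_thesis` is a hypothetical helper showing where validation failures surface for the retry fallback):

```python
from typing import Literal

from pydantic import BaseModel, Field


class TradeThesis(BaseModel):
    claim: str
    direction: Literal["LONG", "SHORT", "NEUTRAL"]
    assumptions: list[str] = Field(min_length=1)
    risk_factors: list[str]
    confidence: Literal["LOW", "MEDIUM", "HIGH"]


def parse_thesis(raw: str) -> TradeThesis:
    # Raises pydantic.ValidationError on malformed LLM output, which the
    # generator can catch to retry once with a stricter prompt, then raise.
    return TradeThesis.model_validate_json(raw)
```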
Subtasks:
- Define `TradeThesis` Pydantic model
- Thesis generator prompt (system prompt for LLM)
- `ThesisGenerator.generate(persona, market_context) -> TradeThesis`
- Parse LLM JSON output into Pydantic model with validation
- Fallback: retry with stricter prompt if parsing fails
- Unit tests with mocked LLM responses (valid + malformed)
Acceptance Criteria:
- Takes persona config + market context dict → returns `TradeThesis`
- LLM prompt produces parseable JSON consistently
- Handles malformed LLM output gracefully (retry once, then raise)
- Unit tests cover: valid thesis, malformed response, missing fields
- Tests pass with `uv run pytest tests/unit/test_thesis.py`
Deliverables:
- `kosong_agent/orchestration/thesis.py`
- `kosong_agent/orchestration/prompts/thesis_prompt.py`
- `tests/unit/test_thesis.py`
1.3 — Investigation Plan Generator
Assignee: Dev A Effort: 1 day Depends on: 1.2
Description: LLM-powered module that takes a thesis and maps its assumptions to an ordered checklist of tool calls. Each step has: tool name, parameters, purpose (which assumption it validates), and priority. The plan generator knows which Nava tools are available (from persona config’s tool allowlist).
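A sketch of the plan models, again assuming Pydantic v2; `validate_tools` is one way the allowlist check named in the subtasks could work:

```python
from pydantic import BaseModel


class PlanStep(BaseModel):
    tool: str
    parameters: dict    # may contain templates filled from prior step output
    purpose: str        # which thesis assumption this step validates
    priority: int


class InvestigationPlan(BaseModel):
    steps: list[PlanStep]

    def validate_tools(self, available_tools: set[str]) -> None:
        # Reject plans that reference tools outside the persona's allowlist
        unknown = {step.tool for step in self.steps} - available_tools
        if unknown:
            raise ValueError(f"plan references unknown tools: {sorted(unknown)}")
```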
Subtasks:
- Define `InvestigationPlan` and `PlanStep` Pydantic models
- Plan generator prompt (maps assumptions → tool calls)
- `PlanGenerator.generate(thesis, available_tools) -> InvestigationPlan`
- Validate that all referenced tools exist in the toolset
- Order steps by dependency (some tools need output from others)
- Unit tests with mocked LLM responses
Acceptance Criteria:
- Takes `TradeThesis` + tool list → returns `InvestigationPlan` with ordered steps
- Each step references a valid tool name from the available toolset
- Steps include parameter templates (some params filled, some from prior step output)
- Unit tests cover: valid plan, unknown tool reference, empty thesis
- Tests pass with `uv run pytest tests/unit/test_plan.py`
Deliverables:
- `kosong_agent/orchestration/plan.py`
- `kosong_agent/orchestration/prompts/plan_prompt.py`
- `tests/unit/test_plan.py`
1.4 — Persona Config Schema
Assignee: Dev A Effort: 0.5 days Depends on: —
Description: Define the Pydantic models for persona configuration. A persona has: name, description, goals, constraints, risk limits, market filters, tool allowlist, scoring factor weights, and action thresholds. Create 3 starter configs as JSON files.
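A sketch of the schema under Pydantic v2; only the top-level fields come from this spec, and the nested field names are illustrative:

```python
from pathlib import Path

from pydantic import BaseModel


class RiskLimits(BaseModel):
    max_position_usd: float      # illustrative fields
    max_open_positions: int


class MarketFilters(BaseModel):
    categories: list[str] = []   # illustrative fields
    min_liquidity_usd: float = 0.0


class ScoringConfig(BaseModel):
    factor_weights: dict[str, float]


class PersonaConfig(BaseModel):
    name: str
    description: str
    goals: list[str]
    constraints: list[str]
    risk_limits: RiskLimits
    market_filters: MarketFilters
    tool_allowlist: list[str]
    scoring: ScoringConfig
    action_thresholds: dict[str, float]


def load_persona(name: str) -> PersonaConfig:
    # Path layout mirrors the deliverables below
    path = Path("kosong_agent/personas") / f"{name}.json"
    return PersonaConfig.model_validate_json(path.read_text())
```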
Subtasks:
- Define `PersonaConfig` Pydantic model with all fields
- Define nested models: `RiskLimits`, `MarketFilters`, `ScoringConfig`
- Create `personas/polymarket_sports.json`
- Create `personas/polymarket_multichoice.json`
- Create `personas/hyperliquid_intent.json`
- Loader function: `load_persona(name) -> PersonaConfig`
- Unit tests: load each persona, validate constraints
Acceptance Criteria:
- `PersonaConfig` model validates all required fields
- 3 JSON config files parse without errors
- `load_persona("polymarket_sports")` returns validated config
- Each persona specifies different tool allowlists and scoring weights
- Tests pass with `uv run pytest tests/unit/test_persona.py`
Deliverables:
- `kosong_agent/orchestration/persona.py`
- `kosong_agent/personas/polymarket_sports.json`
- `kosong_agent/personas/polymarket_multichoice.json`
- `kosong_agent/personas/hyperliquid_intent.json`
- `tests/unit/test_persona.py`
1.5 — Test Fixtures + Mock Data
Assignee: Dev B Effort: 1 day Depends on: —
Description: Create realistic mock responses for each Nava tool so all unit tests can run without API keys. Fixtures include: market discovery results, trade history, price history, orderbook snapshots, wallet history, funding rates.
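A sketch of how `conftest.py` might expose shared fixtures; the module and constant names are placeholders for the fixture files listed below:

```python
# tests/conftest.py — placeholder names; real fixture modules are 1.5 deliverables
import pytest

from tests.fixtures.polymarket_mocks import MOCK_MARKETS
from tests.fixtures.llm_mocks import MOCK_THESIS_JSON


@pytest.fixture
def polymarket_markets():
    return MOCK_MARKETS


@pytest.fixture
def mock_llm():
    # Stub that stands in for the LLM call so unit tests need no API key
    def fake_step(*args, **kwargs):
        return MOCK_THESIS_JSON
    return fake_step
```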
Subtasks:
- Mock Polymarket market discovery response (3-5 markets)
- Mock Polymarket trade history (50+ trades, mixed buy/sell)
- Mock Polymarket price history (100+ data points with a spike)
- Mock Polymarket orderbook (bid/ask with depth)
- Mock Hyperliquid market discovery (3-5 perps)
- Mock Hyperliquid trades (50+ trades with varying sizes)
- Mock Hyperliquid candles (OHLCV for 7 days)
- Mock Hyperliquid funding rates (30 days)
- Mock LLM responses for thesis and plan generators
- Pytest fixtures in `conftest.py` for shared access
Acceptance Criteria:
- All fixtures are realistic (based on actual API response shapes)
- Fixtures are importable from `tests/fixtures/`
- Mock LLM responses are valid JSON matching expected schemas
- No test requires an API key to run in unit mode
Deliverables:
- `tests/fixtures/polymarket_mocks.py`
- `tests/fixtures/hyperliquid_mocks.py`
- `tests/fixtures/llm_mocks.py`
- `tests/conftest.py`
1.6 — Wire Orchestration End-to-End
Assignee: Dev A Effort: 0.5 days Depends on: 1.1, 1.2, 1.3, 1.4, 1.5
Description: Integration test that runs the full orchestration loop: load persona → generate thesis → generate plan → execute tool calls (mocked) → collect results. This validates the wiring between all M1 components before scoring/action engines exist.
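The test might look roughly like this; the assertions and context keys are assumptions pending 1.1–1.5:

```python
# tests/integration/test_orchestration_e2e.py — sketch; names assume the
# modules delivered in 1.1–1.4 and the fixtures from 1.5
from kosong_agent.orchestration.orchestrator import AgentState, Orchestrator
from kosong_agent.orchestration.persona import load_persona


def test_full_loop_with_mocked_tools(mock_llm, polymarket_markets):
    persona = load_persona("polymarket_sports")
    # Handlers wired to mocked generators/tools; construction details TBD in 1.6
    orchestrator = Orchestrator(handlers=...)
    final_state, context = orchestrator.run(persona, context={})

    # Loop stops after INVESTIGATE — scoring/action engines don't exist yet
    assert final_state == AgentState.INVESTIGATE
    # Tool calls recorded in context must match the generated plan's steps
    assert [call["tool"] for call in context["tool_calls"]] == [
        step.tool for step in context["plan"].steps
    ]
```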
Subtasks:
- Integration test: persona → thesis → plan → mocked tool execution
- Verify state transitions happen in order
- Verify tool calls match plan steps
- Verify results are collected and passed forward
Acceptance Criteria:
- Integration test runs the full loop with mocked tools
- State machine transitions through INIT → THESIS → PLAN → INVESTIGATE → (stops, no scorer yet)
- Tool call arguments match what the plan specified
- Test passes with `uv run pytest tests/integration/test_orchestration_e2e.py`
Deliverables:
- `tests/integration/test_orchestration_e2e.py`
1.7 — Debug Script: Orchestration
Assignee: Dev B Effort: 0.5 days Depends on: 1.6
Description:
CLI script that runs the orchestration loop with `--dry-run` against mocked tools. Prints state transitions, the generated thesis, the investigation plan, and tool call summaries to stdout using Rich formatting.
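A sketch of the entrypoint, assuming argparse plus Rich (only the flag names and module path come from this spec):

```python
# kosong_agent/debug/debug_orchestration.py — sketch; wiring details TBD
import argparse

from rich.console import Console


def main() -> None:
    parser = argparse.ArgumentParser(prog="debug_orchestration")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--persona", required=True)
    args = parser.parse_args()

    console = Console()
    console.rule(f"[bold]persona: {args.persona}")
    # Load the persona, run the orchestrator against mock fixtures, and
    # console.print() each state transition, the thesis, and the plan.


if __name__ == "__main__":
    main()
```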
Subtasks:
- CLI with `--dry-run` and `--persona` flags
- Rich-formatted output: state transitions, thesis, plan
- Uses mock fixtures (no API keys needed)
Acceptance Criteria:
- `uv run python -m kosong_agent.debug.debug_orchestration --dry-run --persona polymarket_sports` runs and prints readable output
- No API keys required
Deliverables:
- `kosong_agent/debug/debug_orchestration.py`