Week 2 — M1 Completion + M2: Scoring Engine & Polymarket Enhanced Research

Goal: Complete the M1 pipeline (scorecard + action engine) and build all 5 Polymarket enhanced research tools by wrapping existing code from polymarket-scratch/ and sports-odds-backtesting/.

Exit Criteria: Full M1 pipeline works end-to-end with real APIs. All 5 Polymarket enhanced modules have unit + integration tests.

Tasks

2.1 — Scorecard Engine

Assignee: Dev A Effort: 1.5 days Depends on: 1.6

Description: Takes tool investigation results and produces a multi-factor weighted score. Factors are configurable per persona (via ScoringConfig). Each factor maps to a specific tool’s output. The engine normalizes raw values to 0-100, applies weights, and produces an overall score with a confidence level.

Subtasks:

Define Scorecard, ScorecardFactor Pydantic models
Define scoring factor registry: factor name → extraction function
Implement ScorecardEngine.score(tool_results, scoring_config) -> Scorecard
Normalization: raw tool output → 0-100 per factor
Weighted aggregation: factor scores × weights → overall score
Confidence level derivation: based on how many factors had data
Evidence summary: text explanation of each factor’s contribution
Unit tests: varied inputs, missing factors, edge cases (all zeros, all max)

Acceptance Criteria:

ScorecardEngine.score() returns Scorecard with overall (0-100), factor breakdown, confidence (LOW/MEDIUM/HIGH), and evidence summary
Missing tool results → factor scored as N/A, confidence reduced
Weights from persona config are respected
Unit tests cover: full data, partial data, empty data, extreme values
Tests pass with uv run pytest tests/unit/test_scorecard.py
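
The normalization, weighting, and confidence rules above can be sketched as follows. This is a minimal illustration, not the planned implementation: it uses stdlib dataclasses rather than Pydantic, and FactorSpec, normalize, the dict-shaped result, and the confidence cut-offs (all factors present is HIGH, at least half is MEDIUM) are assumptions made for the example. Missing factors are left as None (N/A) and excluded from the weighted average.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FactorSpec:
    """Hypothetical per-factor config: weight plus the raw range used to normalize."""
    weight: float
    raw_min: float
    raw_max: float


def normalize(raw: float, lo: float, hi: float) -> float:
    """Scale a raw value onto 0-100, clamping values outside [lo, hi]."""
    if hi == lo:
        return 0.0
    return max(0.0, min(100.0, (raw - lo) / (hi - lo) * 100.0))


def score(raw_values: dict[str, Optional[float]], specs: dict[str, FactorSpec]) -> dict:
    """Weighted average over the factors that have data; missing ones stay N/A."""
    factors: dict[str, Optional[float]] = {}
    weighted_sum = 0.0
    weight_used = 0.0
    for name, spec in specs.items():
        raw = raw_values.get(name)
        if raw is None:
            factors[name] = None  # N/A: excluded from the aggregate
            continue
        normalized = normalize(raw, spec.raw_min, spec.raw_max)
        factors[name] = normalized
        weighted_sum += normalized * spec.weight
        weight_used += spec.weight
    overall = weighted_sum / weight_used if weight_used else 0.0
    present = sum(value is not None for value in factors.values())
    coverage = present / len(specs) if specs else 0.0
    # Assumed cut-offs: every factor present -> HIGH, at least half -> MEDIUM.
    confidence = "HIGH" if coverage == 1.0 else "MEDIUM" if coverage >= 0.5 else "LOW"
    return {"overall": overall, "factors": factors, "confidence": confidence}
```

Dividing by the weight actually used keeps the overall score on the 0-100 scale when some factors are missing; the reduced coverage then shows up only in the confidence level.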

Deliverables:

kosong_agent/orchestration/scorecard.py
tests/unit/test_scorecard.py

2.2 — Action Engine + Safety Hooks

Assignee: Dev A Effort: 1.5 days Depends on: 2.1

Description: Takes a scorecard and produces an action recommendation: BUY, SELL, or PASS. Before the recommendation is finalized, it runs before-action hooks that re-check critical assumptions. Enforces risk limits from persona config (max position size, max leverage, etc.). All actions are dry-run (no order submission). Produces an audit trail as a structured JSON log.

Subtasks:

Acceptance Criteria:

Returns ActionRecommendation with action (BUY/SELL/PASS), reason, order params (if applicable), and audit entries
Before-action hooks can veto the action (action becomes PASS with veto reason)
Risk limits from persona config are enforced (position size, leverage)
Dry-run mode produces order params but marks submitted=False
Unit tests cover: score above/below threshold, each hook failure, risk limit violations
Tests pass with uv run pytest tests/unit/test_action.py
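
A minimal sketch of the threshold, veto, and dry-run behaviour described above. The recommend function, its parameters, the two-threshold rule (BUY at or above one threshold, SELL at or below another), and clamping the size to the limit are all assumptions for illustration; the plan only states that hooks can veto, limits are enforced, and nothing is submitted.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Callable, Optional

# A hook returns None to allow the action, or a veto reason string.
Hook = Callable[[], Optional[str]]


@dataclass
class ActionRecommendation:
    action: str                      # "BUY", "SELL", or "PASS"
    reason: str
    order_params: Optional[dict] = None
    audit: list = field(default_factory=list)


def recommend(overall: float, buy_threshold: float, sell_threshold: float,
              requested_size: float, max_position_size: float,
              hooks: list[Hook]) -> ActionRecommendation:
    audit = [{"step": "score", "overall": overall}]
    if overall >= buy_threshold:
        action = "BUY"
    elif overall <= sell_threshold:
        action = "SELL"
    else:
        return ActionRecommendation("PASS", "score between thresholds", None, audit)

    # Before-action hooks: the first veto turns the action into PASS.
    for hook in hooks:
        veto = hook()
        audit.append({"step": "hook", "name": hook.__name__, "veto": veto})
        if veto is not None:
            return ActionRecommendation("PASS", f"vetoed: {veto}", None, audit)

    # Risk limit (assumed policy): clamp the size rather than reject the order.
    size = min(requested_size, max_position_size)
    audit.append({"step": "risk", "requested": requested_size, "allowed": size})
    order = {"side": action, "size": size, "submitted": False}  # dry-run only
    return ActionRecommendation(action, "score crossed threshold", order, audit)


def audit_as_json(rec: ActionRecommendation) -> str:
    """Serialize the audit entries as one JSON document."""
    return json.dumps(rec.audit)
```

Keeping audit entries as plain dicts means the trail can be written out with json.dumps without custom encoders.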

Deliverables:

kosong_agent/orchestration/action.py
tests/unit/test_action.py

2.3 — M1 Full Integration Test

Assignee: Dev A Effort: 0.5 days Depends on: 2.1, 2.2

Description: End-to-end integration test: persona → thesis → plan → real Nava tools (with API keys) → scorecard → action. Marked @pytest.mark.integration. Validates that the entire M1 pipeline works with live data.

Subtasks:

Integration test with polymarket_sports persona
Real API calls to Polymarket (market discovery, price history)
Scorecard from real tool results
Action recommendation from real scorecard
Verify audit trail is complete and parseable

Acceptance Criteria:

Full pipeline executes without errors using real Polymarket APIs
Thesis, plan, scorecard, and action are all populated
Audit trail captures every step
Test passes with uv run pytest tests/integration/test_m1_pipeline.py -m integration
Test is skipped gracefully when API keys are not set
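
One way to get the graceful skip is a small helper that reports which keys are missing, combined with pytest's skipif marker. The variable name POLYMARKET_API_KEY and the helper missing_env_vars are hypothetical; the plan does not say which keys the pipeline needs.

```python
import os

# Hypothetical key name; substitute whatever the real tools read.
REQUIRED_ENV_VARS = ("POLYMARKET_API_KEY",)


def missing_env_vars(required=REQUIRED_ENV_VARS, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# In tests/integration/test_m1_pipeline.py this could be applied module-wide:
#
#   import pytest
#
#   pytestmark = [
#       pytest.mark.integration,
#       pytest.mark.skipif(
#           bool(missing_env_vars()),
#           reason="API keys not set",
#       ),
#   ]
```

With this in place the test is reported as skipped rather than failed when the keys are absent.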

Deliverables:

tests/integration/test_m1_pipeline.py

2.4 — Sportsbook Odds Benchmark Tool (M2)

Assignee: Dev A Effort: 1 day Depends on: 2.1

Description: New RobustTool that wraps the existing OddsClient from sports-odds-backtesting/. Fetches sportsbook odds for a given event, compares to Polymarket prices, and flags arbitrage opportunities. Outputs: odds from each bookmaker, implied probabilities, Polymarket price, edge percentage.

Reuses: OddsClient, PropSnapshot, market configs from ~/work/sports-odds-backtesting/src/odds_collector/

Subtasks:

Acceptance Criteria:

Takes sport + event query → returns comparison table (bookmaker, odds, implied prob, Polymarket price, edge %)
Handles missing odds gracefully (bookmaker not covering event)
Edge calculation is correct: edge = polymarket_price - implied_prob
Unit tests pass with mocked data
Integration test passes with real APIs (marked @pytest.mark.integration)
Tests pass with uv run pytest tests/unit/test_sportsbook_benchmark.py
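
The edge formula above can be illustrated with a short sketch. It assumes bookmaker prices arrive as decimal odds, so implied probability is 1 / odds; the real OddsClient output format is not shown here, and compare and its row keys are names chosen for the example. A bookmaker with no price for the event is kept in the table with empty fields rather than raising.

```python
from __future__ import annotations

from typing import Optional


def implied_prob(decimal_odds: float) -> float:
    """Implied probability from decimal odds (assumed input format)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def compare(book_odds: dict[str, Optional[float]], polymarket_price: float) -> list[dict]:
    """Build one comparison row per bookmaker; None odds mean 'not covered'."""
    rows = []
    for bookmaker, odds in book_odds.items():
        if odds is None:
            rows.append({"bookmaker": bookmaker, "odds": None, "implied_prob": None,
                         "polymarket_price": polymarket_price, "edge_pct": None})
            continue
        prob = implied_prob(odds)
        rows.append({
            "bookmaker": bookmaker,
            "odds": odds,
            "implied_prob": prob,
            "polymarket_price": polymarket_price,
            # edge = polymarket_price - implied_prob, expressed in percent
            "edge_pct": (polymarket_price - prob) * 100.0,
        })
    return rows
```

Raw implied probabilities include the bookmaker's margin, so across all outcomes of an event they sum to more than 1. This sketch does not remove that margin.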

Deliverables:

kosong_agent/tools/polymarket/sportsbook_benchmark.py
tests/unit/test_sportsbook_benchmark.py
tests/integration/test_sportsbook_benchmark_live.py

2.5 — Spike + Retracement Detection Tool (M2)

Assignee: Dev A Effort: 0.5 days Depends on: —

Description: New RobustTool that wraps and extends the existing find_price_spikes() from polymarket-scratch/. Detects sudden price moves and subsequent reversals in Polymarket price history. The spike threshold and retracement window are configurable.

Reuses: find_price_spikes(), PricePoint from ~/work/polymarket-scratch/insider-detection/src/polymarket_api.py

Subtasks:

Adapt find_price_spikes() logic into RobustTool
Add retracement detection: after spike, did price revert within N hours?
Define SpikeDetectionParams (token_id, interval, spike_threshold_pct, retracement_window_hours)
Output: list of spikes with timestamp, magnitude, retracement status, time to revert
Accepts price data from the existing polymarket_price_history tool, or fetches price history directly when none is supplied
Unit tests with fixture price data (known spikes + non-spikes)

Acceptance Criteria:

Detects spikes above threshold (default 20%)
For each spike, reports: timestamp, price before/after, magnitude %, retracement (yes/no/partial), time to revert
Handles flat price series (no spikes) gracefully
Unit tests cover: clear spike, no spike, spike with full retracement, spike without retracement
Tests pass with uv run pytest tests/unit/test_spike_detection.py
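
A minimal sketch of spike and retracement detection under stated assumptions. It does not reproduce find_price_spikes(); it treats a spike as a move between consecutive points of at least the threshold percentage, takes prices as (unix_seconds, price) tuples instead of PricePoint objects, and calls a retracement "partial" when at least half the move is given back inside the window. The partial_fraction parameter and the Spike dataclass are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Spike:
    timestamp: float              # unix seconds of the point after the jump
    price_before: float
    price_after: float
    magnitude_pct: float
    retracement: str              # "yes", "partial", or "no"
    hours_to_revert: Optional[float]


def detect_spikes(points: list[tuple[float, float]],
                  spike_threshold_pct: float = 20.0,
                  retracement_window_hours: float = 24.0,
                  partial_fraction: float = 0.5) -> list[Spike]:
    spikes = []
    for i in range(1, len(points)):
        (_, p0), (t1, p1) = points[i - 1], points[i]
        if p0 <= 0:
            continue
        move = p1 - p0
        magnitude = abs(move) / p0 * 100.0
        if move == 0 or magnitude < spike_threshold_pct:
            continue
        best = 0.0        # largest fraction of the move given back so far
        hours = None
        deadline = t1 + retracement_window_hours * 3600
        for t, p in points[i + 1:]:
            if t > deadline:
                break
            retraced = (p1 - p) / move   # works for up and down spikes
            best = max(best, retraced)
            if retraced >= 1.0:          # back at (or past) the pre-spike price
                hours = (t - t1) / 3600
                break
        status = "yes" if best >= 1.0 else "partial" if best >= partial_fraction else "no"
        spikes.append(Spike(t1, p0, p1, magnitude, status, hours))
    return spikes
```

A flat series never crosses the threshold, so it yields an empty list. A sharp reversal can itself qualify as a spike in the opposite direction; whether to suppress those is a design choice this sketch leaves open.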

Deliverables:

kosong_agent/tools/polymarket/spike_detection.py
tests/unit/test_spike_detection.py

2.6 — Directional Flow Analysis Tool (M2)

Assignee: Dev B Effort: 1 day Depends on: —

Description: New RobustTool that analyzes buy vs sell pressure from Polymarket trade history. Groups trades by time window, calculates net flow direction, volume breakdown, and buy/sell ratio. Helps identify informed directional positioning.

Reuses: Trade aggregation patterns from ~/work/polymarket-scratch/insider-detection/scripts/analyze_all_markets.py

Subtasks:

Define DirectionalFlowParams (condition_id, time_window_hours, bucket_size_minutes)
Implement flow analysis:
- Fetch trades via existing polymarket_trade_history tool
- Bucket trades by time window
- Per bucket: buy volume, sell volume, net flow, buy/sell ratio
- Overall: cumulative flow direction, dominant side, flow acceleration
Output: time series of flow buckets + summary metrics
Unit tests with fixture trade data

Acceptance Criteria:

Takes condition_id + time params → returns flow analysis
Each bucket has: timestamp, buy_volume, sell_volume, net_flow, ratio
Summary includes: overall direction (BUY/SELL dominant), total volume, flow trend (accelerating/decelerating)
Unit tests cover: buy-heavy flow, sell-heavy flow, balanced flow, empty trades
Tests pass with uv run pytest tests/unit/test_directional_flow.py
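
The bucketing and summary steps above can be sketched as follows. Trades are assumed to be (unix_seconds, side, size) tuples with side "BUY" or "SELL"; the actual polymarket_trade_history output shape is not shown here. The ratio is reported as None when a bucket has no sell volume, and the flow trend (accelerating/decelerating) is left out because the plan does not define how it is measured.

```python
from __future__ import annotations

from collections import defaultdict


def bucket_flow(trades, bucket_size_minutes: int = 60) -> list[dict]:
    """Group trades into fixed-width time buckets and compute per-bucket flow."""
    width = bucket_size_minutes * 60
    volumes = defaultdict(lambda: {"buy_volume": 0.0, "sell_volume": 0.0})
    for ts, side, size in trades:
        start = int(ts // width) * width          # bucket start timestamp
        key = "buy_volume" if side == "BUY" else "sell_volume"
        volumes[start][key] += size
    buckets = []
    for start in sorted(volumes):
        buy = volumes[start]["buy_volume"]
        sell = volumes[start]["sell_volume"]
        buckets.append({
            "timestamp": start,
            "buy_volume": buy,
            "sell_volume": sell,
            "net_flow": buy - sell,
            "ratio": buy / sell if sell else None,  # undefined without sells
        })
    return buckets


def summarize(buckets: list[dict]) -> dict:
    """Overall direction and volume across all buckets."""
    total_buy = sum(b["buy_volume"] for b in buckets)
    total_sell = sum(b["sell_volume"] for b in buckets)
    net = total_buy - total_sell
    direction = "BUY" if net > 0 else "SELL" if net < 0 else "BALANCED"
    return {"direction": direction, "total_volume": total_buy + total_sell, "net_flow": net}
```

An empty trade list produces no buckets and a BALANCED summary with zero volume, which covers the empty-trades test case without special handling.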

Deliverables:

kosong_agent/tools/polymarket/directional_flow.py
tests/unit/test_directional_flow.py

2.7 — Account Age Scoring Tool (M2)

Assignee: Dev B Effort: 1 day Depends on: —

Description: New RobustTool that scores wallet age to flag fresh/suspicious accounts. Uses Polygon RPC or the Etherscan API to find the wallet’s first transaction. Applies age-based scoring tiers.

Reuses: get_wallet_first_tx() from ~/work/polymarket-scratch/insider-detection/src/polygon_client.py, signal thresholds from signals.py

Subtasks:

Acceptance Criteria:

Takes wallet address → returns age assessment with score and tier
Handles unknown wallets (no on-chain history) as maximum risk
Scoring tiers match spec above
Unit tests cover: each tier, unknown wallet, very old wallet
Tests pass with uv run pytest tests/unit/test_account_age.py

Deliverables:

kosong_agent/tools/polymarket/account_age.py
tests/unit/test_account_age.py

2.8 — Concentration + Cross-Market Detection Tool (M2)

Assignee: Dev A Effort: 1 day Depends on: —

Description: New RobustTool that detects coordinated activity: wallets heavily concentrated in related markets or exhibiting cross-market patterns. Uses wallet history and position data to flag suspicious concentration.

Reuses: Cross-market analysis from ~/work/polymarket-scratch/insider-detection/scripts/investigate_copy_trading.py

Subtasks:

Define ConcentrationParams (wallet_address, market_ids list)
Implement concentration analysis:
- Fetch wallet trading history across provided markets
- Calculate concentration ratio: value in target markets / total portfolio value
- Detect cross-market patterns: same wallet, same direction, similar timing
- Flag if concentration > threshold (default 80%)
Output: concentration_ratio, markets_traded, cross_market_patterns, risk_flag
Unit tests with fixture wallet data

Acceptance Criteria:

Takes wallet + market list → returns concentration analysis
Concentration ratio correctly calculated
Cross-market patterns detected (same direction within time window)
Risk flag raised when concentration > threshold
Unit tests cover: concentrated wallet, diversified wallet, single-market wallet
Tests pass with uv run pytest tests/unit/test_concentration.py
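
A minimal sketch of the concentration ratio and the cross-market check. The input shapes are assumptions: positions as a market_id to value mapping, and trades as (market_id, side, unix_seconds) tuples for a single wallet. The function names, the one-hour default window, and the pairwise pattern format are hypothetical; only the 80% default threshold comes from the plan.

```python
from __future__ import annotations


def concentration_ratio(position_values: dict[str, float], target_markets: list[str]) -> float:
    """Value held in the target markets divided by total portfolio value."""
    total = sum(position_values.values())
    if total <= 0:
        return 0.0
    targets = set(target_markets)
    in_target = sum(v for market, v in position_values.items() if market in targets)
    return in_target / total


def cross_market_patterns(trades, window_seconds: float = 3600) -> list[tuple]:
    """Pairs of trades in different markets, same side, within the time window."""
    ordered = sorted(trades, key=lambda t: t[2])
    patterns = []
    for i, (m1, s1, t1) in enumerate(ordered):
        for m2, s2, t2 in ordered[i + 1:]:
            if t2 - t1 > window_seconds:
                break  # later trades are even further away in time
            if m1 != m2 and s1 == s2:
                patterns.append((m1, m2, s1))
    return patterns


def analyze(position_values, target_markets, trades, threshold: float = 0.80) -> dict:
    ratio = concentration_ratio(position_values, target_markets)
    return {
        "concentration_ratio": ratio,
        "markets_traded": sorted({market for market, _, _ in trades}),
        "cross_market_patterns": cross_market_patterns(trades),
        "risk_flag": ratio > threshold,
    }
```

Note that a wallet holding only one of the target markets has a ratio of 1.0 and is flagged under this rule; whether single-market wallets should be exempt is not settled by the plan.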

Deliverables:

kosong_agent/tools/polymarket/concentration.py
tests/unit/test_concentration.py

2.9 — Debug Script: Scorecard

Assignee: Dev B Effort: 0.5 days Depends on: 2.1

Description: CLI script that feeds sample tool results into the scorecard engine and prints the factor breakdown with Rich formatting.

Subtasks:

CLI with --persona flag
Load sample tool results from fixtures
Run scorecard engine, print formatted breakdown
Show: overall score, each factor (name, raw value, normalized, weight, contribution)

Acceptance Criteria:

uv run python -m kosong_agent.debug.debug_scorecard --persona polymarket_sports prints readable output
No API keys required

Deliverables:

kosong_agent/debug/debug_scorecard.py