Week 2 — M1 Completion + M2: Scoring Engine & Polymarket Enhanced Research

Goal: Complete the M1 pipeline (scorecard + action engine) and build all 5 Polymarket enhanced research tools by wrapping existing code from polymarket-scratch/ and sports-odds-backtesting/.

Exit Criteria: Full M1 pipeline works end-to-end with real APIs. All 5 Polymarket enhanced modules (tasks 2.4-2.8) have unit tests, and the sportsbook benchmark (2.4) also has a live integration test.


Tasks

2.1 — Scorecard Engine

Assignee: Dev A | Effort: 1.5 days | Depends on: 1.6

Description: Takes tool investigation results and produces a multi-factor weighted score. Factors are configurable per persona (via ScoringConfig). Each factor maps to a specific tool’s output. The engine normalizes raw values to 0-100, applies weights, and produces an overall score with a confidence level.

Subtasks:

  • Define Scorecard, ScorecardFactor Pydantic models
  • Define scoring factor registry: factor name → extraction function
  • Implement ScorecardEngine.score(tool_results, scoring_config) -> Scorecard
  • Normalization: raw tool output → 0-100 per factor
  • Weighted aggregation: factor scores × weights → overall score
  • Confidence level derivation: based on how many factors had data
  • Evidence summary: text explanation of each factor’s contribution
  • Unit tests: varied inputs, missing factors, edge cases (all zeros, all max)
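
The scoring flow above (extract, normalize to 0-100, weight, derive confidence) can be sketched as follows. This is a minimal illustration, not the planned implementation: plain dataclasses stand in for the Pydantic models, and FactorSpec, the extractor mapping, the renormalization of weights over factors that had data, and the 0.8 / 0.5 coverage cut-offs for confidence are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FactorSpec:
    """Hypothetical per-factor config: weight plus the raw range to normalize over."""
    weight: float
    raw_min: float
    raw_max: float


@dataclass
class ScorecardFactor:
    name: str
    raw: Optional[float]
    normalized: Optional[float]  # None means N/A (the tool returned no data)
    weight: float


@dataclass
class Scorecard:
    overall: float
    confidence: str  # LOW / MEDIUM / HIGH
    factors: list = field(default_factory=list)


def normalize(raw: float, lo: float, hi: float) -> float:
    """Linearly map raw from [lo, hi] onto 0-100, clamping out-of-range values."""
    if hi == lo:
        return 0.0
    return max(0.0, min(100.0, (raw - lo) / (hi - lo) * 100.0))


def score(
    tool_results: dict,
    config: dict[str, FactorSpec],
    extractors: dict[str, Callable[[dict], Optional[float]]],
) -> Scorecard:
    factors, weighted_sum, weight_used = [], 0.0, 0.0
    for name, spec in config.items():
        raw = extractors[name](tool_results)
        if raw is None:
            # Missing tool result: keep the factor in the breakdown as N/A.
            factors.append(ScorecardFactor(name, None, None, spec.weight))
            continue
        norm = normalize(raw, spec.raw_min, spec.raw_max)
        factors.append(ScorecardFactor(name, raw, norm, spec.weight))
        weighted_sum += norm * spec.weight
        weight_used += spec.weight
    # Assumption: weights are renormalized over the factors that had data.
    overall = weighted_sum / weight_used if weight_used else 0.0
    with_data = sum(f.normalized is not None for f in factors)
    coverage = with_data / len(config) if config else 0.0
    # Assumption: confidence is derived from data coverage with these cut-offs.
    confidence = "HIGH" if coverage >= 0.8 else "MEDIUM" if coverage >= 0.5 else "LOW"
    return Scorecard(overall, confidence, factors)


config = {"edge": FactorSpec(0.6, 0.0, 10.0), "flow": FactorSpec(0.4, -1.0, 1.0)}
extractors = {"edge": lambda r: r.get("edge"), "flow": lambda r: r.get("flow")}
partial = score({"edge": 5.0}, config, extractors)
full = score({"edge": 5.0, "flow": 1.0}, config, extractors)
```

With only the edge factor present, the overall score rests on that factor alone and confidence drops, which matches the "missing factor reduces confidence" criterion.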

Acceptance Criteria:

  • ScorecardEngine.score() returns Scorecard with overall (0-100), factor breakdown, confidence (LOW/MEDIUM/HIGH), and evidence summary
  • Missing tool results → factor scored as N/A, confidence reduced
  • Weights from persona config are respected
  • Unit tests cover: full data, partial data, empty data, extreme values
  • Tests pass with uv run pytest tests/unit/test_scorecard.py

Deliverables:

  • kosong_agent/orchestration/scorecard.py
  • tests/unit/test_scorecard.py

2.2 — Action Engine + Safety Hooks

Assignee: Dev A | Effort: 1.5 days | Depends on: 2.1

Description: Takes a scorecard and produces an action recommendation: BUY, SELL, or PASS. Before a recommendation is finalized, the engine runs before-action hooks that re-check critical assumptions, and it enforces risk limits from persona config (max position size, max leverage, etc.). All actions are dry-run (no order submission). Produces an audit trail as a structured JSON log.

Subtasks:

  • Define ActionRecommendation, ActionAuditEntry Pydantic models
  • Implement ActionEngine.recommend(scorecard, persona) -> ActionRecommendation
  • Score thresholds: configurable per persona (e.g., BUY if score > 70)
  • Before-action hooks: list of checks that must pass before action
    • Hook: re-validate scorecard confidence ≥ threshold
    • Hook: check position size within risk limits
    • Hook: verify market still open / liquid
  • Risk limit enforcement from PersonaConfig.risk_limits
  • Dry-run mode: build order parameters but do not submit
  • Audit trail: JSON log of recommendation + hooks + outcome
  • Unit tests: above threshold, below threshold, hook failures, risk violations
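
One way the threshold check, hook vetoes, and audit trail could fit together is sketched below. It is illustrative only: dataclasses replace the Pydantic models, Context and Limits are hypothetical stand-ins for the scorecard and PersonaConfig.risk_limits, and the SELL threshold of 30 is an assumption (the plan only gives "BUY if score > 70" as an example).

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Context:
    """Hypothetical bundle of the inputs the hooks need."""
    score: float
    confidence: str  # LOW / MEDIUM / HIGH
    position_size: float
    market_open: bool


@dataclass
class Limits:
    buy_above: float = 70.0
    sell_below: float = 30.0  # assumption, not from the plan
    min_confidence: str = "MEDIUM"
    max_position_size: float = 100.0


@dataclass
class ActionRecommendation:
    action: str
    reason: str
    order_params: Optional[dict]
    submitted: bool = False  # dry-run: never flipped to True here
    audit: list = field(default_factory=list)


_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


# Each hook returns a veto reason, or None when the check passes.
def check_confidence(ctx: Context, lim: Limits) -> Optional[str]:
    if _RANK[ctx.confidence] < _RANK[lim.min_confidence]:
        return f"confidence {ctx.confidence} below {lim.min_confidence}"
    return None


def check_position_size(ctx: Context, lim: Limits) -> Optional[str]:
    if ctx.position_size > lim.max_position_size:
        return f"position size {ctx.position_size} exceeds {lim.max_position_size}"
    return None


def check_market_open(ctx: Context, lim: Limits) -> Optional[str]:
    return None if ctx.market_open else "market closed or illiquid"


HOOKS = (check_confidence, check_position_size, check_market_open)


def recommend(ctx: Context, lim: Limits, hooks=HOOKS) -> ActionRecommendation:
    if ctx.score > lim.buy_above:
        action = "BUY"
    elif ctx.score < lim.sell_below:
        action = "SELL"
    else:
        audit = [{"step": "threshold", "action": "PASS"}]
        return ActionRecommendation("PASS", "score within neutral band", None, audit=audit)
    audit = [{"step": "threshold", "action": action}]
    for hook in hooks:
        veto = hook(ctx, lim)
        audit.append({"step": hook.__name__, "passed": veto is None, "detail": veto})
        if veto is not None:
            # Any failing hook turns the action into PASS with the veto reason.
            return ActionRecommendation("PASS", f"vetoed: {veto}", None, audit=audit)
    params = {"side": action, "size": ctx.position_size}  # built but never submitted
    reason = f"score {ctx.score:.1f} crossed threshold"
    return ActionRecommendation(action, reason, params, audit=audit)


rec = recommend(Context(82.0, "HIGH", 50.0, True), Limits())
audit_log = json.dumps(rec.audit)  # the audit trail serializes to JSON as-is
```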

Acceptance Criteria:

  • Returns ActionRecommendation with action (BUY/SELL/PASS), reason, order params (if applicable), and audit entries
  • Before-action hooks can veto the action (action becomes PASS with veto reason)
  • Risk limits from persona config are enforced (position size, leverage)
  • Dry-run mode produces order params but marks submitted=False
  • Unit tests cover: score above/below threshold, each hook failure, risk limit violations
  • Tests pass with uv run pytest tests/unit/test_action.py

Deliverables:

  • kosong_agent/orchestration/action.py
  • tests/unit/test_action.py

2.3 — M1 Full Integration Test

Assignee: Dev A | Effort: 0.5 days | Depends on: 2.1, 2.2

Description: End-to-end integration test: persona → thesis → plan → real Nava tools (with API keys) → scorecard → action. Marked @pytest.mark.integration. Validates the entire M1 pipeline works with live data.

Subtasks:

  • Integration test with polymarket_sports persona
  • Real API calls to Polymarket (market discovery, price history)
  • Scorecard from real tool results
  • Action recommendation from real scorecard
  • Verify audit trail is complete and parseable

Acceptance Criteria:

  • Full pipeline executes without errors using real Polymarket APIs
  • Thesis, plan, scorecard, and action are all populated
  • Audit trail captures every step
  • Test passes with uv run pytest tests/integration/test_m1_pipeline.py -m integration
  • Test is skipped gracefully when API keys are not set

Deliverables:

  • tests/integration/test_m1_pipeline.py

2.4 — Sportsbook Odds Benchmark Tool (M2)

Assignee: Dev A | Effort: 1 day | Depends on: 2.1

Description: New RobustTool that wraps the existing OddsClient from sports-odds-backtesting/. Fetches sportsbook odds for a given event, compares them to Polymarket prices, and flags price discrepancies that may indicate an edge. Outputs: odds from each bookmaker, implied probabilities, Polymarket price, edge percentage.

Reuses: OddsClient, PropSnapshot, market configs from ~/work/sports-odds-backtesting/src/odds_collector/

Subtasks:

  • Copy/adapt OddsClient into nava-agent tooling (or import as dependency)
  • Define SportsOddsBenchmarkParams Pydantic model (sport, event query, market type)
  • Implement SportsOddsBenchmarkTool(RobustTool):
    • Fetch odds from Odds API for matched event
    • Fetch Polymarket price for same event
    • Calculate implied probability from American odds
    • Compare: Polymarket price vs sportsbook implied prob
    • Flag if edge > configurable threshold (default 5 percentage points, i.e. 0.05)
  • Unit tests with mocked odds + Polymarket responses
  • Integration test hitting real Odds API + Polymarket
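
The odds conversion and edge comparison in the steps above can be sketched as below. The function names and row layout are hypothetical, and the threshold is treated as a difference in probability (0.05 = 5 percentage points), which is an assumption. Raw implied probabilities include the bookmaker's margin; this sketch does not remove it.

```python
from typing import Optional


def american_to_implied_prob(odds: int) -> float:
    """Convert American odds to an implied probability in the range 0-1."""
    if odds == 0:
        raise ValueError("American odds cannot be 0")
    if odds > 0:
        # Underdog: +150 means risking 100 to win 150.
        return 100 / (odds + 100)
    # Favorite: -200 means risking 200 to win 100.
    return -odds / (-odds + 100)


def compare(
    book_odds: dict[str, Optional[int]],
    polymarket_price: float,
    threshold: float = 0.05,
) -> list[dict]:
    """Build one comparison row per bookmaker; None odds means no coverage."""
    rows = []
    for bookmaker, odds in book_odds.items():
        if odds is None:
            # Bookmaker does not cover the event: keep the row, leave values empty.
            rows.append({"bookmaker": bookmaker, "odds": None, "implied_prob": None,
                         "polymarket_price": polymarket_price, "edge": None,
                         "flagged": False})
            continue
        implied = american_to_implied_prob(odds)
        edge = polymarket_price - implied  # sign convention from the acceptance criteria
        rows.append({"bookmaker": bookmaker, "odds": odds, "implied_prob": implied,
                     "polymarket_price": polymarket_price, "edge": edge,
                     "flagged": edge > threshold})
    return rows


rows = compare({"book_a": 150, "book_b": -200, "book_c": None}, polymarket_price=0.47)
```

Because the flag follows the stated rule (edge > threshold), only cases where Polymarket is priced above the sportsbook are flagged; a symmetric check would compare abs(edge) instead.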

Acceptance Criteria:

  • Takes sport + event query → returns comparison table (bookmaker, odds, implied prob, Polymarket price, edge %)
  • Handles missing odds gracefully (bookmaker not covering event)
  • Edge calculation is correct: edge = polymarket_price - implied_prob
  • Unit tests pass with mocked data
  • Integration test passes with real APIs (marked @pytest.mark.integration)
  • Tests pass with uv run pytest tests/unit/test_sportsbook_benchmark.py

Deliverables:

  • kosong_agent/tools/polymarket/sportsbook_benchmark.py
  • tests/unit/test_sportsbook_benchmark.py
  • tests/integration/test_sportsbook_benchmark_live.py

2.5 — Spike + Retracement Detection Tool (M2)

Assignee: Dev A | Effort: 0.5 days | Depends on: none

Description: New RobustTool that wraps and extends the existing find_price_spikes() from polymarket-scratch/. Detects sudden price moves and subsequent reversals in Polymarket price history. The spike threshold and retracement window are configurable.

Reuses: find_price_spikes(), PricePoint from ~/work/polymarket-scratch/insider-detection/src/polymarket_api.py

Subtasks:

  • Adapt find_price_spikes() logic into RobustTool
  • Add retracement detection: after spike, did price revert within N hours?
  • Define SpikeDetectionParams (token_id, interval, spike_threshold_pct, retracement_window_hours)
  • Output: list of spikes with timestamp, magnitude, retracement status, time to revert
  • Input data: accept output from the existing polymarket_price_history tool, or fetch price history directly
  • Unit tests with fixture price data (known spikes + non-spikes)
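
A rough sketch of spike detection with a retracement check follows. It does not reproduce find_price_spikes(); it assumes price history arrives as (unix_seconds, price) tuples sorted by time, defines a spike as a move between consecutive points, and uses hypothetical cut-offs of 90% (full) and 50% (partial) of the move being undone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spike:
    timestamp: int
    price_before: float
    price_after: float
    magnitude_pct: float  # signed: negative for a downward spike
    retracement: str  # "yes", "partial", or "no"
    hours_to_revert: Optional[float]


def detect_spikes(
    points: list[tuple[int, float]],
    spike_threshold_pct: float = 20.0,
    retracement_window_hours: float = 24.0,
    full_fraction: float = 0.9,  # assumption: 90% of the move undone = full revert
    partial_fraction: float = 0.5,  # assumption: 50% undone = partial revert
) -> list[Spike]:
    spikes = []
    for i in range(1, len(points)):
        (_, p0), (t1, p1) = points[i - 1], points[i]
        if p0 <= 0:
            continue  # cannot compute a percentage move from a non-positive price
        change_pct = (p1 - p0) / p0 * 100.0
        if abs(change_pct) <= spike_threshold_pct:
            continue
        move = p1 - p0
        deadline = t1 + retracement_window_hours * 3600
        best, hours = 0.0, None
        for t, p in points[i + 1:]:
            if t > deadline:
                break
            undone = (p1 - p) / move  # fraction of the spike reversed so far
            best = max(best, undone)
            if undone >= full_fraction:
                hours = (t - t1) / 3600
                break
        if hours is not None:
            status = "yes"
        elif best >= partial_fraction:
            status = "partial"
        else:
            status = "no"
        spikes.append(Spike(t1, p0, p1, change_pct, status, hours))
    return spikes


history = [(0, 0.40), (3600, 0.41), (7200, 0.60), (10800, 0.55),
           (14400, 0.50), (18000, 0.45), (21600, 0.42)]
found = detect_spikes(history)
```

In this fixture the jump from 0.41 to 0.60 is the only move above 20%, and the price gives back more than 90% of it four hours later.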

Acceptance Criteria:

  • Detects spikes above threshold (default 20%)
  • For each spike, reports: timestamp, price before/after, magnitude %, retracement (yes/no/partial), time to revert
  • Handles flat price series (no spikes) gracefully
  • Unit tests cover: clear spike, no spike, spike with full retracement, spike without retracement
  • Tests pass with uv run pytest tests/unit/test_spike_detection.py

Deliverables:

  • kosong_agent/tools/polymarket/spike_detection.py
  • tests/unit/test_spike_detection.py

2.6 — Directional Flow Analysis Tool (M2)

Assignee: Dev B | Effort: 1 day | Depends on: none

Description: New RobustTool that analyzes buy vs sell pressure from Polymarket trade history. Groups trades by time window, calculates net flow direction, volume breakdown, and buy/sell ratio. Helps identify informed directional positioning.

Reuses: Trade aggregation patterns from ~/work/polymarket-scratch/insider-detection/scripts/analyze_all_markets.py

Subtasks:

  • Define DirectionalFlowParams (condition_id, time_window_hours, bucket_size_minutes)
  • Implement flow analysis:
    • Fetch trades via existing polymarket_trade_history tool
    • Bucket trades by time window
    • Per bucket: buy volume, sell volume, net flow, buy/sell ratio
    • Overall: cumulative flow direction, dominant side, flow acceleration
  • Output: time series of flow buckets + summary metrics
  • Unit tests with fixture trade data
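
The bucketing and summary steps above might look like the sketch below. The trade fields (timestamp in unix seconds, side, size) are hypothetical and may not match the real polymarket_trade_history output, and "flow trend" is approximated by comparing the mean absolute net flow of the later half of the buckets with the earlier half, which is an assumption.

```python
from collections import defaultdict


def bucket_flow(trades: list[dict], bucket_size_minutes: int = 60) -> list[dict]:
    """Group trades into fixed-width time buckets and compute per-bucket flow."""
    width = bucket_size_minutes * 60
    volumes = defaultdict(lambda: {"buy_volume": 0.0, "sell_volume": 0.0})
    for trade in trades:
        start = int(trade["timestamp"]) // width * width
        key = "buy_volume" if trade["side"].upper() == "BUY" else "sell_volume"
        volumes[start][key] += float(trade["size"])
    series = []
    for start in sorted(volumes):
        buy = volumes[start]["buy_volume"]
        sell = volumes[start]["sell_volume"]
        series.append({
            "timestamp": start,
            "buy_volume": buy,
            "sell_volume": sell,
            "net_flow": buy - sell,
            "ratio": buy / sell if sell > 0 else None,  # undefined with no sells
        })
    return series


def summarize(series: list[dict]) -> dict:
    total = sum(b["buy_volume"] + b["sell_volume"] for b in series)
    net = sum(b["net_flow"] for b in series)
    direction = "BUY" if net > 0 else "SELL" if net < 0 else "BALANCED"
    trend = "n/a"
    if len(series) >= 2:
        half = len(series) // 2
        early = sum(abs(b["net_flow"]) for b in series[:half]) / half
        late = sum(abs(b["net_flow"]) for b in series[half:]) / (len(series) - half)
        trend = "accelerating" if late > early else "decelerating" if late < early else "steady"
    return {"direction": direction, "total_volume": total, "net_flow": net, "trend": trend}


trades = [
    {"timestamp": 0, "side": "BUY", "size": 10},
    {"timestamp": 100, "side": "SELL", "size": 4},
    {"timestamp": 3700, "side": "BUY", "size": 20},
]
series = bucket_flow(trades)
summary = summarize(series)
```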

Acceptance Criteria:

  • Takes condition_id + time params → returns flow analysis
  • Each bucket has: timestamp, buy_volume, sell_volume, net_flow, ratio
  • Summary includes: overall direction (BUY/SELL dominant), total volume, flow trend (accelerating/decelerating)
  • Unit tests cover: buy-heavy flow, sell-heavy flow, balanced flow, empty trades
  • Tests pass with uv run pytest tests/unit/test_directional_flow.py

Deliverables:

  • kosong_agent/tools/polymarket/directional_flow.py
  • tests/unit/test_directional_flow.py

2.7 — Account Age Scoring Tool (M2)

Assignee: Dev B | Effort: 1 day | Depends on: none

Description: New RobustTool that scores wallet age to flag fresh/suspicious accounts. Uses the Polygon RPC or the Etherscan API to find the wallet’s first transaction. Applies age-based scoring tiers.

Reuses: get_wallet_first_tx() from ~/work/polymarket-scratch/insider-detection/src/polygon_client.py, signal thresholds from signals.py

Subtasks:

  • Adapt get_wallet_first_tx() into RobustTool
  • Define AccountAgeParams (wallet_address)
  • Scoring tiers:
    • < 7 days: HIGH RISK (score 90-100)
    • 7 to < 30 days: ELEVATED (score 60-89)
    • 30 to < 90 days: MODERATE (score 30-59)
    • ≥ 90 days: LOW RISK (score 0-29)
  • Output: first_tx_date, account_age_days, risk_tier, risk_score
  • Handle wallets with no history (brand new → max risk)
  • Unit tests with mocked blockchain responses
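
The tier mapping can be sketched as follows. The plan gives a score range per tier but not how a score is chosen inside a range, so the linear interpolation here is an assumption, as are the half-open boundaries (a wallet exactly 30 or 90 days old falls into the older tier) and the 365-day point at which the LOW RISK score reaches 0. The blockchain lookup itself is left out; first_tx is passed in.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# (min_days inclusive, max_days exclusive, tier, score at min_days, score at max_days)
TIERS = [
    (0, 7, "HIGH RISK", 100, 90),
    (7, 30, "ELEVATED", 89, 60),
    (30, 90, "MODERATE", 59, 30),
]
LOW_RISK_ZERO_AT_DAYS = 365  # assumption: score decays from 29 to 0 by one year


def score_account_age(first_tx: Optional[datetime], now: Optional[datetime] = None) -> dict:
    now = now or datetime.now(timezone.utc)
    if first_tx is None:
        # No on-chain history: treat as brand new, maximum risk.
        return {"first_tx_date": None, "account_age_days": None,
                "risk_tier": "HIGH RISK", "risk_score": 100}
    age_days = max(0.0, (now - first_tx).total_seconds() / 86400)
    tier, score = "LOW RISK", None
    for lo, hi, name, s_lo, s_hi in TIERS:
        if age_days < hi:
            frac = (age_days - lo) / (hi - lo)
            tier, score = name, round(s_lo + frac * (s_hi - s_lo))
            break
    if score is None:
        # 90 days or older: interpolate from 29 down to 0.
        frac = min(1.0, (age_days - 90) / (LOW_RISK_ZERO_AT_DAYS - 90))
        score = round(29 * (1.0 - frac))
    return {"first_tx_date": first_tx.date().isoformat(),
            "account_age_days": int(age_days),
            "risk_tier": tier, "risk_score": score}


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
fresh = score_account_age(now - timedelta(days=3), now)
```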

Acceptance Criteria:

  • Takes wallet address → returns age assessment with score and tier
  • Handles unknown wallets (no on-chain history) as maximum risk
  • Scoring tiers match spec above
  • Unit tests cover: each tier, unknown wallet, very old wallet
  • Tests pass with uv run pytest tests/unit/test_account_age.py

Deliverables:

  • kosong_agent/tools/polymarket/account_age.py
  • tests/unit/test_account_age.py

2.8 — Concentration + Cross-Market Detection Tool (M2)

Assignee: Dev A | Effort: 1 day | Depends on: none

Description: New RobustTool that detects coordinated activity: wallets heavily concentrated in related markets or exhibiting cross-market patterns. Uses wallet history and position data to flag suspicious concentration.

Reuses: Cross-market analysis from ~/work/polymarket-scratch/insider-detection/scripts/investigate_copy_trading.py

Subtasks:

  • Define ConcentrationParams (wallet_address, market_ids list)
  • Implement concentration analysis:
    • Fetch wallet trading history across provided markets
    • Calculate concentration ratio: value in target markets / total portfolio value
    • Detect cross-market patterns: same wallet, same direction, similar timing
    • Flag if concentration > threshold (default 80%)
  • Output: concentration_ratio, markets_traded, cross_market_patterns, risk_flag
  • Unit tests with fixture wallet data
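
A small sketch of the concentration ratio and a simple cross-market pattern check is shown below. The input shapes are hypothetical: positions maps market id to position value, and each trade is a dict with market_id, side, and a unix-seconds timestamp. "Similar timing" is interpreted as two trades within a one-hour window, which is an assumption.

```python
def concentration_ratio(positions: dict[str, float], target_markets: set[str]) -> float:
    """Value held in the target markets divided by total portfolio value."""
    total = sum(abs(value) for value in positions.values())
    if total == 0:
        return 0.0
    target = sum(abs(value) for market, value in positions.items()
                 if market in target_markets)
    return target / total


def cross_market_patterns(trades: list[dict], window_seconds: int = 3600) -> list[tuple]:
    """Pairs of same-side trades in different markets within the time window."""
    ordered = sorted(trades, key=lambda t: t["timestamp"])
    patterns = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second["timestamp"] - first["timestamp"] > window_seconds:
                break  # later trades are even further away in time
            if first["market_id"] != second["market_id"] and first["side"] == second["side"]:
                patterns.append((first["market_id"], second["market_id"], first["side"]))
    return patterns


def analyze(positions: dict[str, float], trades: list[dict],
            target_markets: set[str], threshold: float = 0.8) -> dict:
    ratio = concentration_ratio(positions, target_markets)
    return {
        "concentration_ratio": ratio,
        "markets_traded": sorted({t["market_id"] for t in trades}),
        "cross_market_patterns": cross_market_patterns(trades),
        "risk_flag": ratio > threshold,
    }


positions = {"m1": 900.0, "m2": 50.0, "m3": 50.0}
trades = [
    {"market_id": "m1", "side": "BUY", "timestamp": 0},
    {"market_id": "m2", "side": "BUY", "timestamp": 600},
    {"market_id": "m3", "side": "SELL", "timestamp": 700},
    {"market_id": "m1", "side": "BUY", "timestamp": 10000},
]
report = analyze(positions, trades, target_markets={"m1"})
```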

Acceptance Criteria:

  • Takes wallet + market list → returns concentration analysis
  • Concentration ratio correctly calculated
  • Cross-market patterns detected (same direction within time window)
  • Risk flag raised when concentration > threshold
  • Unit tests cover: concentrated wallet, diversified wallet, single-market wallet
  • Tests pass with uv run pytest tests/unit/test_concentration.py

Deliverables:

  • kosong_agent/tools/polymarket/concentration.py
  • tests/unit/test_concentration.py

2.9 — Debug Script: Scorecard

Assignee: Dev B | Effort: 0.5 days | Depends on: 2.1

Description: CLI script that feeds sample tool results into the scorecard engine and prints the factor breakdown with Rich formatting.

Subtasks:

  • CLI with --persona flag
  • Load sample tool results from fixtures
  • Run scorecard engine, print formatted breakdown
  • Show: overall score, each factor (name, raw value, normalized, weight, contribution)

Acceptance Criteria:

  • uv run python -m kosong_agent.debug.debug_scorecard --persona polymarket_sports prints readable output
  • No API keys required

Deliverables:

  • kosong_agent/debug/debug_scorecard.py