Week 2 — M1 Completion + M2: Scoring Engine & Polymarket Enhanced Research
Goal: Complete the M1 pipeline (scorecard + action engine) and build all 5 Polymarket enhanced research tools by wrapping existing code from polymarket-scratch/ and sports-odds-backtesting/.
Exit Criteria: Full M1 pipeline works end-to-end with real APIs. All 5 Polymarket enhanced modules have unit + integration tests.
Tasks
2.1 — Scorecard Engine
Assignee: Dev A | Effort: 1.5 days | Depends on: 1.6
Description:
Takes tool investigation results and produces a multi-factor weighted score. Factors are configurable per persona (via ScoringConfig). Each factor maps to a specific tool’s output. The engine normalizes raw values to 0-100, applies weights, and produces an overall score with confidence level.
Subtasks:
- Define Scorecard, ScorecardFactor Pydantic models
- Define scoring factor registry: factor name → extraction function
- Implement ScorecardEngine.score(tool_results, scoring_config) -> Scorecard
- Normalization: raw tool output → 0-100 per factor
- Weighted aggregation: factor scores × weights → overall score
- Confidence level derivation: based on how many factors had data
- Evidence summary: text explanation of each factor’s contribution
- Unit tests: varied inputs, missing factors, edge cases (all zeros, all max)
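The normalization, weighted aggregation and confidence steps above can be sketched as follows. This is a minimal sketch, not the planned implementation. It uses plain dataclasses instead of the Pydantic models so it runs without dependencies. Several details are assumptions: the FactorSpec registry entry, the min/max normalization bounds, the re-weighting over factors that have data, and the HIGH/MEDIUM/LOW coverage cutoffs (0.8 and 0.5).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FactorSpec:
    """Hypothetical registry entry: where a factor's raw value lives and its 0-100 scale."""
    name: str
    tool: str   # tool whose result feeds this factor
    key: str    # field read from that tool's result dict
    lo: float   # raw value mapped to 0 (set lo > hi when lower raw values are better)
    hi: float   # raw value mapped to 100


@dataclass
class ScorecardFactor:
    name: str
    raw: Optional[float]
    normalized: Optional[float]  # None means N/A: the tool result was missing
    weight: float


@dataclass
class Scorecard:
    overall: float
    confidence: str  # "LOW", "MEDIUM", or "HIGH"
    factors: list[ScorecardFactor] = field(default_factory=list)


def normalize(raw: float, lo: float, hi: float) -> float:
    """Linearly map raw onto 0-100, clamping values outside the [lo, hi] range."""
    if hi == lo:
        return 0.0
    pct = (raw - lo) / (hi - lo) * 100.0
    return max(0.0, min(100.0, pct))


def score(tool_results: dict[str, dict], specs: list[FactorSpec],
          weights: dict[str, float]) -> Scorecard:
    factors = []
    for spec in specs:
        raw = tool_results.get(spec.tool, {}).get(spec.key)
        norm = None if raw is None else normalize(float(raw), spec.lo, spec.hi)
        factors.append(ScorecardFactor(spec.name, raw, norm, weights.get(spec.name, 0.0)))

    # Re-weight over factors that have data, so a missing factor is N/A rather than 0.
    present = [f for f in factors if f.normalized is not None and f.weight > 0]
    total_w = sum(f.weight for f in present)
    overall = sum(f.normalized * f.weight for f in present) / total_w if total_w else 0.0

    # Assumed confidence rule: share of weighted factors that had data.
    weighted = [f for f in factors if f.weight > 0]
    coverage = len(present) / len(weighted) if weighted else 0.0
    confidence = "HIGH" if coverage >= 0.8 else "MEDIUM" if coverage >= 0.5 else "LOW"
    return Scorecard(round(overall, 2), confidence, factors)
```

Consider hypothetical factors with these inputs:

- edge: weight 0.5, raw 5.0 on a 0-10 scale
- flow: weight 0.3, raw 2.0 on a 0.5-2.0 scale
- spike: weight 0.2, missing

The overall score is (50 × 0.5 + 100 × 0.3) / 0.8 = 68.75. Confidence is MEDIUM because two of the three weighted factors had data. Re-weighting keeps a missing factor from dragging the score toward zero; the reduced confidence is what signals the gap.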
Acceptance Criteria:
- ScorecardEngine.score() returns a Scorecard with overall score (0-100), factor breakdown, confidence (LOW/MEDIUM/HIGH), and evidence summary
- Missing tool results → factor scored as N/A, confidence reduced
- Weights from persona config are respected
- Unit tests cover: full data, partial data, empty data, extreme values
- Tests pass with uv run pytest tests/unit/test_scorecard.py
Deliverables:
kosong_agent/orchestration/scorecard.py
tests/unit/test_scorecard.py
2.2 — Action Engine + Safety Hooks
Assignee: Dev A | Effort: 1.5 days | Depends on: 2.1
Description: Takes a scorecard and produces an action recommendation: BUY, SELL, or PASS. Before finalizing the recommendation, it runs before-action hooks that re-check critical assumptions. It also enforces risk limits from the persona config (max position size, max leverage, etc.). All actions are dry-run (no orders are submitted). Each recommendation is recorded in an audit trail as a structured JSON log.
Subtasks:
- Define ActionRecommendation, ActionAuditEntry Pydantic models
- Implement ActionEngine.recommend(scorecard, persona) -> ActionRecommendation
- Score thresholds: configurable per persona (e.g., BUY if score > 70)
- Before-action hooks: list of checks that must pass before action
  - Hook: re-validate scorecard confidence ≥ threshold
  - Hook: check position size within risk limits
  - Hook: verify market still open / liquid
- Risk limit enforcement from PersonaConfig.risk_limits
- Dry-run mode: build order parameters but do not submit
- Audit trail: JSON log of recommendation + hooks + outcome
- Unit tests: above threshold, below threshold, hook failures, risk violations
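The threshold, hook and dry-run flow described above could be sketched as follows. It makes these assumptions:

- Hooks are plain callables that return None on success or a veto reason string.
- A hypothetical RiskLimits dataclass stands in for PersonaConfig.risk_limits.
- A SELL threshold (score < 30) is added, which the plan does not specify. Only the BUY > 70 example comes from the plan.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RiskLimits:  # hypothetical stand-in for PersonaConfig.risk_limits
    max_position_usd: float
    max_leverage: float


@dataclass
class ActionRecommendation:
    action: str  # "BUY", "SELL", or "PASS"
    reason: str
    order_params: Optional[dict] = None
    audit: list[dict] = field(default_factory=list)


# A hook inspects the context and returns None to pass, or a veto reason to block.
Hook = Callable[[dict], Optional[str]]

CONFIDENCE_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def confidence_hook(min_conf: str) -> Hook:
    def check(ctx: dict) -> Optional[str]:
        if CONFIDENCE_RANK[ctx["confidence"]] < CONFIDENCE_RANK[min_conf]:
            return f"confidence {ctx['confidence']} below {min_conf}"
        return None
    return check


def risk_hook(limits: RiskLimits) -> Hook:
    def check(ctx: dict) -> Optional[str]:
        if ctx["size_usd"] > limits.max_position_usd:
            return f"size {ctx['size_usd']} exceeds max {limits.max_position_usd}"
        if ctx["leverage"] > limits.max_leverage:
            return f"leverage {ctx['leverage']} exceeds max {limits.max_leverage}"
        return None
    return check


def market_open_hook(ctx: dict) -> Optional[str]:
    return None if ctx.get("market_open") else "market closed or illiquid"


def recommend(score: float, ctx: dict, hooks: list[Hook],
              buy_above: float = 70.0, sell_below: float = 30.0) -> ActionRecommendation:
    audit: list[dict] = [{"step": "score", "score": score}]
    if score > buy_above:
        action = "BUY"
    elif score < sell_below:  # assumed SELL rule; the plan only gives a BUY example
        action = "SELL"
    else:
        return ActionRecommendation("PASS", "score inside no-trade band", audit=audit)

    for hook in hooks:
        veto = hook(ctx)
        audit.append({"step": "hook", "hook": hook.__qualname__, "veto": veto})
        if veto:  # any failing hook turns the action into PASS with the veto reason
            return ActionRecommendation("PASS", f"vetoed: {veto}", audit=audit)

    # Dry run: build the order parameters but never submit them.
    params = {"side": action, "size_usd": ctx["size_usd"], "submitted": False}
    audit.append({"step": "order", "params": params})
    return ActionRecommendation(action, f"score {score} crossed threshold", params, audit)


def audit_log(rec: ActionRecommendation) -> str:
    """One JSON line with the recommendation, hook results, and outcome."""
    return json.dumps({"action": rec.action, "reason": rec.reason,
                       "order_params": rec.order_params, "audit": rec.audit})
```

A veto returns PASS with the veto reason instead of raising. That keeps every outcome in the same audit structure, so the JSON log records vetoed and approved recommendations alike.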
Acceptance Criteria:
- Returns ActionRecommendation with action (BUY/SELL/PASS), reason, order params (if applicable), and audit entries
- Before-action hooks can veto the action (action becomes PASS with veto reason)
- Risk limits from persona config are enforced (position size, leverage)
- Dry-run mode produces order params but marks submitted=False
- Unit tests cover: score above/below threshold, each hook failure, risk limit violations
- Tests pass with uv run pytest tests/unit/test_action.py
Deliverables:
kosong_agent/orchestration/action.py
tests/unit/test_action.py
2.3 — M1 Full Integration Test
Assignee: Dev A | Effort: 0.5 days | Depends on: 2.1, 2.2
Description:
End-to-end integration test: persona → thesis → plan → real Nava tools (with API keys) → scorecard → action. Marked @pytest.mark.integration. Validates the entire M1 pipeline works with live data.
Subtasks:
- Integration test with the polymarket_sports persona
- Real API calls to Polymarket (market discovery, price history)
- Scorecard from real tool results
- Action recommendation from real scorecard
- Verify audit trail is complete and parseable
Acceptance Criteria:
- Full pipeline executes without errors using real Polymarket APIs
- Thesis, plan, scorecard, and action are all populated
- Audit trail captures every step
- Test passes with uv run pytest tests/integration/test_m1_pipeline.py -m integration
- Test is skipped gracefully when API keys are not set
Deliverables:
tests/integration/test_m1_pipeline.py
2.4 — Sportsbook Odds Benchmark Tool (M2)
Assignee: Dev A | Effort: 1 day | Depends on: 2.1
Description:
New RobustTool that wraps the existing OddsClient from sports-odds-backtesting/. Fetches sportsbook odds for a given event, compares to Polymarket prices, and flags arbitrage opportunities. Outputs: odds from each bookmaker, implied probabilities, Polymarket price, edge percentage.
Reuses: OddsClient, PropSnapshot, market configs from ~/work/sports-odds-backtesting/src/odds_collector/
Subtasks:
- Copy/adapt OddsClient into nava-agent tooling (or import it as a dependency)
- Define SportsOddsBenchmarkParams Pydantic model (sport, event query, market type)
- Implement SportsOddsBenchmarkTool(RobustTool):
  - Fetch odds from the Odds API for the matched event
  - Fetch the Polymarket price for the same event
  - Calculate implied probability from American odds
  - Compare: Polymarket price vs sportsbook implied probability
  - Flag if edge > configurable threshold (default 5%)
- Unit tests with mocked odds + Polymarket responses
- Integration test hitting real Odds API + Polymarket
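The implied-probability and edge calculation can be sketched as below. It assumes odds arrive as a mapping from bookmaker name to American odds, with None when the book does not cover the event. The BenchmarkRow shape and the compare() helper are hypothetical; the edge formula and the 5% default threshold come from the plan.

```python
from dataclasses import dataclass
from typing import Optional


def american_to_implied(odds: int) -> float:
    """Implied probability from American odds; still includes the bookmaker's margin."""
    if odds == 0:
        raise ValueError("American odds cannot be 0")
    if odds > 0:                    # e.g. +200: a 100 stake wins 200
        return 100.0 / (odds + 100.0)
    return -odds / (-odds + 100.0)  # e.g. -150: a 150 stake wins 100


@dataclass
class BenchmarkRow:
    bookmaker: str
    odds: Optional[int]
    implied_prob: Optional[float]
    polymarket_price: float
    edge_pct: Optional[float]
    flagged: bool


def compare(book_odds: dict[str, Optional[int]], polymarket_price: float,
            threshold: float = 0.05) -> list[BenchmarkRow]:
    rows = []
    for book, odds in book_odds.items():
        if odds is None:  # bookmaker does not cover this event: report it, don't fail
            rows.append(BenchmarkRow(book, None, None, polymarket_price, None, False))
            continue
        implied = american_to_implied(odds)
        edge = polymarket_price - implied  # edge formula from the acceptance criteria
        rows.append(BenchmarkRow(book, odds, round(implied, 4), polymarket_price,
                                 round(edge * 100, 2), edge > threshold))
    return rows
```

Two points are worth confirming in review:

- A single-side implied probability still includes the bookmaker's margin, so the implied probabilities of all outcomes sum to more than 1. Removing the vig before comparing would give a cleaner reference.
- Under the edge > threshold rule, a market where Polymarket is cheaper than the book (negative edge) is never flagged.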
Acceptance Criteria:
- Takes sport + event query → returns comparison table (bookmaker, odds, implied prob, Polymarket price, edge %)
- Handles missing odds gracefully (bookmaker not covering event)
- Edge calculation is correct: edge = polymarket_price - implied_prob
- Unit tests pass with mocked data
- Integration test passes with real APIs (marked @pytest.mark.integration)
- Tests pass with uv run pytest tests/unit/test_sportsbook_benchmark.py
Deliverables:
kosong_agent/tools/polymarket/sportsbook_benchmark.py
tests/unit/test_sportsbook_benchmark.py
tests/integration/test_sportsbook_benchmark_live.py
2.5 — Spike + Retracement Detection Tool (M2)
Assignee: Dev A | Effort: 0.5 days | Depends on: —
Description:
New RobustTool that wraps and extends existing find_price_spikes() from polymarket-scratch/. Detects sudden price moves and subsequent reversals in Polymarket price history. Configurable spike threshold and retracement window.
Reuses: find_price_spikes(), PricePoint from ~/work/polymarket-scratch/insider-detection/src/polymarket_api.py
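Spike and retracement detection might look like the sketch below. The original find_price_spikes() is not reproduced here. This version makes these assumptions:

- A spike is a relative change of at least spike_threshold_pct between consecutive points.
- A retracement is full ("yes") when the price gives back the whole move within the window.
- A retracement is "partial" when the price gives back at least partial_fraction of the move (a hypothetical parameter, default one half).
- The PricePoint stand-in uses fields t and p, which may differ from the real class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PricePoint:  # stand-in; field names t/p are assumptions
    t: int         # unix seconds
    p: float       # price in [0, 1]


@dataclass
class Spike:
    timestamp: int
    price_before: float
    price_after: float
    magnitude_pct: float
    retracement: str                  # "yes", "partial", or "no"
    hours_to_revert: Optional[float]  # set only for a full retracement


def detect_spikes(points: list[PricePoint], spike_threshold_pct: float = 20.0,
                  retracement_window_hours: float = 24.0,
                  partial_fraction: float = 0.5) -> list[Spike]:
    spikes = []
    for i in range(1, len(points)):
        before, after = points[i - 1], points[i]
        if before.p <= 0:
            continue  # relative change is undefined from a zero price
        move = after.p - before.p
        magnitude = move / before.p * 100.0
        if move == 0 or abs(magnitude) < spike_threshold_pct:
            continue

        # Scan forward within the window for the price giving the move back.
        window_end = after.t + retracement_window_hours * 3600
        status, hours = "no", None
        for q in points[i + 1:]:
            if q.t > window_end:
                break
            given_back = (after.p - q.p) / move  # 1.0 means back at the pre-spike price
            if given_back >= 1.0:
                status, hours = "yes", (q.t - after.t) / 3600
                break
            if given_back >= partial_fraction:
                status = "partial"
        spikes.append(Spike(after.t, before.p, after.p, round(magnitude, 2), status, hours))
    return spikes
```

Dividing by the signed move lets the same check handle both upward spikes that fall back and downward spikes that recover.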
Subtasks:
- Adapt find_price_spikes() logic into a RobustTool
- Add retracement detection: after a spike, did the price revert within N hours?
- Define SpikeDetectionParams(token_id, interval, spike_threshold_pct, retracement_window_hours)
- Output: list of spikes with timestamp, magnitude, retracement status, time to revert
- Use existing polymarket_price_history tool data or call the API directly
- Unit tests with fixture price data (known spikes + non-spikes)
Acceptance Criteria:
- Detects spikes above threshold (default 20%)
- For each spike, reports: timestamp, price before/after, magnitude %, retracement (yes/no/partial), time to revert
- Handles flat price series (no spikes) gracefully
- Unit tests cover: clear spike, no spike, spike with full retracement, spike without retracement
- Tests pass with uv run pytest tests/unit/test_spike_detection.py
Deliverables:
kosong_agent/tools/polymarket/spike_detection.py
tests/unit/test_spike_detection.py
2.6 — Directional Flow Analysis Tool (M2)
Assignee: Dev B | Effort: 1 day | Depends on: —
Description: New RobustTool that analyzes buy vs sell pressure from Polymarket trade history. Groups trades by time window, calculates net flow direction, volume breakdown, and buy/sell ratio. Helps identify informed directional positioning.
Reuses: Trade aggregation patterns from ~/work/polymarket-scratch/insider-detection/scripts/analyze_all_markets.py
Subtasks:
- Define DirectionalFlowParams(condition_id, time_window_hours, bucket_size_minutes)
- Implement flow analysis:
  - Fetch trades via the existing polymarket_trade_history tool
  - Bucket trades by time window
  - Per bucket: buy volume, sell volume, net flow, buy/sell ratio
  - Overall: cumulative flow direction, dominant side, flow acceleration
- Output: time series of flow buckets + summary metrics
- Unit tests with fixture trade data
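The bucketing and summary metrics can be sketched as follows. It assumes:

- Trades have already been fetched (via polymarket_trade_history) and filtered to time_window_hours.
- Each trade carries a BUY/SELL side and a size used as its volume.
- "Accelerating" means the absolute net flow in the later half of the buckets exceeds that of the earlier half. This trend rule is an illustrative choice, not part of the plan.

Buckets with no trades are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trade:
    ts: int      # unix seconds
    side: str    # "BUY" or "SELL" (any non-BUY side is counted as a sell)
    size: float  # volume attributed to the trade


@dataclass
class FlowBucket:
    timestamp: int
    buy_volume: float
    sell_volume: float
    net_flow: float
    ratio: Optional[float]  # buy/sell; None when the bucket has no sell volume


def analyze_flow(trades: list[Trade], bucket_size_minutes: int = 60) -> dict:
    width = bucket_size_minutes * 60
    sums: dict[int, list[float]] = defaultdict(lambda: [0.0, 0.0])  # start -> [buy, sell]
    for tr in trades:
        start = tr.ts - tr.ts % width  # align to the bucket boundary
        sums[start][0 if tr.side == "BUY" else 1] += tr.size

    buckets = [FlowBucket(t, buy, sell, buy - sell, buy / sell if sell else None)
               for t, (buy, sell) in sorted(sums.items())]

    total_buy = sum(b.buy_volume for b in buckets)
    total_sell = sum(b.sell_volume for b in buckets)
    net = total_buy - total_sell
    direction = "BUY" if net > 0 else "SELL" if net < 0 else "BALANCED"

    # Assumed trend rule: compare |net flow| in the later half of buckets with the earlier half.
    half = len(buckets) // 2
    early = abs(sum(b.net_flow for b in buckets[:half]))
    late = abs(sum(b.net_flow for b in buckets[half:]))
    trend = "accelerating" if late > early else "decelerating" if late < early else "flat"

    return {"buckets": buckets, "direction": direction,
            "total_volume": total_buy + total_sell, "trend": trend}
```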
Acceptance Criteria:
- Takes condition_id + time params → returns flow analysis
- Each bucket has: timestamp, buy_volume, sell_volume, net_flow, ratio
- Summary includes: overall direction (BUY/SELL dominant), total volume, flow trend (accelerating/decelerating)
- Unit tests cover: buy-heavy flow, sell-heavy flow, balanced flow, empty trades
- Tests pass with uv run pytest tests/unit/test_directional_flow.py
Deliverables:
kosong_agent/tools/polymarket/directional_flow.py
tests/unit/test_directional_flow.py
2.7 — Account Age Scoring Tool (M2)
Assignee: Dev B | Effort: 1 day | Depends on: —
Description: New RobustTool that scores wallet age to flag fresh/suspicious accounts. Uses Polygon RPC or the Etherscan API to find the wallet's first transaction, then applies age-based scoring tiers.
Reuses: get_wallet_first_tx() from ~/work/polymarket-scratch/insider-detection/src/polygon_client.py, signal thresholds from signals.py
Subtasks:
- Adapt get_wallet_first_tx() into a RobustTool
- Define AccountAgeParams(wallet_address)
- Scoring tiers:
  - < 7 days: HIGH RISK (score 90-100)
  - 7 to < 30 days: ELEVATED (score 60-89)
  - 30 to 90 days: MODERATE (score 30-59)
  - > 90 days: LOW RISK (score 0-29)
- Output: first_tx_date, account_age_days, risk_tier, risk_score
- Handle wallets with no history (brand new → max risk)
- Unit tests with mocked blockchain responses
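The tier mapping can be sketched as below. It takes the first-transaction timestamp as input, which is what the adapted get_wallet_first_tx() would supply. Two details are assumptions: the linear interpolation within each tier, and the 365-day horizon at which a LOW RISK score reaches 0. The plan fixes only the tier boundaries and score ranges.

```python
from datetime import datetime, timezone
from typing import Optional


def interpolate(age: float, lo: float, hi: float, score_lo: int, score_hi: int) -> int:
    """Assumed linear score within a tier; age is clamped to [lo, hi]."""
    frac = min(max((age - lo) / (hi - lo), 0.0), 1.0)
    return round(score_lo + (score_hi - score_lo) * frac)


def score_account_age(first_tx: Optional[datetime], now: Optional[datetime] = None) -> dict:
    now = now or datetime.now(timezone.utc)
    if first_tx is None:  # no on-chain history: treat as brand new, maximum risk
        return {"first_tx_date": None, "account_age_days": None,
                "risk_tier": "HIGH RISK", "risk_score": 100}

    age = (now - first_tx).total_seconds() / 86400
    # Tier boundaries: [0, 7) HIGH, [7, 30) ELEVATED, [30, 90] MODERATE, > 90 LOW.
    if age < 7:
        tier, score = "HIGH RISK", interpolate(age, 0, 7, 100, 90)
    elif age < 30:
        tier, score = "ELEVATED", interpolate(age, 7, 30, 89, 60)
    elif age <= 90:
        tier, score = "MODERATE", interpolate(age, 30, 90, 59, 30)
    else:  # assumed: LOW RISK score falls from 29 to 0 by one year of age
        tier, score = "LOW RISK", interpolate(age, 90, 365, 29, 0)
    return {"first_tx_date": first_tx.date().isoformat(),
            "account_age_days": round(age, 1), "risk_tier": tier, "risk_score": score}
```

Passing now explicitly keeps the unit tests deterministic. Both datetimes must be timezone-aware (or both naive) for the subtraction to work.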
Acceptance Criteria:
- Takes wallet address → returns age assessment with score and tier
- Handles unknown wallets (no on-chain history) as maximum risk
- Scoring tiers match spec above
- Unit tests cover: each tier, unknown wallet, very old wallet
- Tests pass with uv run pytest tests/unit/test_account_age.py
Deliverables:
kosong_agent/tools/polymarket/account_age.py
tests/unit/test_account_age.py
2.8 — Concentration + Cross-Market Detection Tool (M2)
Assignee: Dev A | Effort: 1 day | Depends on: —
Description: New RobustTool that detects coordinated activity: wallets heavily concentrated in related markets or exhibiting cross-market patterns. Uses wallet history and position data to flag suspicious concentration.
Reuses: Cross-market analysis from ~/work/polymarket-scratch/insider-detection/scripts/investigate_copy_trading.py
Subtasks:
- Define ConcentrationParams(wallet_address, market_ids list)
- Implement concentration analysis:
  - Fetch wallet trading history across provided markets
  - Calculate concentration ratio: value in target markets / total portfolio value
  - Detect cross-market patterns: same wallet, same direction, similar timing
  - Flag if concentration > threshold (default 80%)
- Output: concentration_ratio, markets_traded, cross_market_patterns, risk_flag
- Unit tests with fixture wallet data
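The concentration ratio and the same-direction timing check can be sketched as below. It assumes the wallet's history has been reduced to two inputs: current position values per market, and a list of fills with market, side and timestamp. The Fill type, the one-hour window, and the pairwise definition of a cross-market pattern are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Fill:  # hypothetical trade record
    market_id: str
    side: str  # "BUY" or "SELL"
    ts: int    # unix seconds


def concentration_ratio(positions: dict[str, float], target_markets: set[str]) -> float:
    """Value held in target markets divided by total portfolio value."""
    total = sum(positions.values())
    if total <= 0:
        return 0.0
    return sum(v for m, v in positions.items() if m in target_markets) / total


def cross_market_patterns(fills: list[Fill], window_s: int = 3600) -> list[tuple[str, str, str]]:
    """Pairs of different markets traded in the same direction within window_s seconds."""
    fills = sorted(fills, key=lambda f: f.ts)
    pairs = set()
    for i, a in enumerate(fills):
        for b in fills[i + 1:]:
            if b.ts - a.ts > window_s:
                break  # fills are time-sorted, so later ones are even further away
            if a.market_id != b.market_id and a.side == b.side:
                pairs.add((*sorted((a.market_id, b.market_id)), a.side))
    return sorted(pairs)


def analyze(positions: dict[str, float], target_markets: set[str], fills: list[Fill],
            threshold: float = 0.80, window_s: int = 3600) -> dict:
    ratio = concentration_ratio(positions, target_markets)
    return {
        "concentration_ratio": round(ratio, 4),
        "markets_traded": sorted({f.market_id for f in fills}),
        "cross_market_patterns": cross_market_patterns(fills, window_s),
        "risk_flag": ratio > threshold,
    }
```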
Acceptance Criteria:
- Takes wallet + market list → returns concentration analysis
- Concentration ratio correctly calculated
- Cross-market patterns detected (same direction within time window)
- Risk flag raised when concentration > threshold
- Unit tests cover: concentrated wallet, diversified wallet, single-market wallet
- Tests pass with uv run pytest tests/unit/test_concentration.py
Deliverables:
kosong_agent/tools/polymarket/concentration.py
tests/unit/test_concentration.py
2.9 — Debug Script: Scorecard
Assignee: Dev B | Effort: 0.5 days | Depends on: 2.1
Description: CLI script that feeds sample tool results into the scorecard engine and prints the factor breakdown with Rich formatting.
Subtasks:
- CLI with --persona flag
- Load sample tool results from fixtures
- Run scorecard engine, print formatted breakdown
- Show: overall score, each factor (name, raw value, normalized, weight, contribution)
Acceptance Criteria:
- uv run python -m kosong_agent.debug.debug_scorecard --persona polymarket_sports prints readable output
- No API keys required
Deliverables:
kosong_agent/debug/debug_scorecard.py