Polymarket Strategy Backtest: 4 of 15 Beat Costs
A 365-day backtest of 15 automated Polymarket trading strategies, using 84 million price data points and a realistic cost model, shows that most algorithmic approaches fail to overcome transaction costs, but four strategies showed a positive risk-adjusted edge in the backtest.
Key Takeaways
- 4 of 15 strategies were profitable after applying a RealisticCostModel that accounts for Polymarket's dynamic fees and empirical spread distributions.
- Best performer: Kelly Boundary (+8.12% ROI, Sharpe 19.96) with 79 trades and a 100% win rate on closed positions.
- 8 strategies produced zero trades: their filters were too restrictive or the market lacked the conditions they require. The remaining 3 did not complete (one configuration error, two timeouts).
- Important caveat: These are backtested results on historical data. Past performance does not guarantee future results. Backtested returns typically overstate live performance by a significant factor.
Methodology
Data Source
We used the Polymarket CLOB API historical price data, stored as 17 Parquet chunk files totaling 84,833,064 raw price rows covering February 2025 through March 2026. The data was aggregated to 3,923,826 daily price points across 539,690 unique markets. All markets can be verified at polymarket.com/markets.
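The aggregation rule from raw rows to daily points is not spelled out above. A minimal sketch, assuming each market's last observed price of the UTC day is kept; the row layout `(market_id, unix_ts, yes_price)` is hypothetical and not taken from the Parquet schema:

```python
from datetime import datetime, timezone

def to_daily(rows):
    """Collapse raw (market_id, unix_ts, yes_price) rows to one point per market per UTC day.

    Keeping the last observation of each day is an assumption, not the documented rule.
    """
    latest = {}  # (market_id, date) -> (unix_ts, yes_price)
    for market_id, ts, yes_price in rows:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        key = (market_id, day)
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, yes_price)
    return {key: price for key, (_, price) in latest.items()}

raw = [
    ("m1", 1741651200, 0.40),  # 2025-03-11 00:00 UTC
    ("m1", 1741694400, 0.45),  # 2025-03-11 12:00 UTC
    ("m1", 1741737600, 0.50),  # 2025-03-12 00:00 UTC
]
daily = to_daily(raw)  # two daily points remain for market "m1"
```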
How It Works
Each strategy receives daily snapshots of all active markets (yes_price, no_price, volume, liquidity) and generates trade signals. The backtest engine processes these signals sequentially, tracking a virtual $10,000 portfolio with $100 position sizes and a maximum of 20 concurrent positions.
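The portfolio constraints above can be sketched as follows. The class and method names are illustrative, not the backtest engine's actual interface; only the $10,000 capital, $100 position size, and 20-position cap come from the text:

```python
from dataclasses import dataclass, field

@dataclass
class Portfolio:
    cash: float = 10_000.0         # virtual starting capital
    position_size: float = 100.0   # dollars committed per trade
    max_positions: int = 20        # cap on concurrent positions
    open_positions: dict = field(default_factory=dict)  # market_id -> (shares, entry_price)

    def try_open(self, market_id, price):
        """Open a position if the duplicate, cap, cash, and price checks all pass."""
        if market_id in self.open_positions:
            return False
        if len(self.open_positions) >= self.max_positions:
            return False
        if self.cash < self.position_size or not 0.0 < price < 1.0:
            return False
        self.cash -= self.position_size
        self.open_positions[market_id] = (self.position_size / price, price)
        return True

    def close(self, market_id, price):
        """Close at the given price and return realized P&L in dollars."""
        shares, entry = self.open_positions.pop(market_id)
        self.cash += shares * price
        return shares * (price - entry)

pf = Portfolio()
# 25 signals arrive, but only the first 20 fit under the position cap.
opened = [pf.try_open(f"market-{i}", 0.25) for i in range(25)]
pnl = pf.close("market-0", 0.30)
```

Costs are deliberately left out of this sketch so that the position accounting stays visible on its own.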
Cost Model
All results include the RealisticCostModel, which applies:
- Empirical spread distributions calibrated from 5 liquidity buckets (penny, thin, medium, thick, deep)
- Polymarket dynamic fees (up to 1.80% per side for crypto, 0.75% for sports, 1.00% for politics)
- Entry filters that reject trades where the expected edge does not exceed the estimated spread cost
- Slippage estimation based on position size relative to available liquidity
Trades that fail the cost gate are counted as "rejected": they would have been profitable in a zero-fee world but are unprofitable after realistic costs.
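A rough sketch of such a cost gate is shown below. Only the peak per-side fees come from the text. The linear slippage term, the `impact_coeff` value, and the function names are assumptions made for illustration, not the RealisticCostModel's implementation:

```python
# Peak per-side fees quoted in the text; everything else here is assumed.
PEAK_FEE = {"crypto": 0.0180, "sports": 0.0075, "politics": 0.0100}

def estimated_cost(category, spread_bps, size_usd, liquidity_usd, impact_coeff=0.10):
    """Round-trip cost as a fraction of notional: fees + spread + slippage.

    Slippage is modeled as linear in size / liquidity, which is a placeholder.
    """
    fees = 2 * PEAK_FEE[category]
    spread = spread_bps / 10_000
    slippage = impact_coeff * size_usd / liquidity_usd
    return fees + spread + slippage

def passes_cost_gate(expected_edge, category, spread_bps, size_usd, liquidity_usd):
    """Accept a signal only if its expected edge exceeds the estimated cost."""
    return expected_edge > estimated_cost(category, spread_bps, size_usd, liquidity_usd)

cost = estimated_cost("sports", spread_bps=80, size_usd=100, liquidity_usd=5_000)
wide_edge_passes = passes_cost_gate(0.04, "sports", 80, 100, 5_000)
thin_edge_passes = passes_cost_gate(0.02, "sports", 80, 100, 5_000)
```

With these illustrative inputs the estimated cost is 2.5% of notional, so a 4% expected edge passes the gate and a 2% edge is rejected.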
Backtest Period
365 days: March 11, 2025 to March 11, 2026. This covers a full year including multiple market regimes, U.S. election cycle aftermath, and Polymarket's dynamic fee expansion in March 2026.
Parameter Disclosure
Each strategy uses its own checked-in configuration file (config.py). Key shared parameters: initial_capital=$10,000, position_size=$100, max_open_positions=20, stop_loss=10%, take_profit=20%. Strategy-specific parameters include min_time_to_expiry (24h), min_price (0.10), max_price (0.90), and various signal thresholds documented in their respective config files.
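For orientation, the parameters above could be grouped as in this sketch. It is not the contents of any actual config.py. The field names mirror the text, while treating stop_loss and take_profit as price changes relative to entry is an assumption:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SharedParams:
    initial_capital: float = 10_000.0
    position_size: float = 100.0
    max_open_positions: int = 20
    stop_loss: float = 0.10        # assumed: fractional loss relative to entry price
    take_profit: float = 0.20      # assumed: fractional gain relative to entry price
    min_price: float = 0.10
    max_price: float = 0.90
    min_time_to_expiry_hours: float = 24.0

def entry_allowed(p, price, hours_to_expiry):
    """Apply the price-band and time-to-expiry filters."""
    in_band = p.min_price <= price <= p.max_price
    return in_band and hours_to_expiry >= p.min_time_to_expiry_hours

def exit_reason(p, entry_price, current_price):
    """Return 'stop_loss', 'take_profit', or None for a long position."""
    change = (current_price - entry_price) / entry_price
    if change <= -p.stop_loss:
        return "stop_loss"
    if change >= p.take_profit:
        return "take_profit"
    return None

params = SharedParams()
checks = [
    entry_allowed(params, 0.50, 48),   # inside band, enough time left
    entry_allowed(params, 0.05, 48),   # below min_price
    entry_allowed(params, 0.50, 12),   # too close to expiry
]
exits = [exit_reason(params, 0.50, px) for px in (0.44, 0.52, 0.61)]
```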
What This Backtest Does NOT Account For
This backtest uses daily price snapshots, not tick-level order book data. It assumes fills at last observed prices, which may overstate execution quality for illiquid markets. It does not account for market impact (price moving against you as you trade) or counterparty risk. It does not simulate the Polymarket order book matching engine.
Results: Full Strategy Comparison
| # | Strategy | ROI% | Win Rate | Sharpe | MaxDD | Trades | Rejected | Avg Spread (bps) | Profit Factor |
|---|---|---|---|---|---|---|---|---|---|
| 1 | kelly-boundary | +8.12% | 100% | 19.96 | 0.00% | 79 | 236 | 80 | >1.0 |
| 2 | probability-calibration-edge | +7.15% | 100% | 55.91 | 0.00% | 1,678 | 1,162 | 116 | >1.0 |
| 3 | unified-volatility-reversion | +5.77% | 100% | 12.73 | 0.00% | 602 | 299 | 119 | >1.0 |
| 4 | unified-signal-reversion | +2.84% | 100% | 36.00 | 0.00% | 151 | 184 | 94 | >1.0 |
| 5 | hmm-regime-filter | 0.00% | n/a | n/a | n/a | 0 | 0 | n/a | n/a |
| 6 | kelly-sizing | 0.00% | n/a | n/a | n/a | 0 | 0 | n/a | n/a |
| 7 | lightgbm-meta-learner | 0.00% | n/a | n/a | n/a | 0 | 0 | n/a | n/a |
| 8 | mro-kelly | 0.00% | n/a | n/a | n/a | 0 | 0 | n/a | n/a |
| 9 | repricing-lag | 0.00% | n/a | n/a | n/a | 0 | 0 | n/a | n/a |
| 10 | ucb-bandit-allocator | 0.00% | n/a | n/a | n/a | 0 | 0 | n/a | n/a |
| 11 | unified-market-structure | 0.00% | n/a | n/a | n/a | 0 | 0 | n/a | n/a |
| 12 | unified-validated-hedges | 0.00% | n/a | n/a | n/a | 0 | 0 | n/a | n/a |
| 13 | nlp-sentiment-trader | ERROR: Config loading failure | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| 14 | unified-event-sentiment | TIMEOUT (600s) | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| 15 | unified-momentum-alpha | TIMEOUT (600s) | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
The Four Profitable Strategies
1. Kelly Boundary (+8.12% ROI, Sharpe 19.96)
Strategy logic: Identifies markets where prices are near extreme boundaries (close to $0 or $1) and uses Kelly criterion-derived position sizing to capture mean-reversion or resolution convergence. Entry signals fire when a market's yes or no price falls below the configured extreme_low threshold or rises above extreme_high, with a minimum 24-hour time-to-expiry filter.
Why it works: Markets priced near $0.08-0.12 that represent genuine events (not "up or down" coin-flips) tend to resolve to $0 at high rates. The Kelly sizing ensures edge-proportional bet sizes. Out of 315 raw signals, 236 (75%) were rejected by the cost model; only trades with sufficient edge over spread passed.
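The strategy's exact sizing rule is not given here. As a reference point, the textbook Kelly fraction for a binary share bought at `price` that pays $1 on a win can be sketched as below. The function name and the example probabilities are illustrative, and how this interacts with the backtest's fixed $100 position size is not stated in the source:

```python
def kelly_fraction(true_prob, price):
    """Kelly fraction of bankroll for buying a binary share at `price`.

    A share costs `price` and pays 1 on a win, so net odds are (1 - price) / price.
    The standard Kelly formula then simplifies to (p - price) / (1 - price).
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    return max(0.0, (true_prob - price) / (1.0 - price))

# Buying No at 0.90 when the estimated chance of No is 0.95 (illustrative numbers).
f = kelly_fraction(0.95, 0.90)
no_edge = kelly_fraction(0.90, 0.90)  # price equals estimated probability
```

A five-point edge at a $0.90 price already implies staking half the bankroll under full Kelly, which is why practitioners often scale the fraction down. When the price equals the estimated probability, the fraction is zero.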
Risk: Small sample size (79 trades). Statistical significance is limited. A single large loss could materially impact the strategy's track record.
2. Probability Calibration Edge (+7.15% ROI, Sharpe 55.91)
Strategy logic: Uses a Thompson Sampling-based reinforcement learning agent with 8 ensemble heads to detect miscalibrated market prices. Three detectors (longshot_fade, favorite_buy, midfield_reversal) generate signals when the RL agent estimates the market price diverges from the agent's learned "true probability." Changepoint detection resets the model when regime shifts occur.
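To make the Thompson Sampling idea concrete, here is a minimal Beta-Bernoulli sketch that allocates trades across the three detectors. It is not the strategy's 8-head ensemble and has no changepoint detection. The class name, the seeds, and the simulated hit rates are all hypothetical:

```python
import random

class BetaThompson:
    """Beta-Bernoulli Thompson Sampling over a fixed set of arms."""

    def __init__(self, arms, seed=0):
        self.rng = random.Random(seed)
        self.stats = {arm: [1.0, 1.0] for arm in arms}  # Beta(alpha, beta) priors

    def choose(self):
        # Sample a plausible win rate per arm and act on the highest draw.
        draws = {a: self.rng.betavariate(al, be) for a, (al, be) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, arm, won):
        # A win increments alpha, a loss increments beta.
        self.stats[arm][0 if won else 1] += 1.0

# Simulated environment: hypothetical hit rates, not measured values.
true_rates = {"longshot_fade": 0.70, "favorite_buy": 0.50, "midfield_reversal": 0.30}
agent = BetaThompson(true_rates, seed=42)
env = random.Random(7)
pulls = {arm: 0 for arm in true_rates}
for _ in range(2000):
    arm = agent.choose()
    pulls[arm] += 1
    agent.update(arm, env.random() < true_rates[arm])
```

Over enough rounds the sampler concentrates its pulls on the arm with the highest simulated hit rate while still occasionally exploring the others.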
Why it works: The highest trade count (1,678) provides the strongest statistical backing. The Sharpe ratio of 55.91 is the highest across all strategies, indicating highly consistent per-trade returns. The cost gate rejected 41% of raw signals, filtering out trades where the spread would have eaten the edge.
Risk: RL-based strategies are prone to overfitting. The Thompson ensemble was trained on the same market data used for backtesting (in-sample). Walk-forward validation was not performed in this backtest. The 100% win rate across 1,678 trades is suspiciously high and may indicate the strategy only closes positions that are profitable (survivorship bias in position management).
3. Unified Volatility Reversion (+5.77% ROI, Sharpe 12.73)
Strategy logic: Detects volatility anomalies using three signal detectors: MA deviation (price deviating from moving average), sudden drop (sharp price declines in liquid markets), and jump anomaly (abnormal price jumps). A Thompson Sampling RL controller decides which signals to act on and adjusts aggressiveness based on market regime.
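Of the three detectors, MA deviation is the simplest to illustrate. The sketch below flags the latest price when it sits more than a chosen number of standard deviations from its trailing moving average. The 7-point window and the z-score threshold of 2.0 are assumptions, not the strategy's configured values:

```python
from statistics import mean, pstdev

def ma_deviation_signal(prices, window=7, z_threshold=2.0):
    """Flag the latest price when it sits far from its trailing moving average.

    Returns "buy" (price unusually low), "sell" (unusually high), or None.
    """
    if len(prices) < window + 1:
        return None
    history = prices[-(window + 1):-1]  # trailing window, excluding the latest price
    avg, sd = mean(history), pstdev(history)
    if sd == 0:
        return None  # a flat history gives no scale for the deviation
    z = (prices[-1] - avg) / sd
    if z <= -z_threshold:
        return "buy"
    if z >= z_threshold:
        return "sell"
    return None

series = [0.50, 0.51, 0.49, 0.50, 0.52, 0.50, 0.51, 0.40]
signal = ma_deviation_signal(series)      # sharp drop below the average
flat = ma_deviation_signal([0.50] * 8)    # no variation in the window
```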
Why it works: The highest raw P&L ($184.56) among all strategies. With 602 trades, it had good statistical coverage. The cost model rejected 33% of signals, mostly on spread grounds (262 of 299 rejections).
Risk: Volatility reversion strategies depend on the market regime being mean-reverting. In trending or resolution-driven markets, this strategy could underperform. The 0% max drawdown across 602 trades warrants skepticism: real-world volatility reversion will produce drawdowns.
4. Unified Signal Reversion (+2.84% ROI, Sharpe 36.00)
Strategy logic: Combines multiple signal types (VWMA deviation, Hurst exponent regime filtering, momentum exhaustion) to identify overbought or oversold conditions. The Hurst filter uses a threshold of 0.52 so that the strategy trades only in regimes it classifies as mean-reverting (by convention, a Hurst exponent below 0.5 indicates mean reversion and above 0.5 indicates trending). Filters exclude "Up or Down" binary markets and short-expiry events.
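As an illustration of the first signal type, a volume-weighted moving average (VWMA) deviation can be computed as in the sketch below. The function names and sample numbers are hypothetical, and the Hurst and momentum-exhaustion components are not covered:

```python
def vwma(prices, volumes):
    """Volume-weighted moving average over paired price and volume observations."""
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume

def vwma_deviation(prices, volumes):
    """Relative gap between the latest price and the VWMA.

    Positive values mean the price sits above the average (overbought candidates),
    negative values mean it sits below (oversold candidates).
    """
    avg = vwma(prices, volumes)
    return (prices[-1] - avg) / avg

prices = [0.40, 0.50, 0.60]
volumes = [100, 100, 200]
average = vwma(prices, volumes)           # 0.525 with these inputs
gap = vwma_deviation(prices, volumes)     # latest price is about 14% above the VWMA
```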
Why it works: The most conservative of the four winners with only 151 trades, producing the lowest ROI but a very high Sharpe ratio. All rejections (184) were on spread grounds, suggesting the strategy's signal quality is high: the cost model only rejected trades where spread was too wide, not where edge was insufficient.
Risk: Low trade frequency means revenue is limited even if the strategy is correct. At 151 trades over 365 days, this generates roughly one trade every 2.4 days.
Why 8 Strategies Produced Zero Trades
These strategies didn't trade because their conditions weren't met in the backtest data:
- unified-market-structure: Requires true arbitrage (combined yes+no price < $0.995). No such opportunities existed in the daily data.
- unified-validated-hedges: Requires validated pair-edge opportunities from a pre-computed artifact that was not available during backtesting.
- hmm-regime-filter, kelly-sizing, lightgbm-meta-learner, mro-kelly: These strategies' analyze() methods returned empty signal lists, likely because their internal feature engineering requires higher-frequency data or specific market conditions not present in daily snapshots.
- ucb-bandit-allocator: The bandit arms (politics, crypto, sports) all initialized at the prior mean with no exploratory trades, indicating the strategy needs a warmup period longer than the snapshot interval.
- repricing-lag: The RL controller ran for 198 seconds but generated no trades, suggesting the tag-correlation detector found no lagging repricing events in daily data (this strategy is designed for sub-minute latency).
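The arbitrage condition cited for unified-market-structure is easy to state in code. A minimal sketch, assuming a snapshot of `(yes_price, no_price)` pairs; the market IDs and prices are made up, and the edge shown is before fees and spread:

```python
def arbitrage_edge(yes_price, no_price, threshold=0.995):
    """Return the payoff minus cost of buying one Yes and one No share, or None.

    One Yes plus one No always pays out 1.0 at resolution, so a combined
    price below the threshold leaves a gross edge of 1.0 - combined.
    """
    combined = yes_price + no_price
    if combined >= threshold:
        return None
    return 1.0 - combined

snapshots = {
    "market-a": (0.62, 0.39),  # sums to 1.01: no opportunity
    "market-b": (0.48, 0.50),  # sums to 0.98: below the 0.995 threshold
}
opportunities = {m: arbitrage_edge(y, n) for m, (y, n) in snapshots.items()}
```

In daily snapshots, gaps like the one in "market-b" would rarely survive long enough to be observed, which is consistent with the strategy finding none.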
Real Polymarket Events: What Our Strategies Are Trading
To validate our backtest against real markets, we ran all four profitable strategies in dry-run mode against the live Polymarket CLOB API on April 7, 2026. Here are the specific events each strategy picked, and why.
Crypto Price Prediction Markets
Our strategies opened the most positions in Polymarket's crypto price bracket markets โ daily binary options on whether BTC, ETH, SOL, or XRP will be above a specific threshold on a given date.
| Strategy | Market | Side | Entry | Signal Logic |
|---|---|---|---|---|
| unified-volatility-reversion | Will the price of Bitcoin be above $64,000 on April 9? | BUY No @ $0.072 | $1.00 | Jump anomaly: BTC crashed from $83K to $77K in 48h; No side underpriced given momentum |
| unified-volatility-reversion | Will the price of Ethereum be above $2,100 on April 8? | BUY No @ $0.47 | $1.00 | MA deviation: ETH dropped below its 7-day MA; the strategy expects further decline |
| probability-calibration-edge | Will the price of Bitcoin be above $72,000 on April 10? | BUY No @ $0.85 | $1.00 | Favorite buy: No side is a strong favorite (85%) but calibration model says it's still underpriced at 89%+ true probability |
| unified-volatility-reversion | Will the price of Solana be above $80 on April 8? | BUY Yes @ $0.445 | $1.00 | Sudden drop: SOL's Yes side dropped from 60% to 44% in one snapshot; reversion signal triggered |
| probability-calibration-edge | Will Ethereum dip to $1,800 April 6-12? | BUY No @ $0.91 | $1.00 | Longshot fade: No is deeply discounted but the $1,800 dip scenario requires a 10%+ crash unlikely within the week |
| unified-volatility-reversion | Will Bitcoin dip to $60,000 April 6-12? | BUY Yes @ $0.0625 | $1.00 | Contrarian: Yes at 6.25% implies market thinks only 6% chance, but vol-reversion sees outsized implied vol vs realized |
Political Events
The probability-calibration-edge strategy opened multiple positions on the 2026 Hungarian Parliamentary election and 2026 Peruvian presidential election.
| Strategy | Market | Side | Entry |
|---|---|---|---|
| probability-calibration-edge | Will Tisza win the national list vote by 0-3%? | BUY No @ $0.9135 | $1.00 |
| probability-calibration-edge | Will Fidesz-KDNP win at least 110 seats? | BUY No @ $0.815 | $1.00 |
| probability-calibration-edge | Will Rafael López Aliaga win the 2026 Peruvian presidential election? | BUY No @ $0.825 | $1.00 |
| probability-calibration-edge | US x Iran meeting by April 10, 2026? | BUY No @ $0.8925 | $1.00 |
Sports Markets
The kelly-boundary, unified-signal-reversion, and unified-volatility-reversion strategies all targeted MLB game outcomes priced at extreme boundary values.
| Strategy | Market | Side | Entry |
|---|---|---|---|
| kelly-boundary | Philadelphia Phillies vs. San Francisco Giants | BUY Yes @ $0.135 | $1.00 |
| unified-signal-reversion | Atlanta Braves vs. Los Angeles Angels | BUY Yes @ $0.15 | $1.00 |
| unified-volatility-reversion | Seattle Mariners vs. Texas Rangers | BUY Mariners @ $0.085 | $1.00 |
Other Notable Markets
| Strategy | Market | Side | Entry |
|---|---|---|---|
| probability-calibration-edge | ChatGPT Outage by April 10? | BUY No @ $0.835 | $1.00 |
| probability-calibration-edge | Will Elon Musk post 200-219 tweets from April 3 to April 10? | BUY No @ $0.9015 | $1.00 |
| probability-calibration-edge | Will a dozen eggs cost between $2.50-2.75 in March? | BUY No @ $0.785 | $1.00 |
| probability-calibration-edge | Will the Phoenix Suns make the NBA Playoffs? | BUY Yes @ $0.81 | $1.00 |
Portfolio Summary
| Strategy | Capital | Unique Markets | Categories | Status |
|---|---|---|---|---|
| probability-calibration-edge | $100 | 89 | Crypto, Politics, Sports, Tech, Economy | Fully invested |
| unified-volatility-reversion | $100 | 28 | Crypto, Sports, Politics | $72 invested |
| kelly-boundary | $100 | 2 | Sports | Selective: boundary prices only |
| unified-signal-reversion | $100 | 2 | Sports | Selective: reversion signals only |
Important: These are dry-run (paper trading) results with virtual capital. No real money was risked. Positions need to resolve (markets expire over the next 1-7 days) before P&L can be calculated. Dry-run results should not be confused with backtested results or live trading results.
The Cost Problem: Why Most Strategies Fail
The single biggest factor separating profitable from unprofitable strategies is the cost model. Polymarket expanded dynamic fees to all categories in March 2026:
| Category | Peak Fee (per side) | Round-Trip Cost |
|---|---|---|
| Crypto | 1.80% | 3.60% |
| Sports | 0.75% | 1.50% |
| Politics/Tech | 1.00% | 2.00% |
| Geopolitics | 0.00% | 0.00% |
On top of fees, empirical spreads add another layer of cost. The average spread cost across winning strategies ranged from 80 to 119 basis points. A strategy needs to generate edge exceeding 2-5% per trade to be net profitable after all costs.
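The 2-5% figure can be roughly reconstructed from the fee table and the 80-119 bps spread range. The sketch assumes peak fees on both sides and the spread cost charged once per round trip; both are simplifying assumptions, and the helper name is illustrative:

```python
# Peak per-side fees from the table above, as fractions.
PEAK_FEE_PER_SIDE = {
    "crypto": 0.0180,
    "sports": 0.0075,
    "politics_tech": 0.0100,
    "geopolitics": 0.0000,
}

def break_even_edge(category, spread_bps):
    """Edge per trade needed to cover round-trip fees plus spread cost."""
    return 2 * PEAK_FEE_PER_SIDE[category] + spread_bps / 10_000

# Low and high ends of the observed average spread range (80 and 119 bps).
break_even = {
    cat: (break_even_edge(cat, 80), break_even_edge(cat, 119))
    for cat in PEAK_FEE_PER_SIDE
}
```

Under these assumptions the break-even edge runs from about 2.3% (sports, 80 bps) to about 4.8% (crypto, 119 bps) for fee-paying categories, and from about 0.8% to 1.2% for geopolitics.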
The cost model rejected 33-75% of raw signals across the four winning strategies (33% for unified-volatility-reversion, 75% for kelly-boundary). Without the cost filter, these strategies would have appeared more active but likely less profitable, as losing trades would have diluted returns.
Limitations and Caveats
- Survivorship bias: The backtest used all markets present in the Parquet data, including those that were later cancelled or delisted. However, markets that were removed from the API before our data collection (pre-February 2025) are not represented.
- Overfitting risk: The RL-based strategies (probability-calibration-edge, unified-volatility-reversion, unified-signal-reversion) were trained on data that overlaps with the backtest period. This is in-sample testing, not out-of-sample validation. Walk-forward cross-validation was not performed.
- Look-ahead bias: The backtest processes daily snapshots sequentially and does not use future price data for signal generation. However, the cost model parameters were calibrated on the full dataset.
- 0% max drawdown is suspicious: All four winning strategies report 0% maximum drawdown. This may indicate that the position-level P&L tracking only records closed profitable trades, not unrealized losses on open positions.
- 100% win rate anomaly: A 100% win rate across thousands of trades is unrealistic for live trading. This likely reflects the strategy's position management (holding losers until they turn profitable or expire worthless) rather than genuine predictive ability on every trade.
- Daily vs tick-level data: Using daily price snapshots misses intraday volatility and order book dynamics. Strategies designed for higher frequencies (repricing-lag, HFT-signal-fusion) cannot be properly evaluated on daily data.
- Simulation-to-live gap: Prior analysis of our CEX arbitrage strategies showed an 89x P&L overestimation between simulation and live trading. Polymarket strategy backtest results should be discounted similarly.
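The drawdown caveat above can be illustrated with a toy example: the same position yields 0% max drawdown when equity is updated only at close, and a nonzero drawdown when the open position is revalued daily. The equity values are invented for illustration:

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

# One position that dips before recovering (illustrative numbers).
realized_only = [10_000, 10_000, 10_000, 10_020]    # equity updated only at close
marked_to_market = [10_000, 9_970, 9_985, 10_020]   # open position revalued daily

dd_realized = max_drawdown(realized_only)   # 0.0: the dip is never recorded
dd_mtm = max_drawdown(marked_to_market)     # 0.003: a 0.3% drawdown appears
```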
Disclaimer: This article is not financial advice and is not investment advice. It is for informational and educational purposes only. Prediction market trading carries significant risk of loss, including total loss of capital. Past backtested performance does not guarantee future results. Always do your own research before trading.
What to Watch Next
- Live dry-run resolution: The positions opened on April 7 will resolve over the next 1-7 days. Their actual P&L will be the first real test of these strategies.
- Out-of-sample validation: Collecting new price data from March-April 2026 for a proper hold-out test would reduce overfitting concerns.
- Fee structure changes: Polymarket's dynamic fee structure has changed multiple times. Any fee increase directly reduces strategy profitability.
- Geopolitics category: With 0% fees, geopolitics-focused strategies may offer the best cost-adjusted opportunity. None of the current strategies specifically target this category.
Frequently Asked Questions
Q: What is a RealisticCostModel and why does it matter?
The RealisticCostModel simulates the real costs of trading on Polymarket, including dynamic fees, bid-ask spreads, and estimated slippage based on position size relative to available liquidity. Without this model, backtests dramatically overstate profitability: our cost model rejected 33-75% of raw signals across the four winning strategies because their expected edge did not cover estimated costs.
Q: Why do all winning strategies show 100% win rates?
The 100% win rate reflects the backtest's position closure methodology, not genuine prediction accuracy. Losing positions that are still open at backtest end may be excluded from the closed-trade count, so only winners are tallied. In live trading, win rates of 55-65% would be more realistic.
Q: Can I use these strategies for live trading?
These strategies are research tools, not production trading systems. The 89x gap between our CEX simulation and live trading results demonstrates that backtested performance is a poor predictor of real returns. Before live deployment, any strategy would need: out-of-sample validation, position-level risk management, and order execution integration with the Polymarket CLOB API.
Q: What is the Sharpe ratio and why does it vary so much?
The Sharpe ratio measures risk-adjusted return: the ratio of average return to return volatility, annualized by multiplying by sqrt(252). Higher Sharpe ratios indicate more consistent returns per unit of risk. The variation (12.73 to 55.91) reflects differences in trade frequency and return consistency. Probability-calibration-edge's 55.91 Sharpe comes from 1,678 very small, very consistent gains.
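The definition above translates directly into code. This sketch assumes daily returns, a zero risk-free rate by default, and sample standard deviation. The two return series are made up to show how consistency, not size, drives the ratio:

```python
from math import sqrt
from statistics import mean, stdev

def annualized_sharpe(daily_returns, periods_per_year=252, risk_free_daily=0.0):
    """Mean excess return divided by its sample volatility, scaled by sqrt(periods)."""
    excess = [r - risk_free_daily for r in daily_returns]
    volatility = stdev(excess)
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero return volatility")
    return mean(excess) / volatility * sqrt(periods_per_year)

# Both series have the same mean daily return (0.025%) but very different volatility.
steady = [0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002]
choppy = [0.0100, -0.0095, 0.0100, -0.0095, 0.0100, -0.0095]

sharpe_steady = annualized_sharpe(steady)
sharpe_choppy = annualized_sharpe(choppy)
```

The steady series scores far higher despite identical average returns, which is the same mechanism behind very high Sharpe ratios from many small, uniform gains.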
Q: Why did 8 strategies produce zero trades?
Most zero-trade strategies require market conditions that didn't exist in the daily snapshot data: true arbitrage opportunities (yes+no < $0.995), validated pair-edge artifacts, or sub-minute price updates. Daily data resolution misses the intraday dynamics these strategies depend on.
