Methodology · DV-PMI

Data sources

All indices derive from the Polymarket on-chain trade record on the Polygon blockchain, November 2022 to present (671M raw on-chain records across 1.98M wallets as of the latest snapshot, which deduplicate to 222M trades). Of these, only trades in resolved markets enter the Prelec, calibration, and P&L aggregations (~465K non-bot trades in the calibration pool; ~483K wallets tested in the insider screen). Trades are deduplicated against the Paradigm events index and joined against the Polymarket market-to-outcome mapping (`market_winner_map.pkl`).

Source	Mechanism	Cadence
Raw events	Alchemy `eth_getLogs` against Polymarket CTF + neg-risk contracts	Weekly delta (~30M new trades / week)
Market metadata	Polymarket Gamma API `/markets/{id}`	On-demand for top markets
Order books	Polymarket CLOB API `/book?token_id={tid}`	Snapshot at refresh time
Resolutions	Polymarket on-chain settlement events	As markets resolve
Wallet stats	Per-wallet aggregation of the trade panel	Re-built each refresh

 
The framework applies to any binary prediction market with public settlement data. Kalshi integration is contingent on wallet-level data access and is planned for a future release.

Pipeline architecture
 
The pipeline runs as a sequence of Python aggregators orchestrated by `pipeline/run_all.py`. Each aggregator reads from the master `processed_trades.csv` panel (~295 GB) via DuckDB streaming queries and writes a typed JSON snapshot plus a long-format CSV history to `site/public/data/`. The Astro static site rebuilds on each refresh and deploys via GitHub Pages.
   
Stage	Script	Wall time
Polygon delta pull	`pull_polygon_delta_fast.py` (aiohttp, 10 in-flight)	~36 min for ~30M new trades
Process delta	`process_delta.py`	~5 min
Append to master	`append_delta_to_master.py`	~12 sec for 12 GB delta
Activity / volume	`aggregate_weekly_activity.py`	~3 min
Bot share	`aggregate_bot_share.py`	<1 min (reads weekly activity)
PWI / calibration / efficiency / price gap	`aggregate_pwi.py` + 3 siblings	~6 min total
Execution edge	`aggregate_execution.py`	~4 min
Profit split (incremental)	`profit_split/update_weekly.py`	~1-3 min after one-time backfill
Top markets	`aggregate_top_markets.py` (TOP_K = 30)	~10 min (full-CSV pass)
Per-market microstructure	`aggregate_market_microstructure.py`	~4 min (filtered scan)
Live odds + depth	`fetch_market_snapshot.py` (Gamma + CLOB)	~30 sec for 30 markets
Briefings + narrative	`build_briefings.py`, `build_weekly_narrative.py`	<5 sec each
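The following is a minimal sketch of a sequential orchestrator in the spirit of `pipeline/run_all.py`. It assumes that each stage is a standalone script run with the current interpreter and that any failing stage should halt the refresh. Stage order follows the table above. The script paths are assumptions. The three sibling scripts of `aggregate_pwi.py` are not named, so they are left out. `run_stages` and `stage_commands` are hypothetical helper names.

```python
import subprocess
import sys
import time
from typing import List, Sequence, Tuple

# Stage order follows the pipeline table; paths are assumed, and the three
# unnamed sibling scripts of aggregate_pwi.py are omitted.
STAGES = [
    "pull_polygon_delta_fast.py",
    "process_delta.py",
    "append_delta_to_master.py",
    "aggregate_weekly_activity.py",
    "aggregate_bot_share.py",  # reads weekly activity, so it runs after it
    "aggregate_pwi.py",
    "aggregate_execution.py",
    "profit_split/update_weekly.py",
    "aggregate_top_markets.py",
    "aggregate_market_microstructure.py",
    "fetch_market_snapshot.py",
    "build_briefings.py",
    "build_weekly_narrative.py",
]


def stage_commands(stages: Sequence[str] = STAGES) -> List[List[str]]:
    """Build one `python <script>` command per stage with the current interpreter."""
    return [[sys.executable, script] for script in stages]


def run_stages(commands: Sequence[List[str]]) -> List[Tuple[List[str], float]]:
    """Run commands in order; check=True raises CalledProcessError on the first failure."""
    timings = []
    for cmd in commands:
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append((cmd, time.perf_counter() - start))
    return timings
```

Running each stage as a separate process means a stage's memory, including a DuckDB connection capped at 12 GB, is released when that process exits and before the next stage starts.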
 
 
DuckDB is configured with `PRAGMA memory_limit='12GB'` and `PRAGMA threads=8`. All numeric output is sanitized through `common.write_json`, which converts NaN/Inf to `null` before serialization. This keeps the output strict JSON: Vite, browsers, and jq all reject literal `NaN` tokens.

Wallet classification

Wallets are partitioned into five exclusive classes using Paper 1’s rule, applied in the order shown below (first matching class wins):
Class	Rule	Population share
Algorithmic	`trades_per_day > 50` OR `n_trades > 1000`	~0.6%
Sophisticated	`volume > $10K` AND `HHI < 0.5` AND `active_span > 30 days`	~3%
Active retail	`10 ≤ n_trades ≤ 1000`	~13%
Casual	`2 ≤ n_trades ≤ 9`	~30%
One-shot	`n_trades = 1`	~53%
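The first-match-wins rule can be sketched as an ordered chain of checks. The `WalletStats` field names are assumptions standing in for the per-wallet statistics that the rules reference. A wallet with zero trades should not occur in a trade panel; it falls through to a hypothetical catch-all label.

```python
from dataclasses import dataclass


@dataclass
class WalletStats:
    n_trades: int
    trades_per_day: float
    volume_usd: float
    hhi: float               # market-concentration HHI, in (0, 1]
    active_span_days: float


def classify(w: WalletStats) -> str:
    """Apply the class rules in table order; the first rule that matches wins."""
    if w.trades_per_day > 50 or w.n_trades > 1000:
        return "algorithmic"
    if w.volume_usd > 10_000 and w.hhi < 0.5 and w.active_span_days > 30:
        return "sophisticated"
    if 10 <= w.n_trades <= 1000:
        return "active_retail"
    if 2 <= w.n_trades <= 9:
        return "casual"
    if w.n_trades == 1:
        return "one_shot"
    return "unclassified"  # hypothetical catch-all, e.g. n_trades == 0
```

Ordering matters:

- The algorithmic check runs first, so a diversified, high-volume wallet with 2,000 trades is algorithmic, not sophisticated.
- The sophisticated check runs before the count-based classes, so a diversified $50K wallet with only 5 trades is sophisticated rather than casual.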
 
 
HHI (Herfindahl-Hirschman Index) is the sum of the squared shares of the wallet’s trades across markets. A lower HHI means trading is more diversified across markets. The classification is reproduced in `code/profile.do` for Stata replication and is consistent with Paper 1 Table 2.
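For concreteness, here is a minimal sketch of the HHI computation from a wallet's list of traded market IDs, one entry per trade (the function name is illustrative).

- A wallet trading a single market has HHI = 1.
- A wallet spreading its trades evenly over k markets has HHI = 1/k.

Two markets can never give HHI below 0.5, so the `HHI < 0.5` rule requires trades in at least three markets.

```python
from collections import Counter
from typing import Iterable


def hhi(market_ids: Iterable[str]) -> float:
    """Herfindahl-Hirschman Index of a wallet's trade counts across markets."""
    counts = Counter(market_ids)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("wallet has no trades")
    return sum((c / n) ** 2 for c in counts.values())
```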

Indices: per-index specifications

Probability Weighting Index (PWI)
 
Each week, a one-parameter Prelec weighting function w(p) = exp(−(−log p)^α) is fit by trade-count-weighted nonlinear least squares, separately for each non-bot wallet class (active retail, sophisticated, casual, one-shot). The PWI is the trade-count-weighted mean of the fitted α values across those classes. Bots (`trades_per_day > 50` OR `n_trades > 1000`) are excluded so that the index reflects human probability weighting rather than algorithmic liquidity provision.
 
Interpretation: α = 1.0 gives w(p) = p, i.e. rational pricing with no probability distortion. Values below 1 produce the classical inverse-S weighting documented in Tversky and Kahneman (1992). Small probabilities are over-weighted and moderate-to-large probabilities are under-weighted. For every α, the one-parameter Prelec curve crosses the diagonal at p = 1/e ≈ 0.37.

The index is reported as the weekly value, the 4-, 13-, and 52-week moving averages, and the headline 52-week rolling z-score. Sample-size floor: ≥ 1,000 non-bot trades that week; per-class fits on fewer than 200 trades are suppressed.
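Here is a sketch of the weekly PWI aggregation under the floors just stated, taking the per-class α fits as given. It makes two assumptions:

- The 1,000-trade floor applies to the week's total non-bot trades, including classes whose own fit is suppressed.
- Suppressed classes simply drop out of the weighted mean.

The dictionary layout and function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

WEEK_FLOOR = 1_000   # minimum non-bot trades in the week
CLASS_FLOOR = 200    # minimum trades behind a per-class alpha fit


def weekly_pwi(fits: Dict[str, Tuple[Optional[float], int]]) -> Optional[float]:
    """fits maps non-bot class -> (fitted alpha or None, n_trades in the week).

    Returns the trade-count-weighted mean alpha, or None when the week is suppressed.
    """
    total = sum(n for _, n in fits.values())
    if total < WEEK_FLOOR:
        return None
    usable = [(a, n) for a, n in fits.values() if a is not None and n >= CLASS_FLOOR]
    weight = sum(n for _, n in usable)
    if weight == 0:
        return None
    return sum(a * n for a, n in usable) / weight
```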

Market Calibration Curve

Twenty price bins of width 0.05 span [0, 1], with bin centers from 0.025 to 0.975. For each bin we compute:
Realized win rate: n_wins / n_trades in the bin.
95% binomial CI: Wilson score interval. It is preferred over the Wald interval, which under-covers when the win rate is near 0 or 1 or the bin is small.
Calibration gap: realized − bin_center.
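Here is a sketch of the per-bin statistics. It assumes:

- each trade contributes its price and a 0/1 outcome;
- bins are half-open [k·0.05, (k+1)·0.05), with a price of exactly 1.0 folded into the top bin;
- z = 1.96 for the 95% interval.

The Wilson interval is written out from its closed form rather than taken from a library. The function names are illustrative.

```python
import math
from typing import Dict, Iterable, List, Tuple

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile
N_BINS = 20
WIDTH = 1.0 / N_BINS


def wilson_ci(wins: int, n: int, z: float = Z95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (n must be > 0)."""
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def calibration_table(trades: Iterable[Tuple[float, int]]) -> List[Dict[str, float]]:
    """trades: (price, won) pairs with won in {0, 1}. Returns one row per non-empty bin."""
    wins = [0] * N_BINS
    counts = [0] * N_BINS
    for price, won in trades:
        k = min(int(price * N_BINS), N_BINS - 1)  # half-open bins; 1.0 joins the top bin
        counts[k] += 1
        wins[k] += won
    rows = []
    for k in range(N_BINS):
        if counts[k] == 0:
            continue  # empty bins are skipped, so wilson_ci never sees n == 0
        center = (k + 0.5) * WIDTH
        rate = wins[k] / counts[k]
        lo, hi = wilson_ci(wins[k], counts[k])
        rows.append({"bin_center": center, "n_trades": counts[k], "win_rate": rate,
                     "ci_lo": lo, "ci_hi": hi, "gap": rate - center})
    return rows
```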
 
 
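A one-parameter Prelec function is then fit by weighted nonlinear least squares, with weights proportional to n_trades:
w(p) = exp(−(−log p)^α)

Solver: `scipy.optimize.curve_fit` with bounds α ∈ [0.1, 2.0] and initial guess α = 0.7. The fit reports three quantities:
α: the fitted parameter.
Standard error: the square root of the diagonal of the covariance matrix returned by `curve_fit`. This is the Gauss-Newton approximation to the inverse Hessian, scaled by the residual variance.
R²: the goodness of fit.

Pooled across the full panel on the latest snapshot, α ≈ 0.664 and R² ≈ 0.987. This α is within ~1 SE of Tversky-Kahneman’s 1992 experimental estimate of 0.65.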
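Here is a sketch of the weighted fit with `scipy.optimize.curve_fit`. `curve_fit` minimizes Σ((y − w(x)) / sigma)², so passing `sigma = 1/sqrt(n_trades)` makes the weights proportional to n_trades.

The text above does not spell out which calibration quantity plays the argument p and which plays w(p). The sketch therefore takes generic `x` and `y` arrays. The weighted R² computed here is an assumption about how the reported R² is defined.

```python
import numpy as np
from scipy.optimize import curve_fit


def prelec(p, alpha):
    """One-parameter Prelec weighting function w(p) = exp(-(-ln p)^alpha)."""
    return np.exp(-(-np.log(p)) ** alpha)


def fit_prelec(x, y, n_trades):
    """Weighted NLS fit of y ~ prelec(x, alpha), weights proportional to n_trades.

    Returns (alpha, standard error, weighted R^2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = np.asarray(n_trades, dtype=float)
    # sigma = 1/sqrt(n) turns the squared-residual objective into sum(n * resid**2)
    popt, pcov = curve_fit(prelec, x, y, p0=[0.7], sigma=1.0 / np.sqrt(n),
                           bounds=(0.1, 2.0))
    alpha = float(popt[0])
    se = float(np.sqrt(pcov[0, 0]))
    resid = y - prelec(x, alpha)
    y_bar = np.average(y, weights=n)
    r2 = 1.0 - np.sum(n * resid ** 2) / np.sum(n * (y - y_bar) ** 2)
    return alpha, se, float(r2)
```

On synthetic data generated exactly from w(p) with α = 0.65 at the twenty bin centers, the fit recovers α ≈ 0.65 with R² ≈ 1.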

Execution Edge Monitor

The Prelec fit is repeated separately for each of the five wallet classes each week. The execution-edge headline is:
alpha_gap_t = α_bot,t − α_active_retail,t

When both α values are below 1, a positive gap means the algorithmic wallets’ calibration curve sits closer to the rational diagonal than active retail’s. If either α exceeds 1, each curve's distance from the diagonal is |1 − α|, and the sign of the gap alone no longer settles which curve is closer.

The publication value is the 13-week MA of the gap, because weekly noise is large: per-class fits are sensitive to bin sparsity. The gap is a behavioral proxy for the algorithmic-retail execution divergence documented in Paper 1. There, algorithmic wallets pay an effective spread of −4.25 bps (that is, they earn 4.25 bps), while active retail pays +11.68 bps. The alpha-gap tracks that separation week by week.
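Here is a sketch of the weekly gap and its trailing 13-week mean. The per-class floor comes from the sample-size table: 200 trades per class, with the gap null if either side is missing. The text does not state how many non-null weeks the 13-week MA needs; the sketch assumes all 13. Function names are illustrative.

```python
from typing import List, Optional

CLASS_FLOOR = 200


def alpha_gap(alpha_bot: Optional[float], n_bot: int,
              alpha_retail: Optional[float], n_retail: int) -> Optional[float]:
    """alpha_gap_t = alpha_bot - alpha_active_retail, suppressed below the per-class floor."""
    if alpha_bot is None or alpha_retail is None:
        return None
    if n_bot < CLASS_FLOOR or n_retail < CLASS_FLOOR:
        return None
    return alpha_bot - alpha_retail


def trailing_mean(values: List[Optional[float]], window: int = 13) -> List[Optional[float]]:
    """Trailing mean that is None unless every value in the window is non-null (assumption)."""
    out: List[Optional[float]] = []
    for t in range(len(values)):
        win = values[max(0, t - window + 1): t + 1]
        if len(win) < window or any(v is None for v in win):
            out.append(None)
        else:
            out.append(sum(win) / window)
    return out
```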

Private Information Index (PII)

The PII is the forensic test from Paper 2 (“Detecting Informed Trading in Prediction Markets: One Event at a Time,” revised June 2026). The paper shows that information cannot be separated from skill at the trader level, for two reasons:
Pooled averages dilute episodic information.
A trader's best-episode statistic grows mechanically with the number of episodes the trader contests.

Detection therefore runs at the trader-event unit:
Testable pair: a trader and a multi-market event in which the trader took positions in at least two component markets (4,657,827 pairs across 1,020,455 traders).
Per-event joint-accuracy test against the price-implied null: under the null, each position's success probability is its own transaction-time price. The one-sided p-value is the probability of being correct on at least as many component markets as observed.
Multiplicity control within trader: Holm across the events each trader contests (family-wise error rate); Benjamini-Hochberg for the false-discovery-rate variant.
Dependence conditioning: platform-defined mutually exclusive (neg-risk) events are excluded outright. Dependent bet structures (single-game multi-bets, cumulative threshold ladders) are partitioned out. Residual within-event correlation is absorbed by a beta-binomial adjustment at the estimated intraclass correlation (0.18, 95% CI [0.10, 0.27]).
Calibration: on a real-data placebo class (asset-price-direction markets, 597,146 pairs), the corrected test yields zero discoveries at every threshold examined.
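Here is a sketch of the per-event test and the within-trader corrections. It assumes component outcomes are independent given prices, so the count of correct positions follows a Poisson-binomial distribution, computed exactly by dynamic programming.

This is the unadjusted null only. The beta-binomial correlation adjustment and the dependence partitioning described above are deliberately omitted. Function names are illustrative. Holm and BH are applied to one trader's family of event p-values.

```python
from typing import List, Sequence


def poisson_binomial_tail(prices: Sequence[float], k_obs: int) -> float:
    """P(at least k_obs correct) when position i is correct independently with prob prices[i]."""
    dist = [1.0]  # dist[k] = P(exactly k correct among the positions processed so far)
    for p in prices:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this position wrong
            nxt[k + 1] += mass * p       # this position right
        dist = nxt
    return min(1.0, sum(dist[k_obs:]))


def holm_reject(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down: reject while p_(j) <= alpha / (m - j), for j = 0..m-1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for j, i in enumerate(order):
        if pvals[i] > alpha / (m - j):
            break
        reject[i] = True
    return reject


def bh_reject(pvals: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up: reject the k smallest, k = max{j : p_(j) <= q * j / m}."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for j, i in enumerate(order, start=1):
        if pvals[i] <= q * j / m:
            k = j
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject
```

For example, take a trader who was correct on all three components of an event, priced at 0.2, 0.3 and 0.4. The unadjusted p-value is 0.2 × 0.3 × 0.4 = 0.024.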
 
 
Latest snapshot: 1,008 raw flags → 112 in the distinct-question core → 11 trader-event pairs surviving at the 5% level (0 at 1%). All surviving pairs are all-correct records on disclosure events and are reported by archetype only.

The wallet-level excess-accuracy screen from the April 2026 draft remains published as a sustained-skill monitor. It currently flags 10,485 of 705,883 wallets at p < 0.01, with 1,558 Holm survivors and 4,700 BH-FDR discoveries. The June revision shows that this statistic measures persistent skill, not episodic information.

Event-class labels (vote, action, performance, stochastic) are assigned by a rules-and-LLM pipeline validated against hand-coded markets; the stochastic class serves as the placebo.

Algorithmic Share of Participation
 
The weekly share of trade participations attributable to wallets in the algorithmic class. Each on-chain match has a maker (limit order) and a taker (market order), so it contributes two participations to the denominator. Counting both sides keeps the human side of the market visible. Maker-only attribution would assign roughly 95% to algorithms, because algorithmic wallets dominate passive liquidity provision.
algorithmic_share_t = (bot_makers_t + bot_takers_t) / (2 × n_matches_t)
 
Reported with the raw weekly value, 4/13/52-week MAs, and the 52-week rolling z-score. Per-class shares (active retail, sophisticated, casual, one-shot) are reported alongside in the same payload.
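Here is a sketch of the both-sides count. It assumes each match is available as a (maker wallet, taker wallet) pair and that a wallet-to-class lookup already exists. Every match adds one participation for each side, so the per-class shares, including the algorithmic share, sum to 1 by construction. The function name is illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def participation_shares(matches: Iterable[Tuple[str, str]],
                         wallet_class: Dict[str, str]) -> Dict[str, float]:
    """Share of the 2 x n_matches participations attributable to each wallet class."""
    counts: Counter = Counter()
    n_matches = 0
    for maker, taker in matches:
        counts[wallet_class[maker]] += 1   # maker side
        counts[wallet_class[taker]] += 1   # taker side
        n_matches += 1
    if n_matches == 0:
        return {}
    return {cls: c / (2 * n_matches) for cls, c in counts.items()}
```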

Profit Split Decomposition

Per-trade P&L for each resolved trade is decomposed into directional (forecasting) and execution (price) components. The decomposition uses Paper 1’s main-analysis benchmark: the per-(market, token) volume-weighted average price across buy-side trades only. For trade i in market m, token T, define:
P: entry price.
v(m, T): fair-price benchmark.
W ∈ {0, 1}: outcome.
Q: token quantity.
S: side sign, +1 (BUY) or −1 (SELL).

v(m, T) = Σ(price × usdc_amount) / Σ(usdc_amount), buy-side trades only

Total P&L      = S × (W − P) × Q
Directional    = S × (W − v) × Q     ← was the trade on the winning side, valued at the benchmark price?
Execution      = S × (v − P) × Q     ← did the trade get a better price than the avg buyer?

Identity:        Directional + Execution = Total P&L (per trade, exact)
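Here is a minimal sketch of the benchmark and the three components for a single trade. `Trade` is an illustrative container, not a repository type. The identity holds exactly because (W − v) + (v − P) = W − P for any benchmark v.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Trade:
    side: int     # S: +1 for BUY, -1 for SELL
    price: float  # P: entry price
    qty: float    # Q: token quantity
    won: int      # W: 1 if the token resolved as the winner, else 0


def fair_price(buys: Iterable[Tuple[float, float]]) -> float:
    """v(m, T): USDC-weighted average price over buy-side (price, usdc_amount) pairs."""
    pv = 0.0
    vol = 0.0
    for price, usdc in buys:
        pv += price * usdc
        vol += usdc
    return pv / vol


def decompose(t: Trade, v: float) -> Tuple[float, float, float]:
    """Return (total, directional, execution) P&L for one trade."""
    total = t.side * (t.won - t.price) * t.qty
    directional = t.side * (t.won - v) * t.qty
    execution = t.side * (v - t.price) * t.qty
    return total, directional, execution
```

For example, buying 100 tokens at 0.40 when v = 0.50, with the token winning, gives a total of 60. That total splits into 50 directional and 10 execution.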
 
The aggregator runs incrementally in two steps:
Backfill: `profit_split/build_fair_prices.py` computes the per-(market, token) buy-side VWAP across the full 671M-record panel as a one-time backfill (~25 min).
Weekly update: `profit_split/update_weekly.py` extends the cache as new trades arrive (~1-3 min each Sunday).

The cache lives at `cache/fair_prices.parquet` with columns {market_id, token_id, pv_sum, v_sum, fair_price, last_block}.
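Because v(m, T) is a ratio of two sums, the cache only needs the running numerator (`pv_sum`) and denominator (`v_sum`) per key. A weekly delta adds to both and recomputes `fair_price`.

The sketch below uses an in-memory dict keyed by (market_id, token_id) in place of the Parquet file. It assumes each delta contains only buy-side rows from blocks not yet folded in, so `last_block` acts as a high-water mark rather than a deduplication key. The function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (market_id, token_id)


def update_fair_prices(cache: Dict[Key, dict],
                       new_buys: Iterable[Tuple[str, str, float, float, int]]) -> Dict[Key, dict]:
    """Fold buy-side rows (market_id, token_id, price, usdc_amount, block) into the cache."""
    touched = set()
    for market_id, token_id, price, usdc, block in new_buys:
        key = (market_id, token_id)
        row = cache.setdefault(key, {"pv_sum": 0.0, "v_sum": 0.0,
                                     "fair_price": None, "last_block": -1})
        row["pv_sum"] += price * usdc       # running numerator of v(m, T)
        row["v_sum"] += usdc                # running denominator of v(m, T)
        row["last_block"] = max(row["last_block"], block)
        touched.add(key)
    for key in touched:
        row = cache[key]
        if row["v_sum"] > 0:
            row["fair_price"] = row["pv_sum"] / row["v_sum"]
    return cache
```

Folding the delta in one batch or in several gives the same `fair_price`, which is what makes the weekly update cheap.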

Two deliberate dashboard choices distinguish this aggregator from Paper 1’s published table:
Both-side attribution. Paper 1 attributes each match to the maker side only (the academic convention). The dashboard attributes the match to both the maker and the taker with mirror-signed components. Dollar P&L therefore sums to zero across wallet types by construction (zero-sum market, no fees in the sample period). This framing is the most legible for a public dashboard; the cumulative cross-skill structure documented in Paper 1 is unchanged.
Volume-weighted ROI bps. Paper 1 Table 1 reports per-trade average edge × 10,000. The dashboard reports volume-weighted ROI, (P&L / USD volume) × 10,000. The two are not directly comparable; the dashboard’s framing is consistent with the dollar P&L column.
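Here is a sketch of both points, with illustrative function names.

Mirror-signed attribution: the sketch assumes the taker holds the opposite side of the same token at the same price and quantity, which simplifies how CTF matches actually settle. Under that assumption, each match contributes equal and opposite amounts to every component, so totals across wallet types cancel.

The two bps conventions: the per-trade convention is assumed here to mean the mean of P&L_i / USD_i. That quantity need not equal total P&L over total volume when trade sizes differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (maker_type, taker_type, maker_side, price, qty, won, v)
Match = Tuple[str, str, int, float, float, int, float]


def both_side_pnl(matches: Iterable[Match]) -> Dict[str, List[float]]:
    """Attribute each match to maker and taker with mirror-signed components.

    Simplifying assumption: the taker holds -maker_side of the same token at the same
    price and quantity. Returns wallet_type -> [total, directional, execution, usd_volume].
    """
    out: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for maker_type, taker_type, s, p, q, w, v in matches:
        for wtype, side in ((maker_type, s), (taker_type, -s)):
            acc = out[wtype]
            acc[0] += side * (w - p) * q   # total P&L
            acc[1] += side * (w - v) * q   # directional
            acc[2] += side * (v - p) * q   # execution
            acc[3] += p * q                # USD notional, assumed as price x qty
    return dict(out)


def vw_roi_bps(total_pnl: float, usd_volume: float) -> float:
    """Dashboard convention: (P&L / USD volume) x 10,000."""
    return total_pnl / usd_volume * 10_000


def mean_edge_bps(trades: Iterable[Tuple[float, float]]) -> float:
    """Illustrative per-trade convention: mean of (P&L_i / USD_i) x 10,000."""
    edges = [pnl / usd for pnl, usd in trades]
    return sum(edges) / len(edges) * 10_000
```

Consider one +$10 trade on $100 of volume and one −$1 trade on $1,000. The per-trade mean is 495 bps, while the volume-weighted ROI is 9 / 1,100 × 10,000 ≈ 81.8 bps.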
 
 
Per-market execution gaps (briefings cards) are computed in three steps:
Group trades by (market, token, wallet_type).
Keep only tokens on which both bots and active retail have ≥ $1K of weekly volume.
Volume-weight the per-token gaps across the qualifying tokens.

This isolates execution edge from side selection: bots on YES and retail on NO would otherwise produce a spurious gap.

Longshot / Favorite Price Gap

The longshot band is defined as price ∈ [0, 0.10). The weekly index is:
longshot_gap_t = realized_winrate_in_band_t − 0.05

Positive gaps indicate that longshots are underpriced (realized wins exceed the band midpoint); negative gaps are the classical favorite-longshot bias. Reported with the weekly value, 13-week MA, and 52-week rolling z-score. The favorite-side analog (top price bin) requires a separate weekly extraction and is planned for a future release.
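Here is a sketch of the weekly computation, assuming per-trade (price, won) pairs for the week.

- The band is half-open, so a trade at exactly 0.10 is excluded.
- The benchmark is the fixed band midpoint 0.05 from the formula above, not the band's average traded price.
- A week with no trades in the band returns None.

The function name is illustrative.

```python
from typing import Iterable, Optional, Tuple

BAND_LO, BAND_HI = 0.0, 0.10
BAND_MID = 0.05


def longshot_gap(trades: Iterable[Tuple[float, int]]) -> Optional[float]:
    """Realized win rate of trades priced in [0, 0.10), minus the band midpoint."""
    n = wins = 0
    for price, won in trades:
        if BAND_LO <= price < BAND_HI:
            n += 1
            wins += won
    if n == 0:
        return None
    return wins / n - BAND_MID
```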

Market Efficiency Trend

The weekly R² of the one-parameter Prelec fit applied to active retail trades. Higher values mean the week’s calibration gap is well described by a single behavioral parameter; lower values indicate that idiosyncratic shocks dominate. The headline uses active retail because it is the largest human cohort by trade count and most closely represents how human beliefs price the market.

Rolling statistics convention

Each weekly index is published with 4-, 13-, and 52-week trailing moving averages and a 52-week rolling z-score. The z-score uses the rolling mean and the rolling sample standard deviation (Bessel-corrected, ddof=1). It requires at least 13 non-null observations in the trailing window; otherwise it is null.
z_t = (value_t − mean_52w_t) / sd_52w_t,    valid iff n_obs_52w ≥ 13
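Here is a pandas sketch of this convention. Two details are assumptions:

- The trailing window includes week t itself, which is the pandas default.
- The moving averages use the pandas default `min_periods` (the full window), since no floor is stated for them.

`rolling(..., min_periods=13)` counts non-null observations, which matches the 13-observation rule. `.std(ddof=1)` is the Bessel-corrected standard deviation. The function name is illustrative.

```python
import numpy as np
import pandas as pd


def rolling_stats(weekly: pd.Series) -> pd.DataFrame:
    """Weekly value plus 4/13/52-week trailing MAs and the 52-week rolling z-score."""
    out = pd.DataFrame({"value": weekly})
    for w in (4, 13, 52):
        out[f"ma_{w}w"] = weekly.rolling(w).mean()
    roll = weekly.rolling(52, min_periods=13)       # needs >= 13 non-null values
    z = (weekly - roll.mean()) / roll.std(ddof=1)
    out["z_52w"] = z.replace([np.inf, -np.inf], np.nan)  # zero-variance windows -> null
    return out
```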

Sample-size floors and null handling

Each aggregator enforces minimum sample sizes to prevent reporting on weeks dominated by noise:
Index	Floor	Behavior when below
PWI	1,000 non-bot trades	Suppressed (null)
Calibration (per bin)	50 trades	CI shown but not used in fit weighting
Execution edge (per class)	200 trades	Per-class α suppressed; gap suppressed if either side missing
Profit split (per type)	5,000 weekly participations	Directional/execution columns suppressed; total P&L still reported
PII (per wallet)	10 resolved trades	Wallet not tested
Top markets (week)	Week vol ≥ 30% of trailing 4-week median	Partial week dropped from history
 
Refresh schedule

Indices refresh every Sunday at 22:00 Pacific time. The pipeline extracts new Polygon blocks via Alchemy, updates the processed-trade panel, re-runs all aggregations, regenerates the static site via `npm run build`, and pushes to GitHub Pages.

End-to-end refresh time is ~60-75 minutes. It is dominated by the delta pull (~36 min) and by the top-markets and per-market microstructure scans (~14 min combined). Each refresh is tagged with a semantic version, and tags are archived to Zenodo with permanent DOIs.
Software stack

Component	Version	Use
Python	3.11	Aggregators, pipeline orchestrator
DuckDB	0.10+	Streaming aggregation over the master CSV
pandas	2.x	Time-series wrangling, rolling stats
scipy.optimize	1.13+	Prelec NLS via `curve_fit`
aiohttp	3.x	Concurrent Alchemy `eth_getLogs`
requests	2.x	Gamma + CLOB API calls
Astro	4.x	Static site generator
Vite	via Astro	JSON imports + bundle
 
Reproducibility

All aggregation code is in the public repository, with pinned dependencies in `requirements.txt`. The processed trade panel is derived from public blockchain data and can be regenerated end-to-end from the Alchemy delta puller forward.

The CHANGELOG records every methodology change with its date and rationale. A breaking change to any published column triggers a major version bump and a one-release deprecation window.

Numerical reproducibility: identical input and identical package versions produce byte-identical output. No random seeds are used outside Paper 2’s permutation validation, which is seeded.
Daubert mapping (Rule 702)

Factor	How this work satisfies it
Testable	Every index reports an explicit null hypothesis. Test statistics are recomputable from public data using the published code. PWI: composite of two well-defined moments. PII: binomial z under the null of price-following. Execution edge: difference of fitted Prelec parameters.
Known error rate	PII: permutation-validated false-positive rate, plus Holm-Bonferroni and BH-FDR corrections reported separately. Calibration + PWI: bootstrap CIs available on request.
Peer review	Each index is grounded in a working paper (see research page) and in previously peer-reviewed primitives. The methodology log and CHANGELOG provide an auditable trail.
General acceptance	Primitives are drawn from Kyle (1985), Glosten-Milgrom (1985), Fama (1972), Anand, Irvine, Puckett & Venkataraman (2012), Tversky & Kahneman (1992), and Prelec (1998). All are standard in the finance and decision-theory literatures.
 
Citations

Anand, A., Irvine, P., Puckett, A., & Venkataraman, K. (2012). Performance of institutional trading desks. Journal of Financial Economics, 105(3), 597-617.
Bruls, M., Huijsmans, K., & van Wijk, J. (2000). Squarified treemaps. Data Visualization 2000.
Fama, E. F. (1972). Components of investment performance. Journal of Finance, 27(3), 551-567.
Glosten, L. R., & Milgrom, P. R. (1985). Bid, ask and transaction prices in a specialist market. Journal of Financial Economics, 14(1), 71-100.
Kyle, A. S. (1985). Continuous auctions and insider trading. Econometrica, 53(6), 1315-1335.
Prelec, D. (1998). The probability weighting function. Econometrica, 66(3), 497-527.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297-323.
Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209-212.
