Construction, data sources, and reproducibility notes for each published index. Every index ties back to a working paper or peer-reviewed primitive; every aggregator is in the public repository with pinned dependencies. This page documents the entire pipeline from raw Polygon blockchain events to the JSON files consumed by the dashboard.
All indices derive from the universe of Polymarket trades on the Polygon blockchain,
November 2022 to present (671M trades across 1.98M wallets as of the latest snapshot).
Of these, the subset with resolved markets enters the Prelec, calibration, and P&L aggregations
(~465K non-bot trades in the calibration pool, ~483K wallets tested
in the insider screen). Trades are deduplicated against the Paradigm events index and joined against
the Polymarket market-to-outcome mapping (market_winner_map.pkl).
Source | Mechanism | Cadence
Raw events | Alchemy eth_getLogs against Polymarket CTF + neg-risk contracts | Weekly delta (~30M new trades / week)
Market metadata | Polymarket Gamma API /markets/{id} | On demand for top markets
Order books | Polymarket CLOB API /book?token_id={tid} | Snapshot at refresh time
Resolutions | Polymarket on-chain settlement events | As markets resolve
Wallet stats | Per-wallet aggregation of the trade panel | Rebuilt each refresh
The framework applies to any binary prediction market with public settlement data. Kalshi integration
is contingent on wallet-level data access and is planned for a future release.
Pipeline architecture
The pipeline runs as a sequence of Python aggregators orchestrated by pipeline/run_all.py.
Each aggregator reads from the master processed_trades.csv panel (~295 GB)
via DuckDB streaming queries and writes a typed JSON snapshot plus a long-format CSV history to
site/public/data/. The Astro static site rebuilds on each refresh and deploys via GitHub Pages.
DuckDB is configured with PRAGMA memory_limit='12GB' and PRAGMA threads=8.
All numeric output is sanitized through common.write_json, which converts NaN/Inf to
null before serialization (strict JSON; Vite, browsers, jq all reject literal
NaN tokens).
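The NaN/Inf-to-null step can be sketched as a recursive walk over the output structure. This is a minimal illustration, assuming payloads are nested dicts, lists, and floats; `sanitize` and this `write_json` are hypothetical stand-ins, not the repository's common.write_json.

```python
import json
import math
from typing import Any


def sanitize(obj: Any) -> Any:
    """Recursively replace NaN, +Inf, and -Inf floats with None (serialized as null)."""
    if isinstance(obj, float):  # also catches numpy.float64, a float subclass
        return obj if math.isfinite(obj) else None
    if isinstance(obj, dict):
        return {k: sanitize(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [sanitize(v) for v in obj]
    return obj


def write_json(obj: Any, path: str) -> None:
    """Hypothetical writer: sanitize first, then serialize as strict JSON.

    allow_nan=False makes json.dump raise ValueError if a non-finite float
    slips through, instead of silently emitting a literal NaN token.
    """
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sanitize(obj), fh, allow_nan=False, indent=2)
```

For example, `json.dumps(sanitize({"pwi": float("nan"), "ma_4w": [0.4, float("inf")]}), allow_nan=False)` yields `{"pwi": null, "ma_4w": [0.4, null]}`.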
Wallet classification
Wallets are partitioned into five exclusive classes using Paper 1’s rule, applied in the order shown
below (first matching class wins):
Class | Rule | Population share
Algorithmic | trades_per_day > 50 OR n_trades > 1000 | ~0.6%
Sophisticated | volume > $10K AND HHI < 0.5 AND active_span > 30 days | ~3%
Active retail | 10 ≤ n_trades ≤ 1000 | ~13%
Casual | 2 ≤ n_trades ≤ 9 | ~30%
One-shot | n_trades = 1 | ~53%
HHI (Herfindahl-Hirschman Index) is the sum of the squared shares of the wallet’s trades in each
market; lower HHI means the wallet is more diversified across markets. The classification is
reproduced in code/profile.do for Stata replication and is consistent with Paper 1 Table 2.
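A minimal Python sketch of the first-match rule and the HHI definition above. The field names in `WalletStats` are hypothetical, and `trades_per_day` and `active_span_days` are assumed to be precomputed per wallet (the text does not specify how trades_per_day is derived).

```python
from collections import Counter
from dataclasses import dataclass


def hhi(market_ids: list[str]) -> float:
    """HHI of a wallet's trade counts across markets: sum of squared shares, in (0, 1]."""
    counts = Counter(market_ids)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


@dataclass
class WalletStats:  # hypothetical per-wallet summary row
    n_trades: int
    trades_per_day: float
    volume_usd: float
    hhi: float
    active_span_days: float


def classify(w: WalletStats) -> str:
    """Apply the five rules in table order; the first matching class wins."""
    if w.trades_per_day > 50 or w.n_trades > 1000:
        return "algorithmic"
    if w.volume_usd > 10_000 and w.hhi < 0.5 and w.active_span_days > 30:
        return "sophisticated"
    if 10 <= w.n_trades <= 1000:
        return "active_retail"
    if 2 <= w.n_trades <= 9:
        return "casual"
    return "one_shot"  # n_trades == 1 for any wallet with at least one trade
```

Because the rules are ordered, a wallet with, say, 200 trades, $50K volume, HHI 0.2, and a 90-day span is classified as sophisticated even though it also satisfies the active-retail trade-count band.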
Indices: per-index specifications
Probability Weighting Index (PWI)
Composite z-score of two weekly moments computed from the non-bot trade subset:
Mean calibration error: |trade_price − realized_outcome| averaged across
all resolved trades that week, with realized_outcome ∈ {0, 1}.
Longshot fraction: (trades at price < 0.10) / (total trades) that week.
Each moment is z-scored against its own 52-week rolling distribution (mean / SD). The PWI is the
equal-weighted sum of the two z-scores. It is reported as the raw value, plus 4-, 13-, and 52-week
moving averages, plus the headline 52-week rolling z-score. A PWI z-score of +1 means the week sits
one standard deviation above its trailing 52-week mean, that is, probability distortion is elevated
relative to the prior year. Sample-size floor: ≥ 1,000 non-bot trades that week; otherwise the
value is suppressed.
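The two weekly moments and their combination can be sketched as follows. This assumes each trade is a `(trade_price, realized_outcome)` pair and that the 52-week rolling mean and SD of each moment are computed elsewhere (per the rolling-statistics convention) and passed in; `weekly_moments` and `pwi` are hypothetical names.

```python
from typing import Optional

MIN_NONBOT_TRADES = 1_000  # sample-size floor stated in the text


def weekly_moments(trades: list[tuple[float, int]]) -> Optional[tuple[float, float]]:
    """(mean calibration error, longshot fraction) for one week of non-bot resolved trades.

    Each trade is (trade_price, realized_outcome) with realized_outcome in {0, 1}.
    Returns None when the week is below the floor, i.e. the value is suppressed.
    """
    n = len(trades)
    if n < MIN_NONBOT_TRADES:
        return None
    mean_cal_err = sum(abs(p - y) for p, y in trades) / n
    longshot_frac = sum(1 for p, _ in trades if p < 0.10) / n
    return mean_cal_err, longshot_frac


def pwi(cal_err: float, longshot: float,
        cal_mu: float, cal_sd: float, ls_mu: float, ls_sd: float) -> float:
    """Equal-weighted sum of the two z-scores against their rolling mean/SD."""
    return (cal_err - cal_mu) / cal_sd + (longshot - ls_mu) / ls_sd
```

A week whose calibration error and longshot fraction each sit one rolling SD above their means yields a raw PWI of about 2.0.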
Market Calibration Curve
Twenty price bins of width 0.05 spanning [0, 1], with bin centers from 0.025 to 0.975. For each bin we compute:
Realized win rate: n_wins / n_trades in the bin.
95% binomial CI: Wilson score interval (preferred over the Wald interval, which undercovers when p is near 0 or 1 and bins are small).
Calibration gap: realized − bin_center.
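The three per-bin quantities can be sketched in plain Python. The Wilson interval below is the standard closed form; the left-closed bin boundaries and the helper names `wilson_ci`, `bin_index`, and `calibration_gap` are assumptions for illustration.

```python
import math


def wilson_ci(wins: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion wins / n."""
    if n == 0:
        raise ValueError("empty bin")
    p_hat = wins / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def bin_index(price: float, n_bins: int = 20) -> int:
    """Map a price in [0, 1] to bin k (left-closed), whose center is 0.025 + 0.05 * k."""
    return min(int(price * n_bins), n_bins - 1)  # a price of exactly 1.0 goes to the top bin


def calibration_gap(wins: int, n: int, k: int) -> float:
    """Realized win rate in bin k minus the bin center."""
    return wins / n - (0.025 + 0.05 * k)
```

Unlike the Wald interval, the Wilson interval for 0 wins in 10 trades is not degenerate: it runs from 0 to about 0.278.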
A one-parameter Prelec function is then fit by weighted nonlinear least squares, weights proportional
to n_trades:
w(p) = exp(-(-log p)^α)
Solver: scipy.optimize.curve_fit with bounds α ∈ (0.1, 2.0) and initial guess
α = 0.7. Reports α, its standard error from the covariance matrix returned by curve_fit (a
Gauss-Newton approximation to the inverse Hessian), and the fit R². Pooled across the full panel
on the latest snapshot: α ≈ 0.664, R² ≈ 0.987, within ~1 SE of Tversky-Kahneman’s 1992
experimental estimate of 0.65.
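A sketch of the weighted Prelec fit with curve_fit. curve_fit minimizes Σ((y − f(x)) / σ)², so passing σ = 1/√n gives each squared residual a weight proportional to n_trades. Reporting a weighted R² (with the same weights) is an assumption, since the text does not say which R² is used; `prelec` and `fit_prelec` are hypothetical names.

```python
import numpy as np
from scipy.optimize import curve_fit


def prelec(p, alpha):
    """One-parameter Prelec weighting function w(p) = exp(-(-ln p)^alpha)."""
    return np.exp(-(-np.log(p)) ** alpha)


def fit_prelec(bin_centers, win_rates, n_trades):
    """Weighted NLS fit of alpha with weights proportional to n_trades.

    Returns (alpha, standard error, weighted R^2).
    """
    x = np.asarray(bin_centers, dtype=float)
    y = np.asarray(win_rates, dtype=float)
    n = np.asarray(n_trades, dtype=float)
    sigma = 1.0 / np.sqrt(n)  # squared residual i gets weight n_i
    popt, pcov = curve_fit(prelec, x, y, p0=[0.7], sigma=sigma, bounds=(0.1, 2.0))
    alpha = float(popt[0])
    se = float(np.sqrt(pcov[0, 0]))
    resid = y - prelec(x, alpha)
    y_bar = np.average(y, weights=n)
    r2 = 1.0 - np.sum(n * resid**2) / np.sum(n * (y - y_bar) ** 2)
    return alpha, se, float(r2)
```

Usage: pass the twenty bin centers, the realized win rates, and the per-bin trade counts; the fitted α sits below 1 when the realized curve has the inverse-S shape.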
Execution Edge Monitor
The Prelec fit is repeated separately for each of the five wallet classes each week. The execution-edge
headline is:
alpha_gap_t = α_bot,t − α_active_retail,t
Here α_bot,t is the fitted α of the algorithmic class in week t. Because α = 1 is the rational
diagonal, a positive gap (with both α values below 1) means algorithmic wallets’ calibration curve
sits closer to the diagonal than active retail’s. The 13-week MA of the gap is the publication value
(weekly noise is large because per-class fits are sensitive to bin sparsity). The gap is a behavioral
proxy for the algorithmic-retail execution divergence documented in Paper 1: algorithmic wallets'
effective spread is −4.25 bps (they capture the spread), while active retail pays +11.68 bps. The
alpha gap tracks that separation week by week.
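A sketch of the weekly gap with the 200-trade per-class floor and its 13-week trailing MA. Skipping null weeks inside the MA window, with no minimum count, is an assumption; the text does not state how suppressed weeks enter the MA. Function names are hypothetical.

```python
from typing import Optional

MIN_CLASS_TRADES = 200  # per-class floor from the sample-size table


def alpha_gap(alpha_algo: Optional[float], n_algo: int,
              alpha_retail: Optional[float], n_retail: int) -> Optional[float]:
    """alpha_algorithmic - alpha_active_retail; None if either side is suppressed."""
    if alpha_algo is None or alpha_retail is None:
        return None
    if n_algo < MIN_CLASS_TRADES or n_retail < MIN_CLASS_TRADES:
        return None
    return alpha_algo - alpha_retail


def trailing_ma(values: list[Optional[float]], window: int = 13) -> list[Optional[float]]:
    """Trailing moving average over the non-null values in each window (assumption)."""
    out: list[Optional[float]] = []
    for i in range(len(values)):
        win = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        out.append(sum(win) / len(win) if win else None)
    return out
```

A week where the algorithmic fit rests on only 150 trades yields a null gap and simply drops out of that week's 13-week average.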
Private Information Index (PII)
Wallet-level forensic test from Paper 2. For each wallet with ≥ 10 resolved trades:
Excess accuracy: accuracy − max(p̄, 1 − p̄), where p̄ is the
wallet’s volume-weighted average entry price. The subtrahend is the accuracy a price-following
strategy would achieve mechanically.
One-sided binomial z-test with the per-trade success probability set to
max(p̄, 1 − p̄).
Flag at z > 2.326 (one-sided p < 0.01) and excess accuracy
positive. Latest snapshot: 6,292 of 483,234 wallets flagged (1.30%).
Multiple-testing corrections: Holm-Bonferroni family-wise (806 survivors) and
Benjamini-Hochberg false-discovery-rate (2,492 survivors). Both are reported alongside the raw flag count.
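The wallet-level test and both corrections can be sketched with the standard library. The normal upper-tail p-value uses `math.erfc`. Running Holm and BH at 0.01, matching the raw threshold, is an assumption (the text does not state the family level), and all function names are hypothetical.

```python
import math

Z_CRIT = 2.326  # one-sided p < 0.01


def pii_test(n_correct: int, n: int, p_bar: float) -> tuple[float, float, float]:
    """Return (excess_accuracy, z, one-sided p) for one wallet's resolved trades.

    p_bar is the wallet's volume-weighted average entry price; under the null,
    each trade succeeds with p0 = max(p_bar, 1 - p_bar), the price-following rate.
    """
    p0 = max(p_bar, 1.0 - p_bar)
    if p0 >= 1.0:
        raise ValueError("p0 = 1 leaves no variance to test against")
    excess = n_correct / n - p0
    z = (n_correct - n * p0) / math.sqrt(n * p0 * (1.0 - p0))
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z) for Z ~ N(0, 1)
    return excess, z, p_one_sided


def is_flagged(excess: float, z: float) -> bool:
    return z > Z_CRIT and excess > 0


def holm_reject(p_values: list[float], alpha: float = 0.01) -> list[bool]:
    """Holm-Bonferroni step-down: reject while p_(k) <= alpha / (m - k + 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # 0-based rank, so the threshold is alpha / (m - rank)
        if p_values[i] > alpha / (m - rank):
            break
        reject[i] = True
    return reject


def bh_reject(p_values: list[float], q: float = 0.01) -> list[bool]:
    """Benjamini-Hochberg step-up: reject the k smallest, k = max{k : p_(k) <= q * k / m}."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for k, i in enumerate(order, start=1):
        if p_values[i] <= q * k / m:
            k_max = k
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject
```

For p-values (0.001, 0.004, 0.03, 0.2) at level 0.05, Holm rejects the first two while BH also rejects 0.03. This illustrates why the BH survivor count exceeds the Holm count.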
Sub-populations are split by the MNPI taxonomy: Vote (elections, awards, political
decisions), Action (military, regulatory, decision-driven outcomes), Performance (sports, individual
performance), Stochastic (crypto prices, weather, aggregate forces). Categories are assigned by a
rules-and-LLM pipeline validated against hand-coded markets. The flag-rate ordering Action > Vote
> Performance > Stochastic is consistent with the theoretical prediction that markets whose outcomes
humans control carry stronger informational signatures.
Algorithmic Share of Participation
Weekly share of counterparty events attributable to wallets in the algorithmic class. Each
on-chain match has a maker (limit order) and a taker (market order) and contributes two
participations to the denominator. Counting both sides keeps the human side of the market visible:
maker-only attribution would assign roughly 95% to algorithms because algorithmic wallets dominate
passive liquidity provision.
Reported with the raw weekly value, 4/13/52-week MAs, and 52-week rolling z-score. Per-class shares
(active retail, sophisticated, casual, one-shot) are reported alongside in the same payload.
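The both-sides counting rule can be sketched as follows, assuming each match is a `(maker_wallet, taker_wallet)` pair and a wallet-to-class lookup is available; `participation_shares` is a hypothetical name.

```python
from collections import Counter


def participation_shares(matches: list[tuple[str, str]],
                         wallet_class: dict[str, str]) -> dict[str, float]:
    """Share of weekly participations by wallet class.

    Each match is (maker_wallet, taker_wallet) and contributes two participations,
    one per side, so the denominator is 2 * len(matches).
    """
    counts: Counter = Counter()
    for maker, taker in matches:
        counts[wallet_class[maker]] += 1
        counts[wallet_class[taker]] += 1
    total = 2 * len(matches)
    return {cls: c / total for cls, c in counts.items()}
```

In a toy week where algorithmic wallets make every match and humans take every match, the algorithmic share is 0.5. Maker-only attribution would report 1.0 for the same week.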
Profit Split Decomposition
Per-trade P&L for each resolved trade is decomposed into directional (forecasting)
and execution (price) components using Paper 1’s
main-analysis benchmark: the per-(market, token) volume-weighted average price across buy-side
trades only. For trade i in market m, token T, with entry price
P, fair-price benchmark v(m, T), outcome W ∈ {0, 1}, token quantity Q,
and side sign S = +1 (BUY) or −1 (SELL):
v(m, T) = Σ(price × usdc_amount) / Σ(usdc_amount), buy-side trades only
Total P&L = S × (W − P) × Q
Directional = S × (W − v) × Q ← P&L the trade would have earned at the avg buyer's price
Execution = S × (v − P) × Q ← did the trade get a better price than the avg buyer?
Identity: Directional + Execution = Total P&L (per trade, exact)
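A direct transcription of the three formulas, which makes the exact identity easy to check on individual trades. The `Trade` record and `split_pnl` are hypothetical names; `fair_price` is v(m, T) for the trade's market and token.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    price: float     # entry price P
    quantity: float  # token quantity Q
    side: int        # S = +1 for BUY, -1 for SELL
    outcome: int     # W in {0, 1}


def split_pnl(t: Trade, fair_price: float) -> tuple[float, float, float]:
    """Return (directional, execution, total) P&L for one trade, given v(m, T)."""
    directional = t.side * (t.outcome - fair_price) * t.quantity
    execution = t.side * (fair_price - t.price) * t.quantity
    total = t.side * (t.outcome - t.price) * t.quantity
    return directional, execution, total
```

Buying 100 tokens at 0.40 when v = 0.45 and the token wins gives directional 55, execution 5, and total 60. Selling 100 at 0.50 when v = 0.45 and the token loses gives 45 + 5 = 50.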
The aggregator runs incrementally: profit_split/build_fair_prices.py
computes the per-(market, token) buy-side VWAP across the full 671M-trade panel as a one-time backfill
(~25 min), and profit_split/update_weekly.py extends the cache as new trades arrive
(~1-3 min per Sunday). The cache lives at cache/fair_prices.parquet with columns
{market_id, token_id, pv_sum, v_sum, fair_price, last_block}.
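Because pv_sum and v_sum are additive, folding new buy-side trades into running sums gives the same VWAP as a full recompute. The sketch below mirrors the cache columns in an in-memory dict and uses last_block as a watermark against double counting. It assumes each delta contains whole blocks and that trade rows carry `side`, `price`, `usdc_amount`, and `block` fields (hypothetical names); it is not the repository's update_weekly.py.

```python
from dataclasses import dataclass


@dataclass
class FairPrice:
    pv_sum: float = 0.0  # sum(price * usdc_amount) over buy-side trades
    v_sum: float = 0.0   # sum(usdc_amount) over buy-side trades
    last_block: int = 0  # highest block already folded into the sums

    @property
    def fair_price(self) -> float:
        return self.pv_sum / self.v_sum


def update_cache(cache: dict[tuple[str, str], FairPrice], new_trades: list[dict]) -> None:
    """Fold new buy-side trades into per-(market, token) running sums."""
    watermark = {k: v.last_block for k, v in cache.items()}  # fixed at the start of the run
    for t in new_trades:
        if t["side"] != "BUY":
            continue  # the benchmark uses buy-side trades only
        key = (t["market_id"], t["token_id"])
        if t["block"] <= watermark.get(key, -1):
            continue  # already counted in a previous run
        entry = cache.setdefault(key, FairPrice())
        entry.pv_sum += t["price"] * t["usdc_amount"]
        entry.v_sum += t["usdc_amount"]
        entry.last_block = max(entry.last_block, t["block"])
```

Re-sending rows at or below a key's last_block is harmless: they are skipped, so the incremental and full-backfill VWAPs agree.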
Two deliberate dashboard choices distinguish this aggregator from Paper 1’s published table:
Both-side attribution. Paper 1 attributes each match to the maker side only
(academic standard). The dashboard attributes the match to both the maker and the taker
with mirror-signed components, so dollar P&L sums to zero across wallet types by construction
(zero-sum market, no fees in the sample period). This is the most readable framing for a public
dashboard. The cumulative cross-skill structure documented in Paper 1 is unchanged.
Volume-weighted ROI bps. Paper 1 Table 1 reports per-trade average edge × 10,000.
The dashboard reports volume-weighted ROI (P&L / USD volume) × 10,000. The two are not
directly comparable; the dashboard’s framing is consistent with the dollar P&L column.
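A small numeric sketch of why the two bps figures can diverge, even in sign. It assumes "per-trade edge" means P&L divided by that trade's USD notional, which is an illustrative assumption; Paper 1's exact edge definition may differ. Both function names are hypothetical.

```python
def per_trade_edge_bps(pnls: list[float], notionals: list[float]) -> float:
    """Unweighted mean of per-trade P&L / notional, times 10,000 (assumed Paper 1 style)."""
    return 10_000 * sum(p / v for p, v in zip(pnls, notionals)) / len(pnls)


def volume_weighted_roi_bps(pnls: list[float], notionals: list[float]) -> float:
    """Total P&L divided by total USD volume, times 10,000 (dashboard style)."""
    return 10_000 * sum(pnls) / sum(notionals)
```

With one $10 trade earning $1 (+1,000 bps) and one $1,000 trade losing $5 (−50 bps), the per-trade average is +475 bps. The volume-weighted ROI is about −39.6 bps, because the large trade dominates dollar P&L.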
For per-market execution gaps (briefings cards), the comparison is grouped by
(market, token, wallet_type). A token qualifies only when both bots and active retail have
≥ $1K of weekly volume on it; per-token gaps are then volume-weighted across qualifying tokens.
This isolates execution edge from side selection (bots on YES vs retail on NO would otherwise
produce a false gap).
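A sketch of the qualifying-token filter and the volume-weighted aggregation. Several details are assumptions, not stated in the text: the per-token gap is taken as the difference in execution ROI (execution P&L / volume, in bps), and tokens are weighted by combined algorithmic plus active-retail volume. The row layout and `market_execution_gap` are hypothetical.

```python
from typing import Optional

MIN_TOKEN_VOLUME = 1_000.0  # USD per class per token per week


def market_execution_gap(rows: list[dict]) -> Optional[float]:
    """Volume-weighted (algorithmic - active retail) execution ROI gap, in bps, for one market-week.

    rows: one per (token_id, wallet_type) with exec_pnl (sum of execution components)
    and volume (USD).
    """
    by_token: dict[str, dict[str, dict]] = {}
    for r in rows:
        by_token.setdefault(r["token_id"], {})[r["wallet_type"]] = r
    num = den = 0.0
    for groups in by_token.values():
        bot, ret = groups.get("algorithmic"), groups.get("active_retail")
        if bot is None or ret is None:
            continue  # both classes must trade the same token
        if bot["volume"] < MIN_TOKEN_VOLUME or ret["volume"] < MIN_TOKEN_VOLUME:
            continue
        gap = 10_000 * (bot["exec_pnl"] / bot["volume"] - ret["exec_pnl"] / ret["volume"])
        weight = bot["volume"] + ret["volume"]  # assumption: combined volume
        num += weight * gap
        den += weight
    return num / den if den > 0 else None
```

A token where retail traded only $500 is excluded entirely, so the gap on that token cannot reflect which side each cohort happened to favor.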
Longshot / Favorite Price Gap
Longshot band defined as price ∈ [0, 0.10). The weekly index is:
longshot_gap_t = (wins / trades among week-t trades priced in [0, 0.10)) − 0.05
Positive gaps indicate longshots are underpriced (realized wins exceed the band midpoint); negative
gaps are the classical favorite-longshot bias. Reported with weekly value, 13-week MA, and 52-week
rolling z-score. The favorite-side analog (top price bin) requires a separate weekly extraction and
is planned for a future release.
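The sign convention above (positive when realized wins exceed the band midpoint) suggests the following sketch: the realized win rate of longshot-band trades minus 0.05. It assumes trades arrive as `(price, outcome)` pairs. Returning None for an empty band is an illustrative choice, and `longshot_gap` is a hypothetical name.

```python
from typing import Optional

LONGSHOT_UPPER = 0.10  # band is [0, 0.10)
BAND_MIDPOINT = 0.05


def longshot_gap(trades: list[tuple[float, int]]) -> Optional[float]:
    """Realized win rate of trades priced in [0, 0.10) minus the band midpoint."""
    outcomes = [y for p, y in trades if 0.0 <= p < LONGSHOT_UPPER]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes) - BAND_MIDPOINT
```

If 3 of 100 longshot-band trades win, the gap is −0.02: longshots won less often than the midpoint implies, which is the classical favorite-longshot pattern.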
Market Efficiency Trend
Weekly R² of the one-parameter Prelec fit applied to active retail trades. Higher values
mean the week’s calibration gap is well-described by a single behavioral parameter; lower values
indicate idiosyncratic shocks dominate. The headline uses active retail because it is the largest
human cohort by trade count and most closely represents how human beliefs price the market.
Rolling statistics convention
Each weekly index is published with 4-, 13-, and 52-week trailing moving averages and
a 52-week rolling z-score. The z-score uses the rolling mean and rolling sample
standard deviation (Bessel-corrected, ddof=1), and requires at least 13 non-null
observations in the trailing window before reporting; otherwise null.
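The convention maps onto pandas rolling windows. pandas' `min_periods` counts non-null observations, which matches the 13-observation rule. Two choices here are assumptions: the trailing window includes the current week, and the moving averages use `min_periods=1`, since the text states no minimum for them. `rolling_stats` is a hypothetical helper.

```python
import pandas as pd


def rolling_stats(weekly: pd.Series) -> pd.DataFrame:
    """4/13/52-week trailing MAs and the 52-week rolling z-score for one weekly index.

    Suppressed weeks are NaN. The z-score needs >= 13 non-null observations in the
    trailing 52-week window (current week included); otherwise it is NaN (null).
    """
    out = pd.DataFrame({"value": weekly})
    for w in (4, 13, 52):
        out[f"ma_{w}w"] = weekly.rolling(w, min_periods=1).mean()
    roll = weekly.rolling(52, min_periods=13)
    out["z_52w"] = (weekly - roll.mean()) / roll.std(ddof=1)  # Bessel-corrected SD
    return out
```

Before serialization, the NaN cells would pass through the NaN-to-null sanitization step.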
Each aggregator enforces minimum sample sizes to prevent reporting on weeks dominated by noise:
Index | Floor | Behavior when below
PWI | 1,000 non-bot trades | Suppressed (null)
Calibration (per bin) | 50 trades | CI shown but not used in fit weighting
Execution edge (per class) | 200 trades | Per-class α suppressed; gap suppressed if either side missing
Profit split (per type) | 5,000 weekly participations | Directional/execution columns suppressed; total P&L still reported
PII (per wallet) | 10 resolved trades | Wallet not tested
Top markets (week) | Week volume ≥ 30% of trailing 4-week median | Partial week dropped from history
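The top-markets rule is the only floor defined relative to history. Here is a minimal sketch, assuming the trailing 4-week median excludes the current week and that a week with no history is kept; `is_partial_week` is a hypothetical name.

```python
from statistics import median


def is_partial_week(week_volume: float, trailing_volumes: list[float],
                    threshold: float = 0.30) -> bool:
    """True if this week's volume is below 30% of the median of the prior 4 weeks."""
    prior = trailing_volumes[-4:]
    if not prior:
        return False  # assumption: no history means no basis for dropping the week
    return week_volume < threshold * median(prior)
```

With prior weekly volumes of 100, 120, 80, and 90 (median 95), a week at 20 falls below the 28.5 cutoff and is treated as partial.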
Refresh schedule
Indices refresh every Sunday at 22:00 Pacific time. The pipeline extracts new Polygon
blocks via Alchemy, updates the processed-trade panel, re-runs all aggregations, regenerates the static
site via npm run build, and pushes to GitHub Pages. End-to-end refresh time:
~60-75 minutes (delta pull + microstructure scans dominate). Each refresh is tagged
with a semantic version; tags are archived to Zenodo with permanent DOIs.
Software stack
Component | Version | Use
Python | 3.11 | Aggregators, pipeline orchestrator
DuckDB | 0.10+ | Streaming aggregation over the master CSV
pandas | 2.x | Time-series wrangling, rolling stats
scipy.optimize | 1.13+ | Prelec NLS via curve_fit
aiohttp | 3.x | Concurrent Alchemy eth_getLogs calls
requests | 2.x | Gamma + CLOB API calls
Astro | 4.x | Static site generator
Vite | via Astro | JSON imports + bundling
Reproducibility
All aggregation code is in the public repository, with pinned dependencies in
requirements.txt. The processed trade panel is derived from public blockchain data and
is regenerable end-to-end from the Alchemy delta puller forward. The CHANGELOG records every
methodology change with date and rationale; breaking changes to any published column trigger
a major version bump and a one-release deprecation window. Numerical reproducibility:
identical input and identical package versions produce byte-identical output (no randomness is used
outside Paper 2’s permutation validation, which is seeded).
Daubert mapping (Rule 702)
Factor | How this work satisfies it
Testable | Every index reports an explicit null hypothesis. Test statistics are recomputable from public data using the published code. PWI: composite of two well-defined moments. PII: binomial z under the null of price-following. Execution edge: difference of fitted Prelec parameters.
Known error rate | PII: permutation-validated false-positive rate, plus Holm-Bonferroni and BH-FDR corrections reported separately. Calibration + PWI: bootstrap CIs available on request.
Peer review | Each index is grounded in a working paper (see research page) and in previously peer-reviewed primitives. The methodology log and CHANGELOG provide an auditable trail.
General acceptance | Primitives drawn from Kyle (1985), Glosten-Milgrom (1985), Fama (1972), Anand, Irvine, Puckett, Venkataraman (2012), Tversky & Kahneman (1992), Prelec (1998). All are standard in the finance and decision-theory literatures.
Citations
Anand, A., Irvine, P., Puckett, A., & Venkataraman, K. (2012). Performance of institutional trading desks. Journal of Financial Economics, 105(3), 597-617.
Bruls, M., Huizing, K., & van Wijk, J. J. (2000). Squarified treemaps. In Data Visualization 2000.
Fama, E. F. (1972). Components of investment performance. Journal of Finance, 27(3), 551-567.
Glosten, L. R., & Milgrom, P. R. (1985). Bid, ask and transaction prices in a specialist market. Journal of Financial Economics, 14(1), 71-100.
Kyle, A. S. (1985). Continuous auctions and insider trading. Econometrica, 53(6), 1315-1335.
Prelec, D. (1998). The probability weighting function. Econometrica, 66(3), 497-527.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297-323.
Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209-212.
Joshua Della Vedova · Knauss School of Business, University of San Diego
Updated weekly · 2026-W25