Joshua Della Vedova

Methodology

Construction, data sources, and reproducibility notes for each published index. Every index ties back to a working paper or peer-reviewed primitive; every aggregator is in the public repository with pinned dependencies. This page documents the entire pipeline from raw Polygon blockchain events to the JSON files consumed by the dashboard.

Data sources

All indices derive from the universe of resolved Polymarket trades on the Polygon blockchain, November 2022 to present (671M trades across 1.98M wallets as of the latest snapshot). Of these, the subset with resolved markets enters the Prelec, calibration, and P&L aggregations (~465K non-bot trades in the calibration pool, ~483K wallets tested in the insider screen). Trades are deduplicated against the Paradigm events index and joined against the Polymarket market-to-outcome mapping (market_winner_map.pkl).

Source | Mechanism | Cadence
Raw events | Alchemy eth_getLogs against Polymarket CTF + neg-risk contracts | Weekly delta (~30M new trades / week)
Market metadata | Polymarket Gamma API /markets/{id} | On-demand for top markets
Order books | Polymarket CLOB API /book?token_id={tid} | Snapshot at refresh time
Resolutions | Polymarket on-chain settlement events | As markets resolve
Wallet stats | Per-wallet aggregation of the trade panel | Re-built each refresh

The framework applies to any binary prediction market with public settlement data. Kalshi integration is contingent on wallet-level data access and is planned for a future release.

Pipeline architecture

The pipeline runs as a sequence of Python aggregators orchestrated by pipeline/run_all.py. Each aggregator reads from the master processed_trades.csv panel (~295 GB) via DuckDB streaming queries and writes a typed JSON snapshot plus a long-format CSV history to site/public/data/. The Astro static site rebuilds on each refresh and deploys via GitHub Pages.

Pipeline flow (diagram):

  1. Polygon blockchain (Polymarket CTF)
  2. Alchemy eth_getLogs (10x async), ~36 min / week
  3. processed_trades.csv: ~295 GB master panel, 671M trades since Nov 2022
  4. DuckDB aggregators: PWI · calibration · execution · PII · profit_split · top_markets · briefings
  5. JSON + CSV outputs: site/public/data/*_latest.json, site/public/data/*_history.csv
  6. Astro static build: npm run build, ~1 sec, 16 pages
  7. GitHub Pages: jdellavedova.com, CDN deploy ~2 min

Total wall time, Sunday refresh: ~60-75 min end-to-end. Each tag archived to Zenodo with permanent DOI.

Stage | Script | Wall time
Polygon delta pull | pull_polygon_delta_fast.py (aiohttp, 10 in-flight) | ~36 min for ~30M new trades
Process delta | process_delta.py | ~5 min
Append to master | append_delta_to_master.py | ~12 sec for 12 GB delta
Activity / volume | aggregate_weekly_activity.py | ~3 min
Bot share | aggregate_bot_share.py | <1 min (reads weekly activity)
PWI / calibration / efficiency / price gap | aggregate_pwi.py + 3 siblings | ~6 min total
Execution edge | aggregate_execution.py | ~4 min
Profit split (incremental) | profit_split/update_weekly.py | ~1-3 min after one-time backfill
Top markets | aggregate_top_markets.py (TOP_K = 30) | ~10 min (full-CSV pass)
Per-market microstructure | aggregate_market_microstructure.py | ~4 min (filtered scan)
Live odds + depth | fetch_market_snapshot.py (Gamma + CLOB) | ~30 sec for 30 markets
Briefings + narrative | build_briefings.py, build_weekly_narrative.py | <5 sec each

DuckDB is configured with PRAGMA memory_limit='12GB' and PRAGMA threads=8. All numeric output is sanitized through common.write_json, which converts NaN/Inf to null before serialization (strict JSON; Vite, browsers, jq all reject literal NaN tokens).
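
The NaN/Inf sanitization step can be illustrated with a minimal sketch. The body of common.write_json is not shown on this page, so the helper names below (sanitize, dumps_strict) and the sample payload are hypothetical; only the behavior, converting non-finite floats to null before serialization, comes from the text above.

```python
import json
import math
from typing import Any


def sanitize(value: Any) -> Any:
    """Recursively replace NaN and +/-Inf floats with None (JSON null)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: sanitize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize(item) for item in value]
    return value


def dumps_strict(payload: Any) -> str:
    """Serialize to strict JSON; allow_nan=False raises if a NaN slips through."""
    return json.dumps(sanitize(payload), allow_nan=False)


# Hypothetical snapshot payload with two non-finite values.
snapshot = {"week": "2026-W25", "pwi": float("nan"), "ma": [0.4, float("inf")]}
print(dumps_strict(snapshot))
# {"week": "2026-W25", "pwi": null, "ma": [0.4, null]}
```

Passing allow_nan=False makes the standard-library encoder fail loudly rather than emit a literal NaN token if a non-finite value is ever missed.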

Wallet classification

Wallets are partitioned into five exclusive classes using Paper 1’s rule, applied in the order shown below (first matching class wins):

Class | Rule | Population share
Algorithmic | trades_per_day > 50 OR n_trades > 1000 | ~0.6%
Sophisticated | volume > $10K AND HHI < 0.5 AND active_span > 30 days | ~3%
Active retail | 10 ≤ n_trades ≤ 1000 | ~13%
Casual | 2 ≤ n_trades ≤ 9 | ~30%
One-shot | n_trades = 1 | ~53%

HHI (Herfindahl-Hirschman Index) is computed over the wallet’s share of trades by market: the sum of squared per-market trade shares. Lower HHI = more diversified across markets. The classification is reproduced in code/profile.do for Stata replication and is consistent with Paper 1, Table 2.
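
As an illustration of the first-match-wins ordering, the rules in the table can be sketched as a single function. The function and argument names are hypothetical (the Stata code in code/profile.do is not reproduced here); the thresholds are the ones listed above.

```python
def hhi(trades_by_market: dict[str, int]) -> float:
    """Herfindahl-Hirschman Index over a wallet's share of trades by market."""
    total = sum(trades_by_market.values())
    if total <= 0:
        raise ValueError("wallet has no trades")
    return sum((count / total) ** 2 for count in trades_by_market.values())


def classify_wallet(
    n_trades: int,
    trades_per_day: float,
    volume_usd: float,
    hhi_value: float,
    active_span_days: float,
) -> str:
    """Apply the five rules in table order; the first matching class wins."""
    if trades_per_day > 50 or n_trades > 1000:
        return "algorithmic"
    if volume_usd > 10_000 and hhi_value < 0.5 and active_span_days > 30:
        return "sophisticated"
    if 10 <= n_trades <= 1000:
        return "active_retail"
    if 2 <= n_trades <= 9:
        return "casual"
    return "one_shot"


# Hypothetical wallet: 40 trades spread over four markets.
concentration = hhi({"m1": 10, "m2": 10, "m3": 10, "m4": 10})
print(concentration, classify_wallet(40, 0.5, 25_000, concentration, 90))
```

Because the sophisticated rule is checked before the trade-count bands, a diversified high-volume wallet with 40 trades lands in "sophisticated" rather than "active_retail".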

Indices: per-index specifications

Probability Weighting Index (PWI)

Composite z-score of two weekly moments computed from the non-bot trade subset:

  • Mean calibration error: |trade_price − realized_outcome| averaged across all resolved trades that week, with realized_outcome ∈ {0, 1}.
  • Longshot fraction: (trades at price < 0.10) / (total trades) that week.

Each moment is z-scored against its own 52-week rolling distribution (mean / SD). The PWI is the equal-weighted sum of the two z-scores. Reported as the raw value, plus 4-, 13-, and 52-week moving averages, plus the headline 52-week rolling z-score. A PWI z-score of +1 is one standard deviation of elevated probability distortion versus the prior year. Sample-size floor: ≥ 1,000 non-bot trades that week, otherwise the value is suppressed.
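
A minimal sketch of the two weekly moments and the composite, assuming each trade is reduced to a (price, outcome) pair with outcome in {0, 1} and that bot trades have already been filtered out. The function names and the toy week are hypothetical, and the z-scores are taken as given inputs here.

```python
from typing import Optional

MIN_NON_BOT_TRADES = 1_000  # sample-size floor from the text


def weekly_moments(trades: list[tuple[float, int]]) -> Optional[tuple[float, float]]:
    """Return (mean calibration error, longshot fraction), or None below the floor."""
    n = len(trades)
    if n < MIN_NON_BOT_TRADES:
        return None  # suppressed week
    mean_error = sum(abs(price - outcome) for price, outcome in trades) / n
    longshot_fraction = sum(1 for price, _ in trades if price < 0.10) / n
    return mean_error, longshot_fraction


def pwi_composite(z_error: Optional[float], z_longshot: Optional[float]) -> Optional[float]:
    """Equal-weighted sum of the two z-scores; None if either is missing."""
    if z_error is None or z_longshot is None:
        return None
    return z_error + z_longshot


# Hypothetical week: 600 losing longshots at 0.05 and 400 winning favorites at 0.80.
week = [(0.05, 0)] * 600 + [(0.80, 1)] * 400
print(weekly_moments(week))
print(pwi_composite(0.8, 0.4))
```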

Market Calibration Curve

Twenty price bins of width 0.05, with bin centers running from 0.025 to 0.975. For each bin we compute:

  • Realized win rate: n_wins / n_trades in the bin.
  • 95% binomial CI: Wilson score interval (preferred over Wald for small p).
  • Calibration gap: realized − bin_center.
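
The three per-bin quantities can be sketched as follows. The Wilson score formula is the standard one; the bin counts in the example are hypothetical, and the helper names are illustrative rather than taken from the repository.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_interval(n_wins: int, n_trades: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n_trades <= 0:
        raise ValueError("n_trades must be positive")
    p_hat = n_wins / n_trades
    z2 = z * z
    denom = 1.0 + z2 / n_trades
    center = (p_hat + z2 / (2 * n_trades)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n_trades + z2 / (4 * n_trades**2)) / denom
    return center - half, center + half


def bin_center(k: int) -> float:
    """Center of bin k (0..19) for twenty bins of width 0.05."""
    return 0.025 + 0.05 * k


# Hypothetical counts for bin k=1, i.e. prices in [0.05, 0.10).
n_wins, n_trades, k = 4, 120, 1
realized = n_wins / n_trades
low, high = wilson_interval(n_wins, n_trades)
gap = realized - bin_center(k)
print(f"realized={realized:.4f} CI=({low:.4f}, {high:.4f}) gap={gap:+.4f}")
```

Unlike the Wald interval, the Wilson interval stays inside [0, 1] and does not collapse to zero width when a bin records no wins.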

A one-parameter Prelec function is then fit by weighted nonlinear least squares, weights proportional to n_trades:

w(p) = exp(-(-log p)^α)

Solver: scipy.optimize.curve_fit with bounds α ∈ (0.1, 2.0) and initial guess α = 0.7. The aggregator reports α, the standard error from the inverse Hessian, and the fit R². Pooled across the full panel: α ≈ 0.664, R² ≈ 0.987 on the latest snapshot, which is within ~1 SE of Tversky-Kahneman’s 1992 experimental estimate of 0.65.
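
A hedged sketch of the weighted fit on synthetic data. Two assumptions are not stated on this page: that the realized win rate is modeled as w(bin_center), and that R² is the n_trades-weighted coefficient of determination. The function names and the synthetic bins are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit


def prelec(p, alpha):
    """One-parameter Prelec weighting function w(p) = exp(-(-ln p)^alpha)."""
    return np.exp(-((-np.log(p)) ** alpha))


def fit_prelec(centers, win_rates, n_trades):
    """Weighted NLS fit; returns (alpha, standard error, weighted R^2)."""
    centers = np.asarray(centers, dtype=float)
    win_rates = np.asarray(win_rates, dtype=float)
    n_trades = np.asarray(n_trades, dtype=float)
    # curve_fit minimizes sum(((y - f) / sigma)^2), so sigma = n^(-1/2)
    # gives each bin a weight proportional to n_trades.
    sigma = 1.0 / np.sqrt(n_trades)
    popt, pcov = curve_fit(
        prelec, centers, win_rates, p0=[0.7], sigma=sigma, bounds=(0.1, 2.0)
    )
    alpha = float(popt[0])
    std_err = float(np.sqrt(pcov[0, 0]))
    weights = n_trades / n_trades.sum()
    resid = win_rates - prelec(centers, alpha)
    mean = np.sum(weights * win_rates)
    r2 = 1.0 - np.sum(weights * resid**2) / np.sum(weights * (win_rates - mean) ** 2)
    return alpha, std_err, float(r2)


# Synthetic bins: a Prelec curve at alpha = 0.664 plus a small deterministic wobble.
centers = 0.025 + 0.05 * np.arange(20)
n_trades = 500.0 + 100.0 * np.arange(20)
win_rates = prelec(centers, 0.664) + 0.005 * np.sin(np.arange(20))
alpha, std_err, r2 = fit_prelec(centers, win_rates, n_trades)
print(f"alpha={alpha:.3f} se={std_err:.4f} R2={r2:.4f}")
```

With the default absolute_sigma=False, curve_fit treats sigma as relative weights and rescales the covariance by the residual variance, which matches the "weights proportional to n_trades" description.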

Execution Edge Monitor

The Prelec fit is repeated separately for each of the five wallet classes each week. The execution-edge headline is:

alpha_gap_t = α_bot,t − α_active_retail,t

A positive gap means algorithmic wallets’ calibration curve sits closer to the rational diagonal than active retail’s. The 13-week MA of the gap is the publication value, because weekly noise is large: per-class fits are sensitive to bin sparsity. The gap serves as a behavioral proxy for the algorithmic-retail execution divergence documented in Paper 1, where algorithmic wallets capture −4.25 bps effective spread and active retail pays +11.68 bps. The alpha-gap tracks that separation week by week.

Private Information Index (PII)

Wallet-level forensic test from Paper 2. For each wallet with ≥ 10 resolved trades:

  1. Excess accuracy: accuracy − max(p̄, 1 − p̄), where p̄ is the wallet’s volume-weighted average entry price. The subtrahend is the accuracy a price-following strategy would achieve mechanically.
  2. One-sided binomial z-test with the per-trade success probability set to max(p̄, 1 − p̄).
  3. Flag at z > 2.326 (one-sided p < 0.01) and excess accuracy positive. Latest snapshot: 6,292 of 483,234 wallets flagged (1.30%).
  4. Multiple-testing corrections: Holm-Bonferroni family-wise (806 survivors) and Benjamini-Hochberg false-discovery-rate (2,492 survivors). Both are reported alongside the raw flag count.
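
The four steps above can be sketched with the standard library. The normal approximation to the binomial and the step-down (Holm) and step-up (Benjamini-Hochberg) procedures are standard; the correction level of 0.01 and all function names are assumptions, since the page does not state the level used for the corrections.

```python
import math

Z_FLAG = 2.326  # one-sided p < 0.01, from step 3


def pii_test(n_correct: int, n_trades: int, p_bar: float) -> tuple[float, float, float]:
    """Return (excess accuracy, z, one-sided p-value) for one wallet."""
    if n_trades < 10:
        raise ValueError("wallets with fewer than 10 resolved trades are not tested")
    if not 0.0 < p_bar < 1.0:
        raise ValueError("p_bar must lie strictly between 0 and 1")
    p0 = max(p_bar, 1.0 - p_bar)  # accuracy of mechanical price-following
    excess = n_correct / n_trades - p0
    z = excess / math.sqrt(p0 * (1.0 - p0) / n_trades)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return excess, z, p_value


def holm_rejections(p_values: list[float], alpha: float = 0.01) -> list[bool]:
    """Holm-Bonferroni step-down; alpha is an assumed level."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] > alpha / (m - rank):
            break  # first failure stops the procedure
        rejected[idx] = True
    return rejected


def bh_rejections(p_values: list[float], alpha: float = 0.01) -> list[bool]:
    """Benjamini-Hochberg step-up; alpha is an assumed level."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical wallet: 70 correct of 100 trades at an average entry price of 0.55.
excess, z, p_value = pii_test(70, 100, 0.55)
flagged = z > Z_FLAG and excess > 0
print(f"excess={excess:.3f} z={z:.3f} p={p_value:.5f} flagged={flagged}")
```

Holm controls the family-wise error rate and is the stricter of the two, which is consistent with the smaller survivor count reported for it above.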

Sub-populations are split by the MNPI taxonomy: Vote (elections, awards, political decisions), Action (military, regulatory, decision-driven outcomes), Performance (sports, individual performance), Stochastic (crypto prices, weather, aggregate forces). Categories are assigned by a rules-and-LLM pipeline validated against hand-coded markets. The flag-rate ordering Action > Vote > Performance > Stochastic confirms the theoretical prediction that markets where humans control outcomes carry stronger informational signatures.

Algorithmic Share of Participation

Weekly share of counterparty events attributable to wallets in the algorithmic class. Each on-chain match has a maker (limit order) and a taker (market order) and contributes two participations to the denominator. Counting both sides keeps the human side of the market visible: maker-only attribution would assign roughly 95% to algorithms because algorithmic wallets dominate passive liquidity provision.

algorithmic_share_t = (bot_makers_t + bot_takers_t) / (2 × n_matches_t)

Reported with the raw weekly value, 4/13/52-week MAs, and 52-week rolling z-score. Per-class shares (active retail, sophisticated, casual, one-shot) are reported alongside in the same payload.
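
A small sketch of the both-sides counting rule, assuming each match is reduced to a (maker, taker) pair of wallet identifiers. The wallet names and the function name are hypothetical.

```python
from typing import Iterable, Optional


def algorithmic_share(
    matches: Iterable[tuple[str, str]], bot_wallets: set[str]
) -> Optional[float]:
    """(bot makers + bot takers) / (2 x matches); None for an empty week."""
    n_matches = 0
    bot_participations = 0
    for maker, taker in matches:
        n_matches += 1
        bot_participations += (maker in bot_wallets) + (taker in bot_wallets)
    if n_matches == 0:
        return None
    return bot_participations / (2 * n_matches)


# Hypothetical week of four matches as (maker, taker) pairs.
bots = {"bot1", "bot2"}
matches = [("bot1", "alice"), ("bot1", "bob"), ("bot2", "bot1"), ("carol", "dave")]
print(algorithmic_share(matches, bots))  # 0.5
maker_only = sum(maker in bots for maker, _ in matches) / len(matches)
print(maker_only)  # 0.75
```

In this toy week, maker-only attribution gives 0.75 while the two-sided share gives 0.5, which illustrates how counting both sides keeps human takers in the denominator.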

Profit Split Decomposition

Per-trade P&L for each resolved trade is decomposed into directional (forecasting) and execution (price) components using Paper 1’s main-analysis benchmark: the per-(market, token) volume-weighted average price across buy-side trades only. For trade i in market m, token T, with entry price P, fair-price benchmark v(m,T), outcome W ∈ {0, 1}, token quantity Q, and side sign S = +1 (BUY) or −1 (SELL):

v(m, T) = Σ(price × usdc_amount) / Σ(usdc_amount), buy-side trades only

Total P&L      = S × (W − P) × Q
Directional    = S × (W − v) × Q     ← did the trade pick the side the avg buyer picked?
Execution      = S × (v − P) × Q     ← did the trade get a better price than the avg buyer?

Identity:        Directional + Execution = Total P&L (per trade, exact)
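
The decomposition and its identity can be checked numerically with a short sketch. The function name and the example trade are hypothetical; the formulas are the three lines above.

```python
def decompose(
    price: float, fair_price: float, outcome: int, quantity: float, side: str
) -> tuple[float, float, float]:
    """Return (total, directional, execution) P&L for one resolved trade."""
    if side not in ("BUY", "SELL"):
        raise ValueError("side must be 'BUY' or 'SELL'")
    sign = 1.0 if side == "BUY" else -1.0
    total = sign * (outcome - price) * quantity
    directional = sign * (outcome - fair_price) * quantity
    execution = sign * (fair_price - price) * quantity
    return total, directional, execution


# Hypothetical trade: buy 100 tokens at 0.40 when the buy-side VWAP is 0.45; the token wins.
buyer = decompose(0.40, 0.45, 1, 100, "BUY")
seller = decompose(0.40, 0.45, 1, 100, "SELL")  # mirror-signed counterparty
print(buyer)   # approximately (60, 55, 5)
print(seller)  # approximately (-60, -55, -5)
```

The buyer earns about 55 from picking the winning side at the benchmark price and about 5 from paying less than the benchmark; the seller's components are the exact mirror, so each component sums to zero across the two sides of the match.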

The aggregator runs incrementally: profit_split/build_fair_prices.py computes the per-(market, token) buy-side VWAP across the full 671M-trade panel as a one-time backfill (~25 min), and profit_split/update_weekly.py extends the cache as new trades arrive (~1-3 min per Sunday). The cache lives at cache/fair_prices.parquet with columns {market_id, token_id, pv_sum, v_sum, fair_price, last_block}.
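
One way such an incremental cache could work is sketched below, using an in-memory dict in place of cache/fair_prices.parquet. The running sums pv_sum and v_sum follow the column names above; the trade field names and the use of last_block as a watermark against double counting are assumptions, not the confirmed behavior of update_weekly.py.

```python
def update_fair_prices(cache: dict, new_trades: list[dict]) -> dict:
    """Extend the per-(market, token) buy-side VWAP sums with a delta of trades."""
    # Watermarks are frozen before the pass so that several new trades in the
    # same block are all counted, while previously ingested blocks are skipped.
    watermarks = {key: row["last_block"] for key, row in cache.items()}
    for trade in new_trades:
        if trade["side"] != "BUY":
            continue  # benchmark uses buy-side trades only
        key = (trade["market_id"], trade["token_id"])
        if trade["block"] <= watermarks.get(key, -1):
            continue  # assumed idempotency guard
        row = cache.setdefault(
            key, {"pv_sum": 0.0, "v_sum": 0.0, "fair_price": None, "last_block": -1}
        )
        row["pv_sum"] += trade["price"] * trade["usdc_amount"]
        row["v_sum"] += trade["usdc_amount"]
        row["last_block"] = max(row["last_block"], trade["block"])
    for row in cache.values():
        if row["v_sum"] > 0:
            row["fair_price"] = row["pv_sum"] / row["v_sum"]
    return cache


# Hypothetical delta for one (market, token) pair.
delta = [
    {"market_id": "m1", "token_id": "yes", "side": "BUY", "price": 0.40, "usdc_amount": 100.0, "block": 10},
    {"market_id": "m1", "token_id": "yes", "side": "BUY", "price": 0.60, "usdc_amount": 300.0, "block": 11},
    {"market_id": "m1", "token_id": "yes", "side": "SELL", "price": 0.90, "usdc_amount": 500.0, "block": 11},
]
cache = update_fair_prices({}, delta)
print(cache[("m1", "yes")]["fair_price"])  # approximately 0.55
```

Storing the two sums rather than only the ratio is what makes the weekly update cheap: new trades add to pv_sum and v_sum without rescanning the full panel.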

Two deliberate dashboard choices distinguish this aggregator from Paper 1’s published table:

  1. Both-side attribution. Paper 1 attributes each match to the maker side only (academic standard). The dashboard attributes the match to both the maker and the taker with mirror-signed components, so dollar P&L sums to zero across wallet types by construction (zero-sum market, no fees in the sample period). This is the most readable framing for a public dashboard. The cumulative cross-skill structure documented in Paper 1 is unchanged.
  2. Volume-weighted ROI bps. Paper 1 Table 1 reports per-trade average edge × 10,000. The dashboard reports volume-weighted ROI (P&L / USD volume) × 10,000. The two are not directly comparable; the dashboard’s framing is consistent with the dollar P&L column.

For per-market execution gaps (briefings cards), the comparison is grouped by (market, token, wallet_type) and aggregated across tokens where both bots and active retail have ≥ $1K of weekly volume on the same token, then volume-weighted across qualifying tokens. This isolates execution edge from side-selection (bots-on-YES vs retail-on-NO would otherwise produce a false gap).

Longshot / Favorite Price Gap

Longshot band defined as price ∈ [0, 0.10). The weekly index is:

longshot_gap_t = realized_winrate_in_band_t − 0.05

Positive gaps indicate longshots are underpriced (realized wins exceed the band midpoint); negative gaps are the classical favorite-longshot bias. Reported with weekly value, 13-week MA, and 52-week rolling z-score. The favorite-side analog (top price bin) requires a separate weekly extraction and is planned for a future release.
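
A minimal sketch of the weekly gap, assuming trades arrive as (price, won) pairs with won in {0, 1}. The function name and the toy week are hypothetical.

```python
from typing import Optional

BAND_UPPER = 0.10
BAND_MIDPOINT = 0.05


def longshot_gap(trades: list[tuple[float, int]]) -> Optional[float]:
    """Realized win rate of trades priced in [0, 0.10) minus the band midpoint."""
    band = [won for price, won in trades if 0.0 <= price < BAND_UPPER]
    if not band:
        return None  # no longshot trades this week
    return sum(band) / len(band) - BAND_MIDPOINT


# Hypothetical week: 100 longshot trades with 3 winners, plus 10 trades outside the band.
week = [(0.04, 0)] * 97 + [(0.08, 1)] * 3 + [(0.50, 1)] * 10
print(longshot_gap(week))  # about -0.02
```

Here the longshots win 3% of the time against a 5% midpoint, a negative gap in the direction of the classical favorite-longshot bias.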

Market Efficiency Trend

Weekly R² of the one-parameter Prelec fit applied to active retail trades. Higher values mean the week’s calibration gap is well-described by a single behavioral parameter; lower values indicate idiosyncratic shocks dominate. The headline uses active retail because it is the largest human cohort by trade count and most closely represents how human beliefs price the market.

Rolling statistics convention

Each weekly index is published with 4-, 13-, and 52-week trailing moving averages and a 52-week rolling z-score. The z-score uses the rolling mean and rolling sample standard deviation (Bessel-corrected, ddof=1), and requires at least 13 non-null observations in the trailing window before reporting; otherwise null.

z_t = (value_t − mean_52w_t) / sd_52w_t,    valid iff n_obs_52w ≥ 13
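
The convention can be sketched with pandas rolling windows. Two details are assumptions: the trailing window is taken to include the current week, and the moving averages use pandas' default of requiring a full window. The function name and the toy series are hypothetical.

```python
import pandas as pd


def add_rolling_stats(values: pd.Series) -> pd.DataFrame:
    """Attach 4/13/52-week trailing MAs and a 52-week rolling z-score."""
    out = pd.DataFrame({"value": values})
    for weeks in (4, 13, 52):
        # Default min_periods equals the window size (assumed convention).
        out[f"ma_{weeks}w"] = values.rolling(window=weeks).mean()
    # min_periods counts non-null observations, matching the n_obs >= 13 rule.
    window = values.rolling(window=52, min_periods=13)
    out["z_52w"] = (values - window.mean()) / window.std(ddof=1)
    return out


# Hypothetical weekly series: a simple ramp over 60 weeks.
weekly = pd.Series(range(60), dtype=float)
stats = add_rolling_stats(weekly)
print(stats.iloc[[11, 12, 59]])
```

Rows with fewer than 13 observations come out as NaN, which the JSON writer would then serialize as null.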

Sample-size floors and null handling

Each aggregator enforces minimum sample sizes to prevent reporting on weeks dominated by noise:

Index | Floor | Behavior when below
PWI | 1,000 non-bot trades | Suppressed (null)
Calibration (per bin) | 50 trades | CI shown but not used in fit weighting
Execution edge (per class) | 200 trades | Per-class α suppressed; gap suppressed if either side missing
Profit split (per type) | 5,000 weekly participations | Directional/execution columns suppressed; total P&L still reported
PII (per wallet) | 10 resolved trades | Wallet not tested
Top markets (week) | Week vol ≥ 30% of trailing 4-week median | Partial week dropped from history

Refresh schedule

Indices refresh every Sunday at 22:00 Pacific time. The pipeline extracts new Polygon blocks via Alchemy, updates the processed-trade panel, re-runs all aggregations, regenerates the static site via npm run build, and pushes to GitHub Pages. End-to-end refresh time: ~60-75 minutes (delta pull + microstructure scans dominate). Each refresh is tagged with a semantic version; tags are archived to Zenodo with permanent DOIs.

Software stack

Component | Version | Use
Python | 3.11 | Aggregators, pipeline orchestrator
DuckDB | 0.10+ | Streaming aggregation over the master CSV
pandas | 2.x | Time-series wrangling, rolling stats
scipy.optimize | 1.13+ | Prelec NLS via curve_fit
aiohttp | 3.x | Concurrent Alchemy eth_getLogs
requests | 2.x | Gamma + CLOB API calls
Astro | 4.x | Static site generator
Vite | via Astro | JSON imports + bundle

Reproducibility

All aggregation code is in the public repository, with pinned dependencies in requirements.txt. The processed trade panel is derived from public blockchain data and is regenerable end-to-end from the Alchemy delta puller forward. The CHANGELOG records every methodology change with date and rationale; breaking changes to any published column trigger a major version bump and a one-release deprecation window. Numerical reproducibility: identical input and identical package versions produce byte-identical output (no random seeds are used outside Paper 2’s permutation validation, which is seeded).

Daubert mapping (Rule 702)

Factor | How this work satisfies it
Testable | Every index reports an explicit null hypothesis. Test statistics are recomputable from public data using the published code. PWI: composite of two well-defined moments. PII: binomial z under null of price-following. Execution edge: difference of fitted Prelec parameters.
Known error rate | PII: permutation-validated false-positive rate, plus Holm-Bonferroni and BH-FDR corrections reported separately. Calibration + PWI: bootstrap CIs available on request.
Peer review | Each index is grounded in a working paper (see research page) and in previously peer-reviewed primitives. Methodology log + CHANGELOG provides an auditable trail.
General acceptance | Primitives drawn from Kyle (1985), Glosten-Milgrom (1985), Fama (1972), Anand, Irvine, Puckett, Venkataraman (2012), Tversky & Kahneman (1992), Prelec (1998). All standard in finance and decision-theory literatures.

Citations

  • Anand, A., Irvine, P., Puckett, A., & Venkataraman, K. (2012). Performance of institutional trading desks: An analysis of persistence in trading costs. Review of Financial Studies, 25(2), 557-598.
  • Bruls, M., Huijsmans, K., & van Wijk, J. (2000). Squarified treemaps. Data Visualization 2000.
  • Fama, E. F. (1972). Components of investment performance. Journal of Finance, 27(3), 551-567.
  • Glosten, L. R., & Milgrom, P. R. (1985). Bid, ask and transaction prices in a specialist market with heterogeneously informed traders. Journal of Financial Economics, 14(1), 71-100.
  • Kyle, A. S. (1985). Continuous auctions and insider trading. Econometrica, 53(6), 1315-1335.
  • Prelec, D. (1998). The probability weighting function. Econometrica, 66(3), 497-527.
  • Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297-323.
  • Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209-212.

Joshua Della Vedova · Knauss School of Business, University of San Diego · Updated weekly · 2026-W25

Cite this dataset: Della Vedova, J. (2026). Della Vedova Prediction Market Indices (DV-PMI). https://jdellavedova.com