BAKERPICKS

How It Works

BakerPicks uses a three-stage model pipeline that feeds game-level context into player-level predictions, then simulates 10,000 outcomes via Monte Carlo to produce full probability distributions — not point estimates.

Pipeline Overview

Inputs: Vegas lines, team stats, injuries, lineups, referees
  → Stage 0: Game Model
  → Stage 1: Minutes Model
  → Stage 2: Rate Models (×7)
  → Monte Carlo: 10,000 simulations
  → Output: P(Over), Edge, EV, Kelly
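As a sketch of the output stage: given a model probability and the posted American odds, edge, EV, and the Kelly fraction follow directly. The odds and probability below are illustrative, and the site's exact staking rules may differ.

```python
def implied_prob(american_odds):
    # Convert American odds to the book's implied break-even probability (vig included).
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def bet_metrics(p_model, american_odds):
    # Edge, EV per unit staked, and full-Kelly fraction from a model probability.
    b = (100 / -american_odds) if american_odds < 0 else (american_odds / 100)
    edge = p_model - implied_prob(american_odds)
    ev = p_model * b - (1 - p_model)   # expected profit per unit staked
    kelly = max(0.0, ev / b)           # Kelly: (p*b - q) / b, floored at zero
    return {"edge": edge, "ev": ev, "kelly": kelly}

print(bet_metrics(0.58, -110))
```

A positive edge means the model's P(Over) beats the book's implied probability; Kelly then sizes the stake in proportion to that advantage.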

Stage Details

Stage 0: Game Model

Context is everything

Before predicting any player stat, we predict the game itself. Four LightGBM models forecast margin, total score, pace, and blowout probability using Vegas lines, team season stats, rest/B2B, and recent 10-game form.

Outputs

  • Win probability (calibrated from margin via residual std)
  • Projected total score
  • Projected pace (possessions per 48 min)
  • Blowout probability

Key Features

34 features, including matchup differentials, home/away splits, and injury impact
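The win-probability output above (calibrated from margin via residual std) can be sketched with the normal CDF. The `resid_std=11.5` default is an illustrative placeholder, not the model's fitted value:

```python
from statistics import NormalDist

def win_probability(pred_margin, resid_std=11.5):
    # P(actual margin > 0), assuming the realized margin is normally
    # distributed around the prediction with the model's out-of-sample
    # residual standard deviation. resid_std here is illustrative.
    return 1.0 - NormalDist(mu=pred_margin, sigma=resid_std).cdf(0.0)
```

A projected margin of 0 maps to a 50% win probability, and larger projected margins push the probability toward 1 at a rate set by the residual spread.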

Stage 1: Minutes Model

How long will they play?

A quantile LightGBM regression predicts each player's minutes distribution — not a single number, but the 10th, 25th, 50th, 75th, and 90th percentiles. This captures the uncertainty between a normal game and a blowout or injury scenario.

Outputs

  • Truncated-normal distribution over [0, 48] minutes
  • Full uncertainty range from DNP to overtime

Key Features

Rolling minute means, blowout prob, teammates out, schedule density, travel distance, starter status
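One way to realize the truncated-normal output from the predicted quantiles: recover mu and sigma from the median and interquartile range, then rejection-sample within [0, 48]. The quantile values below are illustrative, not model outputs:

```python
import random
from statistics import NormalDist

def minutes_sampler(q25, q50, q75, lo=0.0, hi=48.0, seed=7):
    # For a normal distribution, the IQR spans 2 * z(0.75) sigmas,
    # so the predicted quantiles pin down mu and sigma directly.
    z75 = NormalDist().inv_cdf(0.75)          # ~0.6745
    mu, sigma = q50, (q75 - q25) / (2 * z75)
    rng = random.Random(seed)
    def draw():
        while True:                            # rejection-sample into [lo, hi]
            x = rng.gauss(mu, sigma)
            if lo <= x <= hi:
                return x
    return draw

draw = minutes_sampler(q25=29.0, q50=32.0, q75=35.0)
```

Rejection sampling is cheap here because the truncation bounds sit far from the bulk of the distribution for a typical rotation player.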

Stage 2: Rate Models

Per-36 production for 7 stats

Seven separate LightGBM regressors predict per-36-minute production rates for points, rebounds, assists, 3-pointers, steals, blocks, and turnovers. Position encoding is critical: centers grab roughly twice as many rebounds as guards.

Outputs

  • Per-36 rate prediction per stat
  • Out-of-fold residual std for noise calibration
  • Bias correction from pooled validation

Key Features

Position encoding, usage vacuum from teammate absences, rolling per-36 means, opponent defense
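The per-36 rates combine with sampled minutes in the obvious way; a minimal sketch, with `bias_correction` standing in for the pooled-validation adjustment described above:

```python
def project_stat(rate_per36, minutes, bias_correction=0.0):
    # Scale a per-36 rate to the sampled minutes, then apply the
    # pooled-validation bias correction; clipped at zero because
    # counting stats cannot go negative.
    return max(0.0, rate_per36 * minutes / 36.0 + bias_correction)
```

So a 24-points-per-36 scorer projected for 30 minutes lands at 20 points before noise is added.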

Stat-Specific Distributions

Each stat uses a noise distribution chosen for its real-world behavior. Points have fat tails (scoring streaks), while steals and blocks are zero-heavy (many games with zero).

Stat         Distribution            Rationale
Points       Student-t (df ≈ 6)      Fat tails capture scoring streaks
Rebounds     Negative binomial       Count data; per-36 fitting avoids double-counting minutes variance
Assists      Negative binomial       Count data with overdispersion
3-Pointers   Binomial-Poisson        Rate-scaled so expected makes match the model prediction
Steals       Zero-inflated Poisson   Handles the high frequency of zero-steal games
Blocks       Zero-inflated Poisson   Same zero-heavy pattern as steals
Turnovers    Negative binomial       Count data with player-specific variance
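As one concrete example from the table above, a zero-inflated Poisson draw for steals inside a Monte Carlo loop might look like this (`lam` and `p_zero` are illustrative stand-ins for model outputs):

```python
import math
import random

def sample_poisson(lam, rng):
    # Knuth's multiplication method; fine for the small rates of game stats.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def p_over_steals(line, lam, p_zero, n_sims=10_000, seed=42):
    # Zero-inflated Poisson per sim: extra point mass at zero, else Poisson.
    # lam and p_zero would come from the rate model; values here are illustrative.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        steals = 0 if rng.random() < p_zero else sample_poisson(lam, rng)
        hits += steals > line
    return hits / n_sims
```

Repeating this draw 10,000 times per stat, with minutes resampled each iteration, is what turns the point predictions into the full distributions the section opens with.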

Self-Correcting Calibration

Every night at 2 AM ET, the postgame pipeline grades all predictions against actual results. Three feedback mechanisms then correct the model:

Platt Calibration

Logistic recalibration on a 21-day rolling window of graded predictions. Maps raw model probabilities to calibrated probabilities.
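A minimal stdlib sketch of Platt scaling: fit p_cal = sigmoid(a * logit(p_raw) + b) on graded outcomes by gradient descent on log loss. A production system would typically use a library logistic regression; the toy data below (an overconfident model) is illustrative:

```python
import math

def fit_platt(raw_probs, outcomes, lr=0.1, epochs=3000):
    # Learn the two Platt parameters (a, b) mapping raw model
    # probabilities to calibrated ones on graded predictions.
    xs = [math.log(p / (1.0 - p)) for p in raw_probs]
    a, b, n = 1.0, 0.0, len(xs)
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in zip(xs, outcomes):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            ga += (p - y) * x / n   # gradient of log loss w.r.t. a
            gb += (p - y) / n       # gradient of log loss w.r.t. b
        a, b = a - lr * ga, b - lr * gb
    return lambda p: 1.0 / (1.0 + math.exp(-(a * math.log(p / (1.0 - p)) + b)))

# Overconfident model: says 0.8 / 0.2 when the true hit rates are 0.6 / 0.4.
cal = fit_platt([0.8] * 10 + [0.2] * 10,
                [1] * 6 + [0] * 4 + [1] * 4 + [0] * 6)
```

On the rolling 21-day window, the fitted map pulls overconfident probabilities back toward the observed hit rates.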

Adaptive Bias

Per-stat rolling mean prediction error. When systematic bias exceeds 0.3 units with 15+ samples, an automatic correction is applied to future predictions.
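The bias mechanism can be sketched as a per-stat rolling error tracker; the 0.3-unit and 15-sample thresholds come from the text above, while the window size is an assumed default:

```python
from collections import deque

class AdaptiveBias:
    # Tracks rolling mean prediction error for one stat and emits a
    # correction once the bias is systematic. window=200 is illustrative.
    def __init__(self, threshold=0.3, min_samples=15, window=200):
        self.errors = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, predicted, actual):
        self.errors.append(predicted - actual)

    def correction(self):
        # Additive adjustment for future predictions: zero until there
        # are enough samples and the mean error clears the threshold.
        if len(self.errors) < self.min_samples:
            return 0.0
        bias = sum(self.errors) / len(self.errors)
        return -bias if abs(bias) > self.threshold else 0.0
```

A model that consistently over-predicts a stat by half a unit would, after enough graded games, have future projections nudged down by that amount.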

Drift Detection

PSI and KS tests on feature distributions flag when incoming data has shifted away from the training distribution. Automatic retraining triggers when expected calibration error (ECE) exceeds 0.10 for 3+ consecutive days.
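The ECE behind the retraining trigger is the standard binned version: bucket predictions by confidence, then take the sample-weighted mean gap between confidence and accuracy. A minimal sketch (bin count and inputs are illustrative):

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    # Binned ECE: weighted mean |accuracy - confidence| across bins.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)   # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    n, ece = len(probs), 0.0
    for b in bins:
        if b:
            conf = sum(p for p, _ in b) / len(b)
            acc = sum(y for _, y in b) / len(b)
            ece += len(b) / n * abs(acc - conf)
    return ece
```

A perfectly calibrated model scores 0; sustained readings above the 0.10 threshold indicate the probabilities have drifted enough to warrant a retrain.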