How It Works
BakerPicks uses a three-stage model pipeline that feeds game-level context into player-level predictions, then simulates 10,000 outcomes via Monte Carlo to produce full probability distributions — not point estimates.
Pipeline Overview
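The end-to-end flow can be sketched as a single Monte Carlo loop: game context feeds minutes, minutes feed production, and 10,000 simulated outcomes form the distribution. A minimal Python sketch, where every number and function name is illustrative rather than the production code:

```python
import numpy as np

rng = np.random.default_rng(7)

def predict_game_context() -> dict:
    # Stage 1: game model outputs (hypothetical values)
    return {"pace": 99.5, "total": 228.0, "blowout_prob": 0.12}

def sample_minutes(ctx: dict, n_sims: int) -> np.ndarray:
    # Stage 2: sample a minutes distribution; blowouts trim starter minutes
    base = rng.normal(34.0, 3.0, n_sims)
    blowout = rng.random(n_sims) < ctx["blowout_prob"]
    return np.clip(base - 6.0 * blowout, 0.0, 48.0)

def points_per36(ctx: dict) -> float:
    # Stage 3: a per-36 scoring rate, scaled by projected game pace
    return 24.0 * ctx["pace"] / 100.0

def simulate(n_sims: int = 10_000) -> np.ndarray:
    ctx = predict_game_context()
    minutes = sample_minutes(ctx, n_sims)
    noise = 4.0 * rng.standard_t(6, n_sims)   # fat-tailed scoring noise
    points = points_per36(ctx) * minutes / 36.0 + noise
    return np.maximum(points, 0.0)            # a full distribution, not a point estimate

dist = simulate()
```

From `dist` any quantity of interest can be read off directly, e.g. `(dist > 19.5).mean()` for an over-19.5-points probability.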
Stage Details
Game Model
Context is everything
Before predicting any player stat, we predict the game itself. Four LightGBM models forecast margin, total score, pace, and blowout probability from Vegas lines, team season stats, rest days and back-to-back flags, and form over the last 10 games.
Outputs
- Win probability (derived from the projected margin and its residual standard deviation)
- Projected total score
- Projected pace (possessions per 48 min)
- Blowout probability
Key Features
34 features including matchup differentials, home/away splits, injury impact
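The margin-to-win-probability step can be illustrated under a normal error model: if the margin forecast has residual standard deviation σ, the win probability is the chance the true margin exceeds zero. A sketch (the σ = 12 value is an assumption for illustration, not the model's actual residual std):

```python
import math

def win_probability(projected_margin: float, residual_std: float) -> float:
    """P(true margin > 0), assuming normally distributed forecast error."""
    z = projected_margin / residual_std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = win_probability(projected_margin=4.0, residual_std=12.0)  # a 4-point favorite
```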
Minutes Model
How long will they play?
A quantile LightGBM regression predicts each player's minutes distribution — not a single number, but the 10th, 25th, 50th, 75th, and 90th percentiles. This captures the uncertainty between a normal game and a blowout or injury scenario.
Outputs
- Truncated-normal distribution over [0, 48] minutes
- Full uncertainty range from DNP to overtime
Key Features
Rolling minute means, blowout prob, teammates out, schedule density, travel distance, starter status
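One way to turn predicted percentiles into a sampleable truncated normal over [0, 48] is to back out a mean and standard deviation from the interquartile range, then reject draws outside the court-time bounds. A simplified sketch (the production fitting may differ; the 28- and 34-minute quantiles are made up):

```python
import numpy as np

def fit_normal_from_quartiles(q25: float, q75: float) -> tuple[float, float]:
    # For a normal, the interquartile range spans about 1.349 standard deviations
    sigma = (q75 - q25) / 1.349
    mu = (q25 + q75) / 2.0
    return mu, sigma

def sample_truncated(mu: float, sigma: float, n: int,
                     lo: float = 0.0, hi: float = 48.0) -> np.ndarray:
    rng = np.random.default_rng(0)
    out = np.empty(0)
    while out.size < n:                       # rejection-sample into [0, 48]
        draw = rng.normal(mu, sigma, 2 * n)
        out = np.concatenate([out, draw[(draw >= lo) & (draw <= hi)]])
    return out[:n]

mu, sigma = fit_normal_from_quartiles(q25=28.0, q75=34.0)
minutes = sample_truncated(mu, sigma, 10_000)
```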
Rate Models
Per-36 production for 7 stats
Seven separate LightGBM regressors predict per-36-minute production rates for points, rebounds, assists, 3-pointers, steals, blocks, and turnovers. Position encoding is critical: centers grab roughly twice as many rebounds per 36 minutes as guards.
Outputs
- Per-36 rate prediction per stat
- Out-of-fold residual std for noise calibration
- Bias correction from pooled validation
Key Features
Position encoding, usage vacuum from teammate absences, rolling per-36 means, opponent defense
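The "usage vacuum" feature can be illustrated with a toy heuristic: when teammates sit, their usage rate is redistributed to active players in proportion to each player's own usage. This is a hypothetical formula for illustration, not the model's actual feature definition:

```python
def usage_vacuum(player_usage: float,
                 teammate_usages: list[float],
                 is_out: list[bool]) -> float:
    """Boosted usage after absent teammates' usage is redistributed
    proportionally among the remaining players (illustrative heuristic)."""
    vacated = sum(u for u, out in zip(teammate_usages, is_out) if out)
    active = player_usage + sum(u for u, out in zip(teammate_usages, is_out) if not out)
    return player_usage + vacated * (player_usage / active)

# A 31%-usage star sits; a 24%-usage teammate absorbs part of the vacuum:
boosted = usage_vacuum(0.24, [0.31, 0.20, 0.15], [True, False, False])
```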
Stat-Specific Distributions
Each stat uses a noise distribution chosen for its real-world behavior. Points have fat tails (scoring streaks), while steals and blocks are zero-heavy (many games with zero).
| Stat | Distribution | Rationale |
|---|---|---|
| Points | Student-t (df ≈ 6) | Fat tails capture scoring streaks |
| Rebounds | Negative Binomial | Count data, per-36 fitting avoids double-counting minutes variance |
| Assists | Negative Binomial | Count data with overdispersion |
| 3-Pointers | Binomial-Poisson | Rate-scaled so expected makes match model prediction |
| Steals | Zero-Inflated Poisson | Handles high frequency of zero-steal games |
| Blocks | Zero-Inflated Poisson | Same zero-heavy pattern as steals |
| Turnovers | Negative Binomial | Count data with player-specific variance |
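Sampling from these distributions is straightforward with NumPy. A sketch for three of the seven stats (the means, dispersion, and zero-inflation values below are placeholders, not fitted parameters):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

def sample_points(mean: float, scale: float, df: int = 6) -> np.ndarray:
    # Student-t noise around the mean: fat tails for scoring streaks
    return np.maximum(mean + scale * rng.standard_t(df, N), 0.0)

def sample_rebounds(mean: float, r: float = 5.0) -> np.ndarray:
    # Negative binomial parameterized so its mean equals `mean` (dispersion r)
    p = r / (r + mean)
    return rng.negative_binomial(r, p, N)

def sample_steals(mean: float, zero_inflation: float = 0.15) -> np.ndarray:
    # Zero-inflated Poisson: extra point mass at zero on top of the Poisson's own
    lam = mean / (1.0 - zero_inflation)       # keeps the overall mean at `mean`
    draws = rng.poisson(lam, N)
    draws[rng.random(N) < zero_inflation] = 0
    return draws

pts, reb, stl = sample_points(22.0, 5.0), sample_rebounds(8.0), sample_steals(1.2)
```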
Self-Correcting Calibration
Every night at 2 AM ET, the postgame pipeline grades all predictions against actual results. Three feedback mechanisms then correct the model:
Platt Calibration
Logistic recalibration on a 21-day rolling window of graded predictions. Maps raw model probabilities to calibrated probabilities.
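Platt scaling fits a two-parameter logistic map from raw to calibrated probabilities. A from-scratch sketch using gradient descent on log loss (the synthetic "overconfident model" data is for illustration; the production optimizer and window handling may differ):

```python
import numpy as np

def platt_fit(raw: np.ndarray, hit: np.ndarray,
              lr: float = 0.1, epochs: int = 2000) -> tuple[float, float]:
    """Fit p_cal = sigmoid(a * logit(p_raw) + b) by minimizing log loss."""
    x = np.log(raw / (1.0 - raw))             # logit of raw probability
    a, b = 1.0, 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(a * x + b)))
        grad = p - hit                        # d(log loss)/d(logit)
        a -= lr * float(np.mean(grad * x))
        b -= lr * float(np.mean(grad))
    return a, b

def platt_apply(raw: np.ndarray, a: float, b: float) -> np.ndarray:
    x = np.log(raw / (1.0 - raw))
    return 1.0 / (1.0 + np.exp(-(a * x + b)))

# Toy graded window: the raw model is overconfident (true hit rates are milder)
rng = np.random.default_rng(0)
raw = rng.uniform(0.05, 0.95, 5000)
hit = (rng.random(5000) < 0.5 + 0.5 * (raw - 0.5)).astype(float)
a, b = platt_fit(raw, hit)
```

Here the fitted `a < 1` shrinks overconfident probabilities back toward 0.5.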
Adaptive Bias
Per-stat rolling mean prediction error. When systematic bias exceeds 0.3 units with 15+ samples, an automatic correction is applied to future predictions.
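A minimal version of such a bias tracker, using the thresholds from the text (0.3 units, 15 samples); the window length and class shape are assumptions:

```python
from collections import deque

class AdaptiveBias:
    """Rolling mean of (predicted - actual) for one stat; emits a
    correction once the bias is large and persistent enough."""
    def __init__(self, window: int = 200, threshold: float = 0.3,
                 min_samples: int = 15):
        self.errors = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, predicted: float, actual: float) -> None:
        self.errors.append(predicted - actual)

    def correction(self) -> float:
        if len(self.errors) < self.min_samples:
            return 0.0                        # not enough evidence yet
        bias = sum(self.errors) / len(self.errors)
        return -bias if abs(bias) > self.threshold else 0.0

tracker = AdaptiveBias()
for _ in range(20):                           # model overshoots points by 0.5/game
    tracker.record(predicted=25.5, actual=25.0)
adjusted = 25.5 + tracker.correction()        # future predictions shift down
```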
Drift Detection
Population Stability Index (PSI) and Kolmogorov–Smirnov (KS) tests on feature distributions flag when incoming data has shifted. Automatic retraining triggers when expected calibration error (ECE) exceeds 0.10 for 3+ consecutive days.
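Both monitors fit in a few lines of NumPy. A sketch of PSI (comparing a live feature sample against its training reference) and ECE (bin-mass-weighted gap between hit rate and stated confidence), with conventional bin counts assumed:

```python
import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index over equal-mass bins of the reference."""
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf     # catch out-of-range live values
    ref = np.histogram(reference, edges)[0] / len(reference)
    liv = np.histogram(live, edges)[0] / len(live)
    ref, liv = np.clip(ref, 1e-4, None), np.clip(liv, 1e-4, None)
    return float(np.sum((liv - ref) * np.log(liv / ref)))

def ece(probs: np.ndarray, outcomes: np.ndarray, bins: int = 10) -> float:
    """Expected calibration error: bin-mass-weighted |hit rate - confidence|."""
    idx = np.minimum((probs * bins).astype(int), bins - 1)
    total = 0.0
    for b in range(bins):
        mask = idx == b
        if mask.any():
            total += mask.mean() * abs(outcomes[mask].mean() - probs[mask].mean())
    return float(total)

rng = np.random.default_rng(1)
ref_sample = rng.normal(0.0, 1.0, 5000)
psi_stable = psi(ref_sample, rng.normal(0.0, 1.0, 5000))   # same distribution
psi_shifted = psi(ref_sample, rng.normal(0.5, 1.0, 5000))  # mean shifted 0.5 sd
ece_toy = ece(np.full(4, 0.72), np.array([1.0, 1.0, 1.0, 0.0]))  # |0.75 - 0.72|
```

A common rule of thumb reads PSI below 0.1 as stable and above 0.25 as a major shift; the 0.10 retraining trigger from the text is a threshold on the `ece` value.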