How It Works
BakerPicks uses a three-stage model pipeline that feeds game-level context into player-level predictions, then simulates 10,000 outcomes via Monte Carlo to produce full probability distributions — not point estimates.
Pipeline Overview
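The end-to-end flow can be sketched as a single Monte Carlo loop: game context feeds minutes, minutes feed production, and 10,000 simulated outcomes form the distribution. A minimal Python sketch, where every number and function name is illustrative rather than the production code:

```python
import numpy as np

rng = np.random.default_rng(7)

def predict_game_context() -> dict:
    # Stage 1: game model outputs (hypothetical values)
    return {"pace": 99.5, "total": 228.0, "blowout_prob": 0.12}

def sample_minutes(ctx: dict, n_sims: int) -> np.ndarray:
    # Stage 2: sample a minutes distribution; blowouts trim starter minutes
    base = rng.normal(34.0, 3.0, n_sims)
    blowout = rng.random(n_sims) < ctx["blowout_prob"]
    return np.clip(base - 6.0 * blowout, 0.0, 48.0)

def points_per36(ctx: dict) -> float:
    # Stage 3: a per-36 scoring rate, scaled by projected game pace
    return 24.0 * ctx["pace"] / 100.0

def simulate(n_sims: int = 10_000) -> np.ndarray:
    ctx = predict_game_context()
    minutes = sample_minutes(ctx, n_sims)
    noise = 4.0 * rng.standard_t(6, n_sims)   # fat-tailed scoring noise
    points = points_per36(ctx) * minutes / 36.0 + noise
    return np.maximum(points, 0.0)            # a full distribution, not a point estimate

dist = simulate()
```

From `dist` any quantity of interest can be read off directly, e.g. `(dist > 19.5).mean()` for an over-19.5-points probability.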
Stage Details
Game Model
Context is everything
Before predicting any player stat, we predict the game itself. Four LightGBM models forecast margin, total score, pace, and blowout probability from Vegas lines, team season stats, rest days and back-to-back flags, and form over the last 10 games.
Outputs
- Win probability (derived from the projected margin and its residual standard deviation)
- Projected total score
- Projected pace (possessions per 48 min)
- Blowout probability
Key Features
34 features including matchup differentials, home/away splits, injury impact
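The margin-to-win-probability step can be illustrated under a normal error model: if the margin forecast has residual standard deviation σ, the win probability is the chance the true margin exceeds zero. A sketch (the σ = 12 value is an assumption for illustration, not the model's actual residual std):

```python
import math

def win_probability(projected_margin: float, residual_std: float) -> float:
    """P(true margin > 0), assuming normally distributed forecast error."""
    z = projected_margin / residual_std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = win_probability(projected_margin=4.0, residual_std=12.0)  # a 4-point favorite
```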
Minutes Model
How long will they play?
A quantile LightGBM regression predicts each player's minutes distribution — not a single number, but the 10th, 25th, 50th, 75th, and 90th percentiles. This captures the uncertainty between a normal game and a blowout or injury scenario.
Outputs
- Truncated-normal distribution over [0, 48] minutes
- Full uncertainty range from DNP to overtime
Key Features
Rolling minute means, blowout prob, teammates out, schedule density, travel distance, starter status
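One way to turn predicted percentiles into a sampleable truncated normal over [0, 48] is to back out a mean and standard deviation from the interquartile range, then reject draws outside the court-time bounds. A simplified sketch (the production fitting may differ; the 28- and 34-minute quantiles are made up):

```python
import numpy as np

def fit_normal_from_quartiles(q25: float, q75: float) -> tuple[float, float]:
    # For a normal, the interquartile range spans about 1.349 standard deviations
    sigma = (q75 - q25) / 1.349
    mu = (q25 + q75) / 2.0
    return mu, sigma

def sample_truncated(mu: float, sigma: float, n: int,
                     lo: float = 0.0, hi: float = 48.0) -> np.ndarray:
    rng = np.random.default_rng(0)
    out = np.empty(0)
    while out.size < n:                       # rejection-sample into [0, 48]
        draw = rng.normal(mu, sigma, 2 * n)
        out = np.concatenate([out, draw[(draw >= lo) & (draw <= hi)]])
    return out[:n]

mu, sigma = fit_normal_from_quartiles(q25=28.0, q75=34.0)
minutes = sample_truncated(mu, sigma, 10_000)
```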
Rate Models
Per-36 production for 7 stats
Seven separate LightGBM regressors predict per-36-minute production rates for points, rebounds, assists, 3-pointers, steals, blocks, and turnovers. Position encoding is critical: centers grab roughly twice as many rebounds per 36 minutes as guards.
Outputs
- Per-36 rate prediction per stat
- Out-of-fold residual std for noise calibration
- Bias correction from pooled validation
Key Features
Position encoding, usage vacuum from teammate absences, rolling per-36 means, opponent defense
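The "usage vacuum" feature can be illustrated with a toy heuristic: when teammates sit, their usage rate is redistributed to active players in proportion to each player's own usage. This is a hypothetical formula for illustration, not the model's actual feature definition:

```python
def usage_vacuum(player_usage: float,
                 teammate_usages: list[float],
                 is_out: list[bool]) -> float:
    """Boosted usage after absent teammates' usage is redistributed
    proportionally among the remaining players (illustrative heuristic)."""
    vacated = sum(u for u, out in zip(teammate_usages, is_out) if out)
    active = player_usage + sum(u for u, out in zip(teammate_usages, is_out) if not out)
    return player_usage + vacated * (player_usage / active)

# A 31%-usage star sits; a 24%-usage teammate absorbs part of the vacuum:
boosted = usage_vacuum(0.24, [0.31, 0.20, 0.15], [True, False, False])
```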
Stat-Specific Distributions
Each stat uses a noise distribution chosen for its real-world behavior. Points have fat tails (scoring streaks), while steals and blocks are zero-heavy (many games with zero).
| Stat | Distribution | Rationale |
|---|---|---|
| Points | Student-t (df ≈ 6) | Fat tails capture scoring streaks |
| Rebounds | Negative Binomial | Count data, per-36 fitting avoids double-counting minutes variance |
| Assists | Negative Binomial | Count data with overdispersion |
| 3-Pointers | Binomial-Poisson | Rate-scaled so expected makes match model prediction |
| Steals | Zero-Inflated Poisson | Handles high frequency of zero-steal games |
| Blocks | Zero-Inflated Poisson | Same zero-heavy pattern as steals |
| Turnovers | Negative Binomial | Count data with player-specific variance |
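Sampling from these distributions is straightforward with NumPy. A sketch for three of the seven stats (the means, dispersion, and zero-inflation values below are placeholders, not fitted parameters):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

def sample_points(mean: float, scale: float, df: int = 6) -> np.ndarray:
    # Student-t noise around the mean: fat tails for scoring streaks
    return np.maximum(mean + scale * rng.standard_t(df, N), 0.0)

def sample_rebounds(mean: float, r: float = 5.0) -> np.ndarray:
    # Negative binomial parameterized so its mean equals `mean` (dispersion r)
    p = r / (r + mean)
    return rng.negative_binomial(r, p, N)

def sample_steals(mean: float, zero_inflation: float = 0.15) -> np.ndarray:
    # Zero-inflated Poisson: extra point mass at zero on top of the Poisson's own
    lam = mean / (1.0 - zero_inflation)       # keeps the overall mean at `mean`
    draws = rng.poisson(lam, N)
    draws[rng.random(N) < zero_inflation] = 0
    return draws

pts, reb, stl = sample_points(22.0, 5.0), sample_rebounds(8.0), sample_steals(1.2)
```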
Self-Correcting Calibration
Every night at 2 AM ET, the postgame pipeline grades all predictions against actual results. Three feedback mechanisms then correct the model:
Platt Calibration
Logistic recalibration on a 21-day rolling window of graded predictions. Maps raw model probabilities to calibrated probabilities.
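Platt scaling fits a two-parameter logistic map from raw to calibrated probabilities. A from-scratch sketch using gradient descent on log loss (the synthetic "overconfident model" data is for illustration; the production optimizer and window handling may differ):

```python
import numpy as np

def platt_fit(raw: np.ndarray, hit: np.ndarray,
              lr: float = 0.1, epochs: int = 2000) -> tuple[float, float]:
    """Fit p_cal = sigmoid(a * logit(p_raw) + b) by minimizing log loss."""
    x = np.log(raw / (1.0 - raw))             # logit of raw probability
    a, b = 1.0, 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(a * x + b)))
        grad = p - hit                        # d(log loss)/d(logit)
        a -= lr * float(np.mean(grad * x))
        b -= lr * float(np.mean(grad))
    return a, b

def platt_apply(raw: np.ndarray, a: float, b: float) -> np.ndarray:
    x = np.log(raw / (1.0 - raw))
    return 1.0 / (1.0 + np.exp(-(a * x + b)))

# Toy graded window: the raw model is overconfident (true hit rates are milder)
rng = np.random.default_rng(0)
raw = rng.uniform(0.05, 0.95, 5000)
hit = (rng.random(5000) < 0.5 + 0.5 * (raw - 0.5)).astype(float)
a, b = platt_fit(raw, hit)
```

Here the fitted `a < 1` shrinks overconfident probabilities back toward 0.5.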
Adaptive Bias
Per-stat rolling mean prediction error. When systematic bias exceeds 0.3 units with 15+ samples, an automatic correction is applied to future predictions.
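A minimal version of such a bias tracker, using the thresholds from the text (0.3 units, 15 samples); the window length and class shape are assumptions:

```python
from collections import deque

class AdaptiveBias:
    """Rolling mean of (predicted - actual) for one stat; emits a
    correction once the bias is large and persistent enough."""
    def __init__(self, window: int = 200, threshold: float = 0.3,
                 min_samples: int = 15):
        self.errors = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, predicted: float, actual: float) -> None:
        self.errors.append(predicted - actual)

    def correction(self) -> float:
        if len(self.errors) < self.min_samples:
            return 0.0                        # not enough evidence yet
        bias = sum(self.errors) / len(self.errors)
        return -bias if abs(bias) > self.threshold else 0.0

tracker = AdaptiveBias()
for _ in range(20):                           # model overshoots points by 0.5/game
    tracker.record(predicted=25.5, actual=25.0)
adjusted = 25.5 + tracker.correction()        # future predictions shift down
```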
Drift Detection
Population Stability Index (PSI) and Kolmogorov–Smirnov (KS) tests on feature distributions flag when incoming data has shifted. Automatic retraining triggers when expected calibration error (ECE) exceeds 0.10 for 3+ consecutive days.
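Both monitors fit in a few lines of NumPy. A sketch of PSI (comparing a live feature sample against its training reference) and ECE (bin-mass-weighted gap between hit rate and stated confidence), with conventional bin counts assumed:

```python
import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index over equal-mass bins of the reference."""
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf     # catch out-of-range live values
    ref = np.histogram(reference, edges)[0] / len(reference)
    liv = np.histogram(live, edges)[0] / len(live)
    ref, liv = np.clip(ref, 1e-4, None), np.clip(liv, 1e-4, None)
    return float(np.sum((liv - ref) * np.log(liv / ref)))

def ece(probs: np.ndarray, outcomes: np.ndarray, bins: int = 10) -> float:
    """Expected calibration error: bin-mass-weighted |hit rate - confidence|."""
    idx = np.minimum((probs * bins).astype(int), bins - 1)
    total = 0.0
    for b in range(bins):
        mask = idx == b
        if mask.any():
            total += mask.mean() * abs(outcomes[mask].mean() - probs[mask].mean())
    return float(total)

rng = np.random.default_rng(1)
ref_sample = rng.normal(0.0, 1.0, 5000)
psi_stable = psi(ref_sample, rng.normal(0.0, 1.0, 5000))   # same distribution
psi_shifted = psi(ref_sample, rng.normal(0.5, 1.0, 5000))  # mean shifted 0.5 sd
ece_toy = ece(np.full(4, 0.72), np.array([1.0, 1.0, 1.0, 0.0]))  # |0.75 - 0.72|
```

A common rule of thumb reads PSI below 0.1 as stable and above 0.25 as a major shift; the 0.10 retraining trigger from the text is a threshold on the `ece` value.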