In each of the past two years, I’ve compared different baseball projection systems by looking at aggregate errors. In 2013, I had access to these projection systems:
- CAIRO – from S B of the Replacement Level Yankees Weblog.
- Marcel – the basic projection model from Tom Tango, coauthor of The Book. This year I’m using Marcel numbers generated by Jeff Zimmerman, using Jeff Sackmann’s Python code.
- MORPS – A projection model by Tim Oberschlake.
- Steamer/Razzball – Rate projections by Jared Cross, Dash Davidson, and Peter Rosenbloom, and playing time projections from Rudy Gamble of Razzball.com.
- RotoValue – my current model, based largely on Marcel, but with adjustments for pitching decision stats and assuming no pitcher skill in BABIP.
Like last year, I’m computing standard deviation, mean absolute error (MAE) and root mean square error (RMSE) for each source.
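As a minimal sketch of the two error measures, assuming unweighted averages over players (the post does not say whether errors are weighted by plate appearances), MAE and RMSE can be computed as below. The function names and the three sample wOBA values are hypothetical, chosen only for illustration, and the sketch omits the bias adjustment described in the update at the end of the post.

```python
import math


def mae(projected, actual):
    """Mean absolute error between paired projections and outcomes."""
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(actual)


def rmse(projected, actual):
    """Root mean square error; large misses count more heavily than in MAE."""
    squared = sum((p - a) ** 2 for p, a in zip(projected, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical wOBA values for three players (not taken from the tables).
projected = [0.340, 0.320, 0.300]
actual = [0.350, 0.300, 0.300]

sample_mae = mae(projected, actual)
sample_rmse = rmse(projected, actual)
```

Because RMSE squares each miss before averaging, it is never smaller than MAE on the same data, which matches the pattern in every row of the tables.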
This table includes only those players projected by all five systems who also played in 2012.
Source | Num | Avg wOBA | MAE | RMSE |
---|---|---|---|---|
Actual | 410 | 0.3272 | 0.0000 | 0.0000 |
Steamer/Razzball | 410 | 0.3342 | 0.0245 | 0.0323 |
Consensus | 410 | 0.3363 | 0.0251 | 0.0333 |
CAIRO | 410 | 0.3363 | 0.0255 | 0.0336 |
Marcel | 410 | 0.3401 | 0.0260 | 0.0344 |
RotoValue | 410 | 0.3329 | 0.0263 | 0.0351 |
MORPS | 410 | 0.3410 | 0.0265 | 0.0351 |
y2012 | 410 | 0.3357 | 0.0325 | 0.0445 |
The spread in errors for the projection systems is small, and all systems do much better than using 2012 numbers. Steamer/Razzball had the lowest overall errors, while MORPS and my updated RotoValue model had almost identical errors, just behind Marcel. The simple average consensus ranked second best, just ahead of CAIRO.
Below is a more detailed table, showing averages for all the players a system projects. Num is the total number of players, and the first wOBA column is the cumulative wOBA of all those players. MLB is the number of projected players who actually had a plate appearance in 2013, and the second wOBA column is the cumulative wOBA of those players. For that set, I again computed RMSE and MAE, and sorted by RMSE.
Source | Num | wOBA | MLB | wOBA | StdDev | MAE | RMSE |
---|---|---|---|---|---|---|---|
Actual | 634 | 0.3236 | 634 | 0.3236 | 0.0439 | 0.0000 | 0.0000 |
Steamer/Razzball | 504 | 0.3357 | 471 | 0.3333 | 0.0260 | 0.0250 | 0.0334 |
CAIRO | 507 | 0.3363 | 456 | 0.3356 | 0.0264 | 0.0257 | 0.0340 |
Consensus | 786 | 0.3322 | 554 | 0.3343 | 0.0244 | 0.0262 | 0.0355 |
MORPS | 539 | 0.3367 | 470 | 0.3406 | 0.0262 | 0.0268 | 0.0359 |
Marcel | 750 | 0.3306 | 529 | 0.3386 | 0.0241 | 0.0266 | 0.0361 |
RotoValue | 751 | 0.3293 | 529 | 0.3310 | 0.0287 | 0.0274 | 0.0387 |
y2012 | 611 | 0.3298 | 505 | 0.3307 | 0.0507 | 0.0359 | 0.0524 |
Steamer again had the lowest errors, but MORPS moves a little ahead of my system, and into a virtual tie with Marcel. CAIRO now ranks a little ahead of the Consensus, perhaps because I compute a consensus from whatever sources I had available, which often was just Marcel and my own RotoValue system, two systems that had somewhat higher overall errors. The errors from the projections are not quite as bunched together as before, but are still close (and all much lower than the errors using 2012 data). It’s interesting that aside from the consensus, the ordering of lowest errors is almost the same as the ordering of projecting the fewest players.
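The "whatever sources I had available" consensus described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual program: the `consensus` function, the nested-dictionary layout, and the player IDs and values are all hypothetical.

```python
def consensus(projections):
    """Average each player's wOBA over whichever systems projected him.

    `projections` maps a system name to a {player_id: projected wOBA} dict.
    A player covered by only two systems is averaged over just those two.
    """
    totals, counts = {}, {}
    for system_projection in projections.values():
        for player, woba in system_projection.items():
            totals[player] = totals.get(player, 0.0) + woba
            counts[player] = counts.get(player, 0) + 1
    return {player: totals[player] / counts[player] for player in totals}


# Hypothetical data: player "b" is projected by only two of three systems.
example = {
    "Marcel": {"a": 0.330, "b": 0.310},
    "RotoValue": {"a": 0.320, "b": 0.300},
    "Steamer": {"a": 0.340},
}
result = consensus(example)
```

Under this scheme, a player covered only by Marcel and RotoValue gets a consensus built from just those two, which is one way the Consensus row could inherit their somewhat higher errors.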
Finally, in this last table, any player a system did not project is averaged in at that system’s league average wOBA minus 0.020.
Source | Num | wOBA | MLB | wOBA | StdDev | MAE | RMSE | Missing |
---|---|---|---|---|---|---|---|---|
Actual | 634 | 0.3236 | 634 | 0.3236 | 0.0439 | 0.0000 | 0.0000 | 0 |
Steamer/Razzball | 504 | 0.3357 | 634 | 0.3319 | 0.0254 | 0.0261 | 0.0357 | 163 |
CAIRO | 507 | 0.3363 | 634 | 0.3337 | 0.0257 | 0.0269 | 0.0364 | 178 |
Consensus | 786 | 0.3322 | 634 | 0.3333 | 0.0243 | 0.0267 | 0.0364 | 80 |
Marcel | 750 | 0.3306 | 634 | 0.3367 | 0.0243 | 0.0273 | 0.0373 | 105 |
MORPS | 539 | 0.3367 | 634 | 0.3384 | 0.0259 | 0.0277 | 0.0378 | 164 |
RotoValue | 751 | 0.3293 | 634 | 0.3295 | 0.0282 | 0.0279 | 0.0396 | 105 |
y2012 | 611 | 0.3298 | 634 | 0.3289 | 0.0489 | 0.0361 | 0.0522 | 129 |
Each system has higher average errors now, and the ones that projected fewer players tended to see their errors rise a little more. The big picture stays basically the same, though: all the projections are much better than 2012, Steamer performed the best, followed by CAIRO, and they all are still pretty close to each other, with a spread of just under 0.004 in RMSE between Steamer and RotoValue, the best and worst ranked projection systems.
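The fill-in rule used for the last table can be sketched like this. The `fill_missing` helper, its parameters, and the sample values are hypothetical. The league average is passed in as a number, since the post does not spell out exactly how each system’s league average is derived.

```python
def fill_missing(projected, actual_players, league_avg, penalty=0.020):
    """Return a wOBA projection for every player who actually played.

    Players the system did not project get `league_avg - penalty`,
    mirroring the "league average wOBA minus 0.020" rule.
    """
    fallback = league_avg - penalty
    return {player: projected.get(player, fallback) for player in actual_players}


# Hypothetical system that projected players "a" and "b" but not "c".
system_projection = {"a": 0.350, "b": 0.310}
players_who_played = ["a", "b", "c"]
filled = fill_missing(system_projection, players_who_played, league_avg=0.330)
```

Every system is then scored on the same 634 players, so a system is no longer helped by skipping players who are hard to project.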
Next I’ll perform similar analysis on pitching projections.
Update January 31 2014: Two points:
1. In computing the errors I’ve bias-adjusted each source. So if an exogenous event changes the overall run environment (say, unusually mild weather, a changed strike zone, a somewhat different ball, or some other factor), the systems are not judged primarily on how well they guessed the new run environment. Effectively I first compute a delta of each player’s wOBA relative to the average of that projection (or Actual), and then compare those deltas. So adding any arbitrary constant to all projections has no effect whatsoever on my reported errors.
2. While running my program for pitching stats, I noticed a bug that affected the second and third tables in this post. Some players, for whom my database did not have an MLBAM ID, were excluded from the averages. None of the players projected by all systems were affected, and the relative order of systems remained the same. But the exact numbers have changed slightly.
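The bias adjustment described in point 1 can be sketched as follows. This is a rough illustration rather than the actual program: the function name and sample data are hypothetical, and it assumes a simple unweighted mean over the players common to both sets, whereas the real calculation may weight by playing time.

```python
import math


def bias_adjusted_errors(projected, actual):
    """Compare each player's delta from his source's mean, not raw wOBA.

    Both arguments map player_id -> wOBA. Returns (MAE, RMSE) over the
    players present in both dictionaries.
    """
    players = sorted(set(projected) & set(actual))
    n = len(players)
    projected_mean = sum(projected[p] for p in players) / n
    actual_mean = sum(actual[p] for p in players) / n
    # Delta of projection vs. its mean, minus delta of actual vs. its mean.
    diffs = [
        (projected[p] - projected_mean) - (actual[p] - actual_mean)
        for p in players
    ]
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return mae, rmse


# Hypothetical wOBA values for three players.
projected = {"a": 0.340, "b": 0.320, "c": 0.300}
actual = {"a": 0.350, "b": 0.300, "c": 0.310}

base = bias_adjusted_errors(projected, actual)
# Adding a constant to every projection should leave the errors unchanged.
shifted = bias_adjusted_errors({p: v + 0.050 for p, v in projected.items()}, actual)
```

Here `base` and `shifted` agree up to floating-point rounding, which is the property noted above: adding any constant to all of a system’s projections does not change its reported errors.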
Update 7 February 2014: I found a bug in the program that generated the last table, so after fixing that I’ve replaced the table above and adjusted some of the commentary in light of the corrected data.