Monster 2013 Projection Review

Update 7 February 2014: I’ve updated the post below to add an additional source, Rosenheck, and to correct a bug in my code generating the table for missing players. The original table was incorrectly computing RMSE and MAE over only the players a system projected, without including missing players at all, leaving a very much apples-to-oranges comparison: systems projecting fewer players tended to project only better players, who are more predictable, and hence showed much lower RMSE. Now that I am actually including missing players, all systems show higher errors than in the earlier table, so the biggest apparent bias was from my own bug. I’ve also removed blank projections from BaseballGuru, which had inflated its errors in the missing-player case, since a blank “projection” would otherwise count as a 0.00 ERA or a .000 batting average.
Tom Tango kindly highlighted my previous posts reviewing the 2013 forecast data that was available on the RotoValue site last year, and in the comments, he pointed me to a data set put online by Will Larson that has 2013 projection data from a dozen sources.
Dr. Larson kindly allows people to write articles using his data so long as they let him know and cite him as the source. I’ve downloaded projections from 12 sources from his site, http://www.bbprojectionproject.com, and am including these additional sources in the comparisons for this article:

  • AggPro – A projection aggregation method from Ross J. Gore, Cameron T. Snapp, and Timothy Highley.
  • Steamer – A system from Jared Cross, Dash Davidson, and Peter Rosenbloom.[ref]I’ve separately received Steamer/Razzball data from Jared Cross, using the same rate projections, but playing time projections from Razzball’s Rudy Gamble. I’ve left both sources in because they’re snapshots from different times, and they include different sets of players. But in theory, a test of just rate stats should find these the same.[/ref]
  • Will Larson – Larson also publishes projections under his own name.
  • ZiPS – A projection model from Dan Szymborski.
  • CBS Sportsline
  • ESPN
  • Razzball – Projections from Grey Albright of Razzball.com, distinct from the Steamer/Razzball collaboration I have evaluated previously.
  • RotoChamp
  • Fangraphs Fans
  • BaseballGuru
  • Sports Illustrated
  • Dan Rosenheck’s projections.

In addition, Brian Cartwright, the designer of the Oliver projections, shared his 2013 data with me. Thanks, Brian!
Many of these sources provide fewer data fields than the sources I already had, with some not providing the raw data from which to compute batting average, ERA, or WHIP. If a source didn’t provide AB or IP, I used the AB or IP from the consensus of the five sources I had last year. I then computed hits from average for batters, ER from ERA for pitchers, and hits and walks from WHIP (if a source provided neither hits nor walks, I simply assumed 3 hits per walk and derived both from WHIP).
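That back-fill is simple enough to sketch in a few lines of Python. This is a simplified illustration, not my actual loader: the field names and dictionary layout are hypothetical, and real source files vary.

```python
# Hypothetical sketch of the rate-to-counting-stat back-fill described
# above. Missing AB/IP come from last year's five-source consensus.

def fill_batter(p, consensus_ab):
    """Derive hits from AVG, using consensus AB when the source omits AB."""
    ab = p.get("ab") or consensus_ab
    p["ab"] = ab
    if "h" not in p and "avg" in p:
        p["h"] = round(p["avg"] * ab)
    return p

def fill_pitcher(p, consensus_ip):
    """Derive ER from ERA, and hits/walks from WHIP, when missing."""
    ip = p.get("ip") or consensus_ip
    p["ip"] = ip
    if "er" not in p and "era" in p:
        p["er"] = round(p["era"] * ip / 9.0)
    if "whip" in p and "h" not in p and "bb" not in p:
        runners = p["whip"] * ip        # WHIP * IP = total H + BB allowed
        p["h"] = round(runners * 0.75)  # 3 hits per walk => H is 3/4 of runners
        p["bb"] = round(runners * 0.25)
    return p
```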
I’m also including the same sources I had before, and I’m keeping the same Consensus forecast I had previously (a simple average of those five sources), while adding a new AllConsensus, a simple average of all 17 systems I’m evaluating here.
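Neither consensus is anything fancier than a straight unweighted mean. A minimal sketch (the {system: {player: rate}} layout is hypothetical, and I average over whichever systems project a given player):

```python
# Hypothetical consensus: the unweighted mean of each system's projected
# rate for a player, over the systems that actually project him.

def consensus_rate(projections, player):
    """projections: {system_name: {player_id: rate}} -> mean rate or None."""
    vals = [sys[player] for sys in projections.values() if player in sys]
    return sum(vals) / len(vals) if vals else None
```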
As before, I’ll show two tables per statistic: one comparing only those players projected by all systems, and another that also covers players a system didn’t project, filling in a rate value relative to that system’s average. Since there’s not enough data to compute wOBA (or SLG, or OBP) for each source, I’m just going to do batting average on offense here, subtracting 0.020 from a source’s overall average for any missing players.
For pitching, I’ll be able to do ERA and WHIP for each source, adding 0.50 to average ERA and 0.10 to average WHIP for missing players.
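A sketch of that fill-in logic, with a hypothetical data layout; I use an unweighted mean of a system’s projected rates for simplicity, whereas the real computation would presumably weight by AB or IP:

```python
# Penalties from the text: missing batters get the system's average AVG
# minus .020; missing pitchers get its average ERA plus 0.50 or its
# average WHIP plus 0.10.
PENALTY = {"avg": -0.020, "era": 0.50, "whip": 0.10}

def filled_projection(system_proj, actual_players, stat):
    """Return {player: rate} covering every player who actually played.

    system_proj maps player -> projected rate for one system and one stat;
    players the system skipped get the system's (unweighted) average rate
    plus the penalty for that stat.
    """
    sys_avg = sum(system_proj.values()) / len(system_proj)
    fill = sys_avg + PENALTY[stat]
    return {p: system_proj.get(p, fill) for p in actual_players}
```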
I’m still computing MAE and RMSE for each source, with bias-adjusted errors: a source is graded by how well it projects players relative to its own average, not by how well its average winds up matching the actual average. For people using projections in fantasy analysis, this is what you’d want: a player’s ranking depends on his relative, not absolute, statistics.
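Roughly, the adjustment works like this simplified, unweighted sketch:

```python
from math import sqrt

def bias_adjusted_errors(proj, actual):
    """MAE and RMSE after removing a source's overall bias.

    proj and actual are parallel lists of rates for the same players.
    Shifting every projection by the source's mean error grades players
    relative to the system's own average, so a source that runs uniformly
    high or low on everyone isn't penalized for the offset itself.
    """
    bias = sum(proj) / len(proj) - sum(actual) / len(actual)
    errors = [(p - bias) - a for p, a in zip(proj, actual)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

Under this scoring, a source that projects every player exactly 10 points too high would show zero error, since its relative rankings would be perfect.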
So let’s get to the numbers!
Now there are only 220 players projected by all systems:

Source Num Avg MAE RMSE
Actual 220 0.2654 0.0000 0.0000
AllConsensus 220 0.2688 0.0203 0.0257
Aggpro 220 0.2684 0.0202 0.0257
Oliver 220 0.2665 0.0202 0.0257
Steamer/Razzball 220 0.2670 0.0204 0.0258
Steamer 220 0.2670 0.0204 0.0258
Rosenheck 220 0.2698 0.0206 0.0259
FangraphsFans 220 0.2745 0.0205 0.0260
Consensus 220 0.2683 0.0206 0.0260
ZiPS 220 0.2643 0.0201 0.0260
RotoValue 220 0.2668 0.0207 0.0264
BaseballGuru 220 0.2571 0.0205 0.0264
CAIRO 220 0.2653 0.0211 0.0264
SI 220 0.2723 0.0208 0.0266
Marcel 220 0.2712 0.0210 0.0266
ESPN 220 0.2717 0.0213 0.0269
MORPS 220 0.2713 0.0216 0.0272
Razzball 220 0.2701 0.0214 0.0274
CBS 220 0.2733 0.0222 0.0281
RotoChamp 220 0.2781 0.0217 0.0283
Will Larson 220 0.2637 0.0218 0.0286
y2012 220 0.2706 0.0245 0.0315

Here AllConsensus, AggPro, and Oliver are in a virtual tie for the lowest errors, with Steamer next (it’s nice to see Steamer and Steamer/Razzball so close). Even the highest errors among the projection systems are still just under .003 more than the lowest, so all these systems do a good job of projecting the top batters. I’m glad to see my own model now performing a little better than the median!
Now let’s assume missing players hit 20 points worse than a system’s average. In the tables below, Num is how many players a source projected, MLB is how many players actually played, MLB Avg is the source’s average over all MLB players including fill-ins, and Missing is how many MLB players the source didn’t project:

Source Num Avg MLB MLB Avg StdDev MAE RMSE Missing
Actual 633 0.2570 633 0.2570 0.0364 0.0000 0.0000 0
Steamer 608 0.2621 633 0.2609 0.0193 0.0231 0.0308 110
AllConsensus 1514 0.2392 633 0.2612 0.0205 0.0230 0.0309 15
Steamer/Razzball 504 0.2633 633 0.2608 0.0193 0.0232 0.0310 162
Oliver 1445 0.2388 633 0.2588 0.0211 0.0230 0.0310 23
Consensus 786 0.2612 633 0.2621 0.0192 0.0234 0.0314 80
FangraphsFans 331 0.2706 633 0.2671 0.0192 0.0234 0.0314 310
ZiPS 1000 0.2452 633 0.2575 0.0205 0.0235 0.0315 29
CAIRO 507 0.2606 633 0.2587 0.0207 0.0240 0.0316 177
Aggpro 302 0.2658 633 0.2608 0.0186 0.0238 0.0317 336
Marcel 750 0.2596 633 0.2640 0.0203 0.0241 0.0322 105
SI 352 0.2690 633 0.2654 0.0194 0.0239 0.0322 290
BaseballGuru 698 0.2476 633 0.2487 0.0218 0.0241 0.0324 109
Razzball 251 0.2697 633 0.2631 0.0198 0.0243 0.0324 390
MORPS 539 0.2633 633 0.2643 0.0206 0.0246 0.0326 163
Will Larson 294 0.2617 633 0.2561 0.0208 0.0249 0.0334 342
RotoChamp 434 0.2737 633 0.2716 0.0219 0.0250 0.0338 230
Rosenheck 414 0.2664 633 0.2626 0.0228 0.0243 0.0338 236
RotoValue 751 0.2600 633 0.2601 0.0236 0.0244 0.0354 105
CBS 751 0.2626 633 0.2619 0.0333 0.0273 0.0389 88
ESPN 569 0.2594 633 0.2596 0.0377 0.0266 0.0417 126
y2012 611 0.2592 633 0.2585 0.0414 0.0318 0.0456 129

This time the error spreads are wider. Having fixed my bug and updated my table, I now see Steamer with the lowest RMSE, closely followed by AllConsensus, Steamer/Razzball, and Oliver, though all the top systems are quite close in this test. There may still be some advantage to projecting fewer players, but far less than my buggy earlier table suggested. The two Steamer snapshots differ mainly in coverage: the Steamer file on Larson’s site projected more players who played in 2013 than the one Jared Cross sent me that I label Steamer/Razzball, and here the broader file edges ahead.
That’s interesting, perhaps suggesting that simply projecting 20 points worse than league average may be better than doing an individual projection for many weaker players. (I assume that players projected by all systems tend to be much better than those not projected by some, and indeed the cumulative batting averages of the different data sets support that assumption.) While every system’s errors rise compared to the subset of commonly projected players, some systems fare much worse here than in the other test, which suggests that most systems can do reasonably well with the top hitters, but what separates a good system from a mediocre one is how it does with weaker players.
Now on to pitching. First up, ERA:
Only 84 pitchers were projected by all systems:

Source Num Avg ERA MAE RMSE
Actual 84 3.6870 0.0000 0.0000
Steamer/Razzball 84 3.8968 0.5718 0.7862
Steamer 84 3.8961 0.5783 0.7879
Aggpro 84 3.9394 0.5568 0.7949
Rosenheck 84 3.8880 0.5585 0.8027
Will Larson 84 3.9940 0.5611 0.8108
AllConsensus 84 3.7522 0.5832 0.8286
Consensus 84 3.7651 0.6055 0.8352
FangraphsFans 84 3.5878 0.6015 0.8414
RotoValue 84 3.7182 0.6053 0.8492
Razzball 84 3.6381 0.6217 0.8572
ESPN 84 3.6788 0.6024 0.8710
CAIRO 84 3.7388 0.6269 0.8738
Marcel 84 3.6714 0.6388 0.8750
RotoChamp 84 3.6447 0.6366 0.8763
Oliver 84 3.5956 0.6343 0.8842
MORPS 84 3.7761 0.6654 0.9016
ZiPS 84 3.7967 0.6512 0.9159
SI 84 3.6444 0.6630 0.9319
CBS 84 3.5727 0.6663 0.9494
BaseballGuru 84 3.8959 0.6936 0.9719
y2012 84 3.6606 0.8278 1.0753

Steamer and Aggpro top this chart, followed by Dan Rosenheck. Will Larson, who ranked last in the parallel batting average table, now places fourth. But there are only 84 pitchers ranked by all systems, and in digging deeper, I found that three sources from Larson’s site (his own projections, SI, and CBS) did not provide projections for top closers at all, or perhaps for many relievers generally.
Now let’s see what happens comparing all players who actually played, adding 0.50 to a system’s average ERA for any player it didn’t project:

Source Num ERA MLB MLB ERA StdDev MAE RMSE Missing
Actual 664 3.8648 664 3.8636 1.3836 0.0000 0.0000 0
Steamer 635 3.9522 664 3.9916 0.5025 0.8835 1.3108 142
Steamer/Razzball 515 3.9583 664 4.0113 0.5030 0.8886 1.3120 198
Consensus 872 3.9740 664 3.9859 0.5066 0.9144 1.3389 113
AllConsensus 1310 4.6060 664 4.0786 0.5994 0.9109 1.3417 31
Aggpro 203 3.9761 664 4.2058 0.4474 0.9082 1.3422 469
FangraphsFans 232 3.6821 664 3.8801 0.4680 0.9150 1.3468 444
RotoChamp 477 3.9016 664 3.9377 0.5894 0.9366 1.3499 244
Razzball 151 3.6199 664 3.9057 0.3891 0.9165 1.3513 522
MORPS 499 3.8988 664 3.9432 0.6057 0.9326 1.3587 226
Marcel 841 3.9947 664 3.9466 0.5032 0.9440 1.3650 136
Oliver 1122 4.4497 664 3.9867 0.6314 0.9523 1.3719 82
CAIRO 489 4.0033 664 4.0743 0.6217 0.9470 1.3774 234
Will Larson 128 4.1284 664 4.4087 0.3672 0.9404 1.3803 538
ESPN 500 3.9256 664 4.0417 0.6351 0.9552 1.3919 233
ZiPS 998 4.7479 664 4.2787 0.7708 0.9533 1.3929 48
Rosenheck 172 4.0618 664 4.2862 0.3898 0.9584 1.4003 498
RotoValue 836 3.9983 664 3.9953 0.6849 0.9681 1.4044 133
SI 211 3.8986 664 4.1051 0.4609 0.9701 1.4087 469
CBS 308 3.9767 664 4.1821 1.1338 1.0573 1.7149 424
BaseballGuru 738 4.5729 664 4.5452 1.7816 1.1384 2.1694 152
y2012 630 3.9905 664 4.0920 1.8927 1.3148 2.1943 168

Steamer is again on top, followed closely by Steamer/Razzball and then the aggregates: the 5-system Consensus, the all-source AllConsensus, and AggPro. Pitching is indeed harder to predict, as we see a much wider spread of errors for ERA than for Avg.
Finally, let’s take a look at WHIP, again first looking at only the 84 pitchers projected by all sources:

Source Num Avg WHIP MAE RMSE
Actual 84 1.2373 0.0000 0.0000
Rosenheck 84 1.2746 0.1002 0.1358
FangraphsFans 84 1.2257 0.1035 0.1364
Steamer 84 1.2741 0.1038 0.1374
Steamer/Razzball 84 1.2738 0.1045 0.1374
Aggpro 84 1.2933 0.1040 0.1386
RotoValue 84 1.2511 0.1029 0.1387
AllConsensus 84 1.2547 0.1074 0.1408
Consensus 84 1.2630 0.1084 0.1422
Will Larson 84 1.2609 0.1035 0.1434
Marcel 84 1.2366 0.1126 0.1442
ESPN 84 1.2492 0.1089 0.1448
MORPS 84 1.2677 0.1100 0.1452
RotoChamp 84 1.2298 0.1135 0.1466
Oliver 84 1.2550 0.1104 0.1504
Razzball 84 1.2304 0.1121 0.1510
BaseballGuru 84 1.2426 0.1169 0.1516
ZiPS 84 1.2603 0.1165 0.1530
SI 84 1.2471 0.1177 0.1536
CBS 84 1.2259 0.1187 0.1582
y2012 84 1.2165 0.1324 0.1665
CAIRO 84 1.2790 0.1355 0.1774

Here Dan Rosenheck’s projections take the top spot, followed by Fangraphs Fans, Steamer, and then AggPro and my own RotoValue projections (yay!), all bettering the two consensus forecasts.
When I fill in a source’s average WHIP plus 0.10 for missing players, Rosenheck drops significantly, but the top systems otherwise remain much the same:

Source Num WHIP MLB MLB WHIP StdDev MAE RMSE Missing
Actual 664 1.2995 664 1.2993 0.2354 0.0000 0.0000 0
FangraphsFans 232 1.2488 664 1.2877 0.0868 0.1535 0.2174 444
Steamer/Razzball 515 1.3139 664 1.3250 0.0895 0.1522 0.2176 198
Steamer 635 1.3163 664 1.3239 0.0895 0.1518 0.2183 142
Aggpro 203 1.3124 664 1.3582 0.0804 0.1558 0.2207 469
AllConsensus 1310 1.4425 664 1.3365 0.1264 0.1581 0.2228 31
Marcel 841 1.3055 664 1.2952 0.0911 0.1589 0.2232 136
Consensus 872 1.3231 664 1.3280 0.1101 0.1579 0.2233 113
RotoValue 836 1.3136 664 1.3097 0.1100 0.1592 0.2246 133
Razzball 151 1.2292 664 1.2868 0.0804 0.1580 0.2252 522
Oliver 1122 1.4335 664 1.3435 0.1327 0.1643 0.2278 82
Will Larson 128 1.2967 664 1.3531 0.0895 0.1617 0.2307 538
MORPS 499 1.3129 664 1.3226 0.1335 0.1626 0.2307 226
Rosenheck 172 1.3099 664 1.3549 0.0808 0.1615 0.2310 498
RotoChamp 477 1.2978 664 1.3029 0.1378 0.1691 0.2312 244
SI 211 1.2976 664 1.3400 0.0923 0.1658 0.2320 469
ESPN 500 1.3046 664 1.3257 0.1321 0.1665 0.2323 233
BaseballGuru 738 1.2990 664 1.3151 0.1236 0.1697 0.2359 152
ZiPS 998 1.4692 664 1.3706 0.1587 0.1710 0.2360 48
CAIRO 489 1.3720 664 1.3874 0.1733 0.1864 0.2529 234
CBS 308 1.3011 664 1.3474 0.2360 0.1842 0.3102 424
y2012 630 1.3049 664 1.3219 0.3179 0.2175 0.3630 168

Once again the top is tightly bunched: Fangraphs Fans takes the top spot, Steamer/Razzball edges out Steamer, and AllConsensus narrowly bests Consensus.
I’m still trying to think about what these results mean, but it does seem that for weaker players, projection systems may not provide much value added compared to simply guessing something a little worse than league average. Or maybe projections that revert toward an overall league average systematically pull the projected values too high for weaker players. For now I want to present the data and find out what others think, but in terms of evaluating forecasters, I think the comparison of players projected by all systems is probably more informative than the other one (disclaimer: my own system does rank higher that way, but so do the consensus forecasts, and I do sincerely think missing quite a few players may somehow be a relative advantage).
As has been seen before, projections are more accurate at forecasting batting stats than pitching stats, and virtually every system out there does much better than simply using 2012 data. Comments and feedback are welcome below, or you can e-mail me: geoff at rotovalue dot com.
While I did try my best to be accurate, it is possible that I made some mistakes in uploading the data and matching names in the files to players in my own database. I conclude by thanking Dr. Larson for compiling and making so much data available, Brian Cartwright for sharing Oliver with me, and Tom Tango for pointing me to Dr. Larson’s site and for his feedback on the methodology.