Update 7 February 2014: I’ve updated the post below to add an additional source, Rosenheck, and to correct a bug in my code generating the table for missing players. The original table was computing RMSE and MAE over only the players a system projected, leaving missing players out entirely, which made for an apples-to-oranges comparison: systems projecting fewer players tended to project only better players, who are more predictable, and so showed much lower RMSE. Now that missing players are included, all systems show higher errors than in the earlier table, so the biggest apparent bias came from my own bug. I’ve also removed blank projections from BaseballGuru, which had inflated its errors in the missing-player case, since a blank “projection” amounts to a 0.00 ERA or a .000 batting average.
Tom Tango kindly highlighted my previous posts reviewing the 2013 forecast data that was available on the RotoValue site last year, and in the comments, he pointed me to a data set put online by Will Larson that has 2013 projection data from a dozen sources.
Dr. Larson kindly allows people to write articles using his data so long as they let him know and cite him as the source. I’ve downloaded projections from 12 sources on his site, http://www.bbprojectionproject.com, and am including these additional sources in the comparisons for this article:
- AggPro – A projection aggregation method from Ross J. Gore, Cameron T. Snapp, and Timothy Highley.
- Steamer – A system from Jared Cross, Dash Davidson, and Peter Rosenbloom.
- Will Larson – Larson also publishes projections under his own name.
- ZiPS – A projection model from Dan Szymborski.
- CBS Sportsline
- Razzball – Projections from Grey Albright of Razzball.com, distinct from the Steamer/Razzball collaboration I have evaluated previously.
- Fangraphs Fans
- Sports Illustrated
- Dan Rosenheck’s projections.
Many of these sources have fewer data fields than the sources I already had, with some not providing the raw data from which to compute batting average, ERA, or WHIP. If a source didn’t provide AB or IP, I used the AB or IP from the consensus of the five sources I had last year. I then computed hits from batting average for batters, earned runs from ERA for pitchers, and hits and walks from WHIP (if a source provided neither hits nor walks, I simply assumed 3 hits per walk and derived both from WHIP); the sketch below shows the arithmetic.
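Here’s a minimal Python sketch of those derivations. The function and field names are mine, not from any source file, and rounding details may differ from what my code actually does:

```python
def derive_batter_hits(avg, ab):
    """Back out hits from batting average and (possibly consensus) at-bats."""
    return round(avg * ab)

def derive_pitcher_stats(era, whip, ip, hits=None, walks=None):
    """Back out earned runs from ERA, and hits/walks from WHIP."""
    er = era * ip / 9.0        # ERA = 9 * ER / IP
    baserunners = whip * ip    # WHIP = (H + BB) / IP
    if hits is None and walks is None:
        # Neither provided: assume 3 hits per walk.
        walks = baserunners / 4.0
        hits = baserunners - walks
    elif hits is None:
        hits = baserunners - walks
    elif walks is None:
        walks = baserunners - hits
    return er, hits, walks
```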
I’m also including the same sources I had before:
- CAIRO – from SG of the Replacement Level Yankees Weblog.
- Marcel – the basic projection model from Tom Tango, coauthor of The Book. This year I’m using Marcel numbers generated by Jeff Zimmerman, using Jeff Sackmann’s Python code.
- MORPS – A projection model by Tim Oberschlake.
- Steamer/Razzball – Rate projections by Jared Cross, Dash Davidson, and Peter Rosenbloom, and playing time projections from Rudy Gamble of Razzball.com.
- RotoValue – my current model, based largely on Marcel, but with adjustments for pitching decision stats and assuming no pitcher skill in BABIP.
In addition, I’m keeping the same Consensus forecast I had previously (a simple average of the 5 sources above), and adding a new AllConsensus, which is a simple average of all 17 systems I’m evaluating here.
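To be concrete about what a consensus means here, a minimal sketch, assuming each system is a dict of per-player stat dicts (my own illustrative layout; for the missing-player tables, absent players would first get the fill-in values described below):

```python
def consensus(systems, stat):
    """Simple per-player average of a stat across projection systems."""
    common = set.intersection(*(set(s) for s in systems))  # players all project
    return {p: sum(s[p][stat] for s in systems) / len(systems)
            for p in common}
```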
As before, I’ll show two tables per statistic: one comparing only those players projected by all systems, and another that assigns each player a system did not project a rate value relative to that system’s average. Since there’s not enough data to compute wOBA (or SLG, or OBP) for each source, I’m just going to use batting average on offense here, subtracting 0.020 from a source’s overall average for any missing players.
For pitching, I’ll be able to do ERA and WHIP for each source, adding 0.50 to average ERA and 0.10 to average WHIP for missing players.
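A small sketch of that fill-in rule, with the sign conventions folded into a single table of offsets (the constants are from the text above; the data layout is my own illustration):

```python
# Offsets applied to a source's own average for players it didn't project:
# batting average 20 points worse, ERA 0.50 worse, WHIP 0.10 worse.
FILL_OFFSET = {"avg": -0.020, "era": +0.50, "whip": +0.10}

def filled_projection(source, player, stat, source_avg):
    """Use the source's projection if present, else its average plus offset."""
    if player in source:
        return source[player][stat]
    return source_avg[stat] + FILL_OFFSET[stat]
```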
I’m still computing MAE and RMSE for each source, with bias-adjusted errors: a source is graded by how well it projects players relative to its own average, not by how well its average winds up matching the actual average. For people using projections in fantasy analysis, this is what you’d want: a player’s ranking depends on his relative, not absolute, statistics.
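In code, that bias adjustment amounts to subtracting a source’s mean error before computing MAE and RMSE. A minimal sketch of the idea, not my production code:

```python
import math

def bias_adjusted_errors(projected, actual):
    """MAE and RMSE after removing the source's overall bias."""
    errors = [p - a for p, a in zip(projected, actual)]
    bias = sum(errors) / len(errors)        # how far off the source's average is
    adjusted = [e - bias for e in errors]   # error relative to its own average
    mae = sum(abs(e) for e in adjusted) / len(adjusted)
    rmse = math.sqrt(sum(e * e for e in adjusted) / len(adjusted))
    return mae, rmse
```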
So let’s get to the numbers!
Now there are only 220 players projected by all systems:
Here AllConsensus, AggPro, and Oliver are in a virtual tie for the lowest errors, with Steamer next (it’s nice to see Steamer and Steamer/Razzball so close). Even the highest errors are still just under .003 more than the lowest, so all these systems do a good job of projecting the top batters. I’m glad to see my own model now performing a little better than the median!
Now let’s assume missing players hit 20 points worse than a system’s average:
This time the error spreads are much wider. One coverage note: the Steamer file on Larson’s site projected more players who played in 2013 than the one Jared Cross sent me, which I label Steamer/Razzball, so the two fare differently in this test. Having fixed my bug and updated my table, I now see Steamer with the lowest RMSE, closely followed by AllConsensus, Steamer/Razzball, and Oliver, but all the top systems are quite close in this test.
That’s interesting, perhaps suggesting that simply projecting 20 points worse than league average may be better than doing an individual projection for many weaker players. (I assume that players projected by all systems tend to be much better than those not projected by some, and indeed the cumulative batting averages of the different data sets support that assumption.) While each system’s errors rise compared to the subset of players projected by all, some systems fare much worse here than in the other test, which suggests that most systems can do reasonably well with the top hitters; what separates a good system from a mediocre one is how it does with weaker players.
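One way to probe that hunch would be to compare, on the weaker players only, a system’s error against the flat “own average minus 20 points” baseline. A hypothetical sketch (none of these names come from my actual code):

```python
def baseline_vs_system(system, actual, weak_players, system_avg):
    """Compare a system's MAE on weak players to a flat naive baseline."""
    naive = system_avg - 0.020  # 20 points below the system's own average
    sys_mae = sum(abs(system[p] - actual[p]) for p in weak_players) / len(weak_players)
    naive_mae = sum(abs(naive - actual[p]) for p in weak_players) / len(weak_players)
    return sys_mae, naive_mae
```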
Now on to pitching. First up, ERA:
Now there are only 84 pitchers projected by all systems:
Steamer and AggPro top this chart, followed by Dan Rosenheck. Will Larson, who ranked last in the parallel batting average table, now places fourth. But there are only 84 pitchers ranked by all systems, and in digging deeper, I found that three sources from Larson’s site (his own projections, SI, and CBS) did not provide projections for top closers at all (or perhaps for many relievers generally).
Now let’s see what happens comparing all players who actually played, adding 0.50 to a source’s average ERA for any player it didn’t project:
Steamer is still on top, followed by the aggregates: the 5-system Consensus, AggPro, and the Consensus of all sources. Pitching is indeed harder to predict, as we see a much wider spread of errors for ERA than for Avg.
Finally, let’s take a look at WHIP, again first looking at only the 84 pitchers projected by all sources:
Dan Rosenheck’s projections take the top spot, followed by Fangraphs Fans and Steamer, with AggPro and my own RotoValue projections (yay!) also bettering the two consensus forecasts.
When I fill in each source’s average WHIP plus 0.10 for missing players, Rosenheck drops significantly, but the top systems remain the same:
I’m still trying to work out what these results mean, but it does seem that for weaker players, projection systems may not add much value over simply guessing something a little worse than league average. Or maybe projections that revert toward an overall league average in a statistic systematically pull projected values too high for weaker players. For now I want to present the data and find out what others think, but in terms of evaluating forecasters, I think the comparison restricted to players projected by all systems is probably more informative than the other one (disclaimer: my own system does rank higher that way, but so do the consensus forecasts, and I do sincerely think that missing quite a few players may somehow be a relative advantage).
As we’ve seen before, projections are more accurate at forecasting batting stats than pitching stats, and virtually every system out there does much better than simply using 2012 data. Comments and feedback are welcome in the comments below, or you can e-mail me: geoff at rotovalue dot com.
While I did try my best to be accurate, it’s possible I made some mistakes in uploading the data and matching names in the files with players in my own database. I conclude by thanking Dr. Larson for compiling and making so much data available, Brian Cartwright for sharing Oliver with me, and Tom Tango for pointing me to Dr. Larson’s site and for his feedback on the methodology.