I’ve previously compared five MLB projection systems on batting and pitching rate statistics, measuring their 2012 projections against actual data. This post compares four systems by computing RotoValue prices for each projected stat set and league setup, and comparing them with prices computed from actual 2012 data for the same setup. The four systems I’m testing are:
- CAIRO – from SG of the Replacement Level Yankees Weblog.
- Marcel – the basic projections from Tom Tango, coauthor of The Book.
- Steamer – developed by Jared Cross, Dash Davidson, and Peter Rosenbloom.
- RotoValue – my own old projection algorithm.
Conspicuously absent is ZiPS, the projections from Dan Szymborski of Baseball Think Factory and ESPN, which I included in the other comparisons. I’ve left ZiPS out because that system makes no serious attempt to project playing time, and it also (at least the version I got last season) did not project pitcher saves. That leaves it at a marked disadvantage relative to the other systems, so I thought it best to omit it.
I’m also including unadjusted 2011 data as another model.
From a fantasy perspective, a big use of projections is to help value players and determine how much one should spend on them. While projecting rate stats well is a better indicator of how good a projection system is at reflecting a player’s current talent level, a fantasy owner cares both about talent level and playing time. RotoValue prices are designed to compare players’ contributions to a fantasy team given its league parameters, so they’re a good shorthand way to combine both skill and playing time into a single number.
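To illustrate how a single price can fold both skill and playing time together, here is a minimal z-score valuation sketch. To be clear, this is not the actual RotoValue pricing model (which accounts for league size, replacement level, and minimum bids); the function name, budget default, and scaling rule are all my own assumptions for illustration:

```python
import statistics

def zscore_prices(projections, budget_per_team=260.0, teams=10):
    """Hypothetical z-score valuation: projections maps player -> {category -> value}.
    Returns player -> dollar price, scaled so positive value matches the league budget.
    NOT the RotoValue model; a common simple approach for comparison."""
    cats = sorted({c for p in projections.values() for c in p})
    # Mean and standard deviation per category across the player pool.
    stats = {}
    for c in cats:
        vals = [p.get(c, 0.0) for p in projections.values()]
        stats[c] = (statistics.mean(vals), statistics.pstdev(vals) or 1.0)
    # Raw value: sum of per-category z-scores.
    raw = {name: sum((p.get(c, 0.0) - stats[c][0]) / stats[c][1] for c in cats)
           for name, p in projections.items()}
    # Scale so total positive value equals the total league budget.
    total_pos = sum(v for v in raw.values() if v > 0) or 1.0
    scale = budget_per_team * teams / total_pos
    return {name: v * scale for name, v in raw.items()}
```

Because counting stats already embed playing time (a 600 AB hitter accumulates more home runs than an equally skilled 300 AB hitter), any value built from them naturally mixes talent and opportunity, which is exactly the point above.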
So let’s take a look…
I ran numbers for the five league configurations I highlighted when announcing my first-cut projections for 2013.
First up, a 4×4 AL-only league. The first table shows averages across all players for whom a given system projected some nonzero stats. The second table shows averages only among players projected by every system being compared (counting 2011 as a projection system).
Source | Num | Price | StdDev | MAE | RMSE |
---|---|---|---|---|---|
2012 | 622 | 0.832 | 10.044 | 0.000 | 0.000 |
Steamer | 576 | 1.464 | 9.956 | 5.880 | 8.103 |
Marcel | 507 | 2.705 | 10.145 | 6.581 | 9.100 |
2011 | 484 | 3.352 | 10.064 | 7.007 | 9.489 |
RotoValue | 406 | 4.832 | 9.888 | 7.338 | 9.669 |
CAIRO | 584 | 1.112 | 10.525 | 7.293 | 9.956 |
382 players projected by all systems
Source | Num | Avg Price | MAE | RMSE |
---|---|---|---|---|
2012 | 382 | 3.717 | 0.000 | 0.000 |
Steamer | 382 | 4.895 | 6.469 | 8.684 |
Marcel | 382 | 5.031 | 7.287 | 9.611 |
RotoValue | 382 | 5.289 | 7.510 | 9.846 |
2011 | 382 | 5.113 | 7.606 | 9.885 |
CAIRO | 382 | 3.580 | 7.995 | 10.648 |
My first observation is that the errors in the second table are all much higher than in the first, which makes sense to me. While the RotoValue model does allow for negative prices, indeed rather large ones, there is less variation among players who play very little than among those who play often. So projecting more players means projecting more of those expected to get little playing time, which lowers the overall average errors. This also matches the changes in RMSE: CAIRO, which projected the most players, rises the most when I restrict to the smaller common set, while RotoValue, which projected the fewest, rises the least.
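Concretely, the error columns in these tables can be computed as below. This is a minimal sketch assuming each source's prices are stored as a player-to-price dict; the function name and data layout are my own, not RotoValue's code:

```python
import math

def price_errors(projected, actual, common_only=None):
    """Return (n, MAE, RMSE) over players present in both projected and actual.
    If common_only is a set of players, restrict to it (the
    'projected by all systems' comparison in the second tables)."""
    players = set(projected) & set(actual)
    if common_only is not None:
        players &= set(common_only)
    diffs = [projected[p] - actual[p] for p in players]
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n          # mean absolute error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # root mean squared error
    return n, mae, rmse
```

Since RMSE squares each miss, it penalizes a few large busts (a star projected for a full season who barely played) more heavily than MAE does, which is worth remembering when reading the tables.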
I think the second table puts the systems on a more equal footing (although systems that project players the others skip get no credit for doing so). Steamer, which had the lowest overall errors in projecting pitching percentage stats, does best here, too.
Next, a 4×4 NL league:
Source | Num | Price | StdDev | MAE | RMSE |
---|---|---|---|---|---|
2012 | 707 | -0.588 | 10.138 | 0.000 | 0.000 |
Steamer | 646 | 0.195 | 10.233 | 5.899 | 8.270 |
Marcel | 575 | 1.499 | 10.269 | 6.526 | 9.201 |
2011 | 536 | 1.757 | 10.802 | 6.999 | 9.670 |
RotoValue | 474 | 3.195 | 10.349 | 7.310 | 9.965 |
CAIRO | 656 | -0.358 | 11.095 | 7.502 | 10.051 |
455 players projected by all systems
Source | Num | Avg Price | MAE | RMSE |
---|---|---|---|---|
2012 | 455 | 2.181 | 0.000 | 0.000 |
Steamer | 455 | 3.190 | 6.646 | 9.075 |
Marcel | 455 | 3.500 | 7.229 | 9.986 |
RotoValue | 455 | 3.409 | 7.391 | 10.018 |
2011 | 455 | 3.261 | 7.577 | 10.280 |
CAIRO | 455 | 1.518 | 7.879 | 10.647 |
Again, projecting more total players gives a lower overall error, so comparing only among players projected by all systems gives a better look. This pattern holds in all formats.
Now a 5×5 AL league:
Source | Num | Price | StdDev | MAE | RMSE |
---|---|---|---|---|---|
2012 | 622 | 0.437 | 10.205 | 0.000 | 0.000 |
Steamer | 576 | 0.702 | 10.559 | 5.729 | 7.727 |
Marcel | 507 | 2.623 | 10.051 | 6.308 | 8.600 |
RotoValue | 406 | 4.805 | 9.698 | 7.025 | 9.045 |
2011 | 484 | 3.107 | 10.117 | 6.715 | 9.049 |
CAIRO | 584 | 1.172 | 10.264 | 7.119 | 9.483 |
382 players projected by all systems
Source | Num | Avg Price | MAE | RMSE |
---|---|---|---|---|
2012 | 382 | 3.757 | 0.000 | 0.000 |
Steamer | 382 | 4.639 | 6.088 | 8.048 |
Marcel | 382 | 5.130 | 6.956 | 9.016 |
RotoValue | 382 | 5.310 | 7.137 | 9.189 |
2011 | 382 | 5.067 | 7.240 | 9.353 |
CAIRO | 382 | 3.266 | 7.646 | 9.969 |
And a 5×5 NL League:
Source | Num | Price | StdDev | MAE | RMSE |
---|---|---|---|---|---|
2012 | 707 | -1.122 | 10.404 | 0.000 | 0.000 |
Steamer | 646 | -0.348 | 10.460 | 5.678 | 7.901 |
Marcel | 575 | 1.283 | 10.255 | 6.375 | 8.804 |
2011 | 536 | 1.496 | 10.845 | 6.683 | 9.237 |
RotoValue | 474 | 3.031 | 10.338 | 7.106 | 9.580 |
CAIRO | 656 | -0.983 | 11.412 | 7.860 | 10.419 |
455 players projected by all systems
Source | Num | Avg Price | MAE | RMSE |
---|---|---|---|---|
2012 | 455 | 1.981 | 0.000 | 0.000 |
Steamer | 455 | 2.979 | 6.261 | 8.516 |
Marcel | 455 | 3.362 | 7.042 | 9.517 |
RotoValue | 455 | 3.259 | 7.192 | 9.659 |
2011 | 455 | 3.076 | 7.197 | 9.798 |
CAIRO | 455 | 0.473 | 8.102 | 10.844 |
And finally, a 5×5 mixed league:
Source | Num | Price | StdDev | MAE | RMSE |
---|---|---|---|---|---|
2012 | 1309 | -12.014 | 13.278 | 0.000 | 0.000 |
Steamer | 1202 | -12.939 | 14.912 | 8.066 | 10.796 |
Marcel | 1064 | -7.637 | 12.006 | 8.514 | 11.230 |
2011 | 1004 | -6.590 | 11.818 | 8.977 | 11.692 |
RotoValue | 863 | -7.053 | 13.294 | 9.096 | 12.184 |
CAIRO | 1220 | -9.482 | 12.374 | 9.198 | 12.322 |
822 players projected by all systems
Source | Num | Avg Price | MAE | RMSE |
---|---|---|---|---|
2012 | 822 | -8.014 | 0.000 | 0.000 |
Steamer | 822 | -7.907 | 8.442 | 11.275 |
Marcel | 822 | -5.020 | 9.331 | 11.998 |
2011 | 822 | -4.644 | 9.565 | 12.238 |
CAIRO | 822 | -7.541 | 9.283 | 12.284 |
RotoValue | 822 | -6.630 | 9.244 | 12.327 |
When looking only at players projected by all systems, Steamer had the lowest RMSE and MAE in every format, while CAIRO had the highest in each format except the 5×5 mixed league. One difference with that format is that it’s a much shallower league: it uses ~~270~~ 230 players total out of both leagues, whereas the NL leagues use ~~270~~ 230 NL players only, and the AL uses ~~280~~ 240 total (one extra DH per fantasy team). So the replacement level for the mixed league is much higher, and that affects pricing (explaining why the average price becomes negative in that format).
CAIRO did much better at projecting rate stats than at projecting RotoValue prices, while my old RotoValue model, which was quite poor at projecting rate stats, usually had lower errors than CAIRO in this comparison. I suspect this is due in large part to playing time projections. One tweak I made to my model was to adjust players’ projected stats for known preseason injuries. So Ryan Howard, who wasn’t expected back until at least mid-May last year, was projected for only 363 AB in my system, compared to 516 for Marcel and 575 for CAIRO. The RotoValue prices for him from those projection systems were therefore much higher than mine, so even if RotoValue’s percentages were on average worse than other systems’, it made up some (or even all) of that gap by being closer on playing time. Steamer also had Howard projected for just 394 AB, and thus was less hurt by his poor year (just 260 AB, but also only a .301 wOBA for the time he played). Another Phillie with a preseason injury, Chase Utley, followed a similar pattern: CAIRO projected 381 AB, Marcel 412, and Steamer 494, but RotoValue had him pegged for only 275. CAIRO was also hurt by having the most optimistic projections for stars who disappointed, like Roy Halladay, Mariano Rivera, and Chris Carpenter, while its projection for Buster Posey, who rebounded spectacularly from a season-ending injury in 2011, was the least optimistic.
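The playing-time adjustment described above can be sketched as a simple scaling step: shrink a projection's counting stats proportionally to a revised AB total while leaving rate stats untouched. The numbers echo the Ryan Howard example in the text, but the function itself is a hypothetical illustration, not RotoValue's actual adjustment code:

```python
# Counting stats scale with playing time; rate stats (AVG, OBP, ...) do not.
COUNTING_STATS = ("AB", "H", "HR", "R", "RBI", "SB")

def scale_to_ab(projection, new_ab):
    """Rescale a projection's counting stats to a new AB total,
    preserving the implied rates. Hypothetical helper for illustration."""
    factor = new_ab / projection["AB"]
    return {k: (v * factor if k in COUNTING_STATS else v)
            for k, v in projection.items()}

# e.g. cutting a 575 AB projection (CAIRO's Howard) to 363 AB for a mid-May return:
howard = {"AB": 575, "HR": 30, "RBI": 95, "AVG": 0.255}
adjusted = scale_to_ab(howard, 363)
```

A scaling like this leaves a player's projected skill alone and only changes opportunity, which is why it can narrow the price gap even when the underlying rate projections are weaker.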
I should add plenty of caveats about drawing overly strong conclusions from limited data. While I did compare prices across five different league configurations, the comparison was based on only one year of projections (2012). Steamer performed better than the other systems in all formats, and Marcel was usually the second best. But the gap in error rates between Marcel and the remaining systems was much smaller than the gap between Marcel and Steamer. My old projection model looks better in this sort of comparison than in the earlier rate-stat comparisons, which suggests that the playing time adjustments I make added value. So I’ll be curious to see how my modified model performs in 2013.
Update 2/8/2013: I’ve rerun the tables after double-checking the league parameters that I was using. My intention was to use a $0 minimum bid and no bench players, because that removes a potential discontinuity in prices that would occur in the pricing model (it assumes all projected bench players are worth only the minimum price, and it does not allow prices between 0 and the minimum bid). The particular error numbers and price values are different, but the overall comparisons and orderings are essentially the same.
It would be interesting to adjust each projection set to the same number of plate appearances/IP for each player (i.e., scale each set to 650 PA for Bourn, 500 for Nelson Cruz, etc.). Maybe use the Fangraphs crowd-sourced playing time predictions for all systems. You could then include ZiPS in the comparison for hitters. This would judge how well they forecast talent instead of how well they forecast a mix of talent and playing time. Fewer variables = better. I think this is different enough from using simple rate stats like wOBA, since stats like R/RBI/SV/W are team/situation dependent. Also, I doubt many people use out-of-the-box projections without adjusting for playing time. At least, not many of the people reading this series of articles in early February.
Yeah – someone also suggested augmenting ZiPS with save projections from somewhere else. One way to grade projections for fantasy purposes while removing playing time would be to scale each projection to actual playing time, and then grade. As a retrospective test I like it quite a bit, and I already have the raw data to do it (I don’t currently have stand-alone playing time projection data in my database).
I’ve done a post normalizing playing time for all systems to be actual 2012 data, and also adding in the average of the other forecasts’ saves to ZiPS to include it.
http://blog.rotovalue.com/playing-time-neutral-projection-comparison/