Yesterday I ran comparisons of several projection systems for an all-inclusive batting statistic, wOBA. Today I’m running the same tests, computing root mean square error (RMSE) and mean absolute error (MAE), for two commonly used fantasy statistics, ERA and WHIP. These tests are bias-adjusted, so what matters is a player’s ERA or WHIP relative to the overall average of that system, compared with the player’s actual statistic relative to the actual overall average. The lower the RMSE or MAE, the better a projection system predicted the actual data.
I have data for these projection models:
- AggPro – A projection aggregation method from Ross J. Gore, Cameron T. Snapp, and Timothy Highley.
- Bayesball – Projections from Jonathan Adams.
- CAIRO – from S B of the Replacement Level Yankees Weblog.
- CBS – Projections from CBS Sportsline.
- Davenport – Clay Davenport’s projections.
- ESPN – Projections from ESPN.
- Fans – Fans’ projections from Fangraphs.com.
- Larson – Will Larson’s projections.
- Marcel – the basic projection model from Tom Tango, coauthor of The Book. This year I’m using Marcel numbers generated by Jeff Zimmerman, using Jeff Sackmann’s Python code.
- MORPS – A projection model by Tim Oberschlake.
- Rosenheck – Projections by Dan Rosenheck.
- Oliver – Brian Cartwright’s projection model.
- Steamer – Projections by Jared Cross, Dash Davidson, and Peter Rosenbloom.
- Steamer/Razzball – Steamer rate projections, but playing time projections from Rudy Gamble of Razzball.com.
- RotoValue – my current model, based largely on Marcel, but with adjustments for pitching decision stats and assuming no pitcher skill in BABIP.
- RV Pre-Australia – The RotoValue projections taken just before the first Australia games last year. Before the rest of the regular season I continued to tweak projections slightly.
- ZiPS – projections from Dan Szymborski of Baseball Think Factory and ESPN.
First up is ERA, comparing the 75 pitchers projected by all systems:
Source | Num | Avg ERA | MAE | RMSE |
---|---|---|---|---|
Actual | 75 | 3.4261 | 0.0000 | 0.0000 |
Larson | 75 | 3.7608 | 0.5834 | 0.7644 |
All Consensus | 75 | 3.6435 | 0.5780 | 0.7693 |
CBS | 75 | 3.5056 | 0.5723 | 0.7700 |
Steamer | 75 | 3.8116 | 0.6015 | 0.7781 |
Zips | 75 | 3.5525 | 0.5873 | 0.7868 |
ESPN | 75 | 3.7014 | 0.5878 | 0.7882 |
RotoChamp | 75 | 3.5329 | 0.5981 | 0.7886 |
Steamer/Razzball | 75 | 3.8057 | 0.6164 | 0.7905 |
RotoGuru | 75 | 3.8089 | 0.5713 | 0.8011 |
RV Pre-Australia | 75 | 3.6518 | 0.6038 | 0.8037 |
Fans | 75 | 3.4440 | 0.6275 | 0.8047 |
RotoValue | 75 | 3.6512 | 0.6052 | 0.8049 |
Marcel | 75 | 3.5315 | 0.6179 | 0.8066 |
Oliver | 75 | 3.5351 | 0.6159 | 0.8222 |
Razzball | 75 | 3.4998 | 0.6469 | 0.8231 |
Davenport | 75 | 3.5964 | 0.6320 | 0.8270 |
CAIRO | 75 | 3.6321 | 0.6367 | 0.8300 |
AggPro | 75 | 3.7284 | 0.6879 | 0.8411 |
MORPS | 75 | 3.6513 | 0.6742 | 0.8525 |
Bayesball | 75 | 3.9120 | 0.7358 | 0.8876 |
y2013 | 75 | 3.3790 | 0.7074 | 0.9554 |
Not surprisingly, the errors, even as a percentage of the average, are much higher here than for wOBA, as pitching performance is more volatile than batting performance. Will Larson’s projections did best here, followed by the Consensus, CBS Sportsline, and Steamer. All the models handily beat using 2013 data, albeit not as decisively as with wOBA, but seven lagged behind Tango’s Marcel system. The other notable thing to me is that every system’s average ERA for these players is higher than the actual. Indeed, 2013 actual data was slightly lower, so this shows how dependent systems are on older historical data. Aggregate ERA has fallen sharply since 2012, and the projection models are still reflecting that higher run environment to a large degree.
When you’re preparing for a fantasy auction, you care much more about how a system rates players relative to each other than how it rates them compared to the actual data. For these 75 pitchers, Steamer projected an aggregate 3.81 ERA, while they actually had a 3.43 ERA (weighted by actual 2014 innings pitched). But Steamer still has lower errors than Fangraphs Fans’ projections, whose 3.44 average came closest to the actual average. This is the impact of bias adjustment: while the fans did better at projecting the actual league average ERA, Steamer tended to be closer on more individual players when adjusted for its league average.
To underscore that point, here’s the same ERA table as above, but this time doing raw errors, i.e. not doing any bias adjustment at all:
Source | Num | Avg ERA | MAE | RMSE |
---|---|---|---|---|
Actual | 75 | 3.4261 | 0.0000 | 0.0000 |
CBS | 75 | 3.5056 | 0.5900 | 0.7739 |
Zips | 75 | 3.5525 | 0.6152 | 0.7820 |
Fans | 75 | 3.4440 | 0.5901 | 0.7823 |
Razzball | 75 | 3.4998 | 0.5887 | 0.7834 |
RotoChamp | 75 | 3.5329 | 0.6043 | 0.7923 |
All Consensus | 75 | 3.6435 | 0.6250 | 0.7993 |
Marcel | 75 | 3.5315 | 0.6184 | 0.8069 |
Oliver | 75 | 3.5351 | 0.6375 | 0.8271 |
AggPro | 75 | 3.7284 | 0.6699 | 0.8273 |
ESPN | 75 | 3.7014 | 0.6477 | 0.8278 |
RV Pre-Australia | 75 | 3.6518 | 0.6385 | 0.8314 |
RotoValue | 75 | 3.6512 | 0.6384 | 0.8317 |
Larson | 75 | 3.7608 | 0.6769 | 0.8336 |
Davenport | 75 | 3.5964 | 0.6460 | 0.8394 |
CAIRO | 75 | 3.6321 | 0.6812 | 0.8546 |
Steamer/Razzball | 75 | 3.8057 | 0.7031 | 0.8577 |
Steamer | 75 | 3.8116 | 0.7054 | 0.8592 |
MORPS | 75 | 3.6513 | 0.6995 | 0.8711 |
RotoGuru | 75 | 3.8089 | 0.7216 | 0.8758 |
y2013 | 75 | 3.3790 | 0.7054 | 0.9537 |
Bayesball | 75 | 3.9120 | 0.8315 | 0.9704 |
Now Fans ranks well above Steamer, which is near the bottom in this test. The lower errors here tend to come from systems that come closer to matching the actual league ERA. Even the naive forecast of simply reusing 2013 data is no longer dead last, while CBS, Zips, and Fans are the best performing systems here. Comparing these two tables shows the impact of bias adjustment. What matters most for fantasy valuation is how players compare relative to each other, and not how well some system predicts the actual run environment players are in. So the first table is a better comparison of projections for fantasy purposes.
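For readers who want to try this kind of comparison themselves, here is a minimal sketch of raw versus bias-adjusted errors. It is not the code behind the tables: the function names are hypothetical, and it assumes league averages are weighted by actual innings pitched while each pitcher counts equally in the error averages.

```python
import math


def weighted_mean(values, weights):
    """Average of values, weighted by weights (e.g. innings pitched)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def projection_errors(projected, actual, innings, bias_adjust=True):
    """Return (MAE, RMSE) for one system's projections.

    Assumption for this sketch: averages are innings-weighted, but
    every pitcher counts equally when the errors are averaged.
    """
    if bias_adjust:
        # Compare each pitcher relative to his own league average.
        proj_avg = weighted_mean(projected, innings)
        act_avg = weighted_mean(actual, innings)
    else:
        # Raw errors: compare projected and actual values directly.
        proj_avg = act_avg = 0.0
    diffs = [(p - proj_avg) - (a - act_avg) for p, a in zip(projected, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# Made-up numbers: every projection is exactly 0.40 too high.
actual_era = [3.00, 3.50, 4.00]
projected_era = [3.40, 3.90, 4.40]
innings = [200, 180, 150]

adjusted = projection_errors(projected_era, actual_era, innings)
raw = projection_errors(projected_era, actual_era, innings, bias_adjust=False)
```

In this made-up case the bias-adjusted errors are essentially zero, because the system orders and spaces the pitchers perfectly, while the raw MAE and RMSE are both about 0.40, the size of the miss on the run environment.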
Both these tables are “apples to apples”, comparing only those players that every system projected. That is a small set of better-than-average pitchers, a group that is easier to project than a deeper set of MLB pitchers.
But of course if you’re in a fantasy league, it doesn’t help when a system doesn’t project someone you might care about. So this next table will use an ERA of 0.50 worse than the system’s league average for anybody not projected, and compare against a set of almost 700 pitchers:
Source | MLB | ERA | StdDev | MAE | RMSE | Missing |
---|---|---|---|---|---|---|
Actual | 672 | 3.7395 | 1.4707 | 0.0000 | 0.0000 | 0 |
Steamer/Razzball | 672 | 4.0077 | 0.5245 | 0.8759 | 1.4168 | 215 |
All Consensus | 672 | 3.9776 | 0.5452 | 0.8773 | 1.4251 | 4 |
Davenport | 672 | 3.8822 | 0.4985 | 0.8856 | 1.4270 | 198 |
Steamer | 672 | 4.0087 | 0.5731 | 0.8883 | 1.4280 | 37 |
Oliver | 672 | 3.9382 | 0.6424 | 0.9090 | 1.4418 | 31 |
Bayesball | 672 | 4.0215 | 0.5219 | 0.9252 | 1.4427 | 222 |
Fans | 672 | 3.7886 | 0.4485 | 0.9017 | 1.4473 | 465 |
AggPro | 672 | 4.0252 | 0.3698 | 0.9071 | 1.4483 | 557 |
Razzball | 672 | 3.8558 | 0.3791 | 0.9045 | 1.4485 | 524 |
RotoValue | 672 | 3.8888 | 0.4810 | 0.9093 | 1.4488 | 32 |
RV Pre-Australia | 672 | 3.9020 | 0.4852 | 0.9080 | 1.4505 | 42 |
Larson | 672 | 4.2586 | 0.4348 | 0.9232 | 1.4540 | 463 |
ESPN | 672 | 4.0331 | 0.5404 | 0.9056 | 1.4563 | 336 |
Marcel | 672 | 3.7990 | 0.4901 | 0.9103 | 1.4587 | 136 |
Zips | 672 | 4.0703 | 0.7450 | 0.9207 | 1.4640 | 69 |
CAIRO | 672 | 4.0489 | 0.6991 | 0.9369 | 1.4758 | 94 |
CBS | 672 | 4.0622 | 0.4768 | 0.9404 | 1.4799 | 462 |
MORPS | 672 | 3.8850 | 0.6547 | 0.9492 | 1.4907 | 143 |
RotoGuru | 672 | 4.3407 | 0.6741 | 0.9486 | 1.5056 | 150 |
RotoChamp | 672 | 3.8623 | 0.6814 | 0.9398 | 1.5199 | 249 |
y2013 | 672 | 3.8355 | 1.7932 | 1.2610 | 2.2525 | 169 |
Here systems get more credit for projecting more players, so long as those projections are better than the default 0.50 worse than average. This shakes up the order quite a bit. Now Steamer/Razzball does best, followed by the Consensus, Clay Davenport, and Steamer. Will Larson, the winner of the test of fewer (and overall better) players, drops to the middle of the pack, while CBS Sportsline and Zips now slip behind Marcel. Bayesball, which ranked worst in the earlier test, improves markedly. The overall errors increase quite a bit, as we’re now comparing projections for a much deeper set of players, many of whom have much less of a track record.
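The fill-in rule for unprojected pitchers can be sketched roughly as follows. This is an illustration only: the function name, the dictionary layout, and the pitcher names are hypothetical, and the system's league average is simply passed in rather than computed.

```python
def fill_missing(projections, all_pitchers, league_avg, penalty=0.50):
    """Return a projection for every pitcher in all_pitchers.

    projections: dict mapping pitcher name -> projected ERA (or WHIP).
    Anyone the system skipped gets league_avg + penalty, i.e. a
    projection somewhat worse than that system's league average.
    """
    default = league_avg + penalty
    return {name: projections.get(name, default) for name in all_pitchers}


# Hypothetical system that projects two of three pitchers.
system = {"Pitcher A": 3.20, "Pitcher B": 4.10}
everyone = ["Pitcher A", "Pitcher B", "Pitcher C"]

filled_era = fill_missing(system, everyone, league_avg=4.00)
# Pitcher C was not projected, so he is assigned 4.00 + 0.50 = 4.50.
```

For WHIP the same helper would be called with `penalty=0.10`. A system only gains from projecting an extra pitcher when its projection is closer to his actual result than this default would have been.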
Finally, let’s take a look at WHIP. First, the “apples-to-apples” table comparing only the 75 pitchers projected by all systems:
Source | Num | Avg WHIP | MAE | RMSE |
---|---|---|---|---|
Actual | 75 | 1.2036 | 0.0000 | 0.0000 |
CBS | 75 | 1.2077 | 0.0874 | 0.1175 |
All Consensus | 75 | 1.2360 | 0.0899 | 0.1205 |
Zips | 75 | 1.2214 | 0.0943 | 0.1229 |
ESPN | 75 | 1.2397 | 0.0963 | 0.1231 |
Larson | 75 | 1.2562 | 0.0954 | 0.1239 |
Steamer | 75 | 1.2616 | 0.0963 | 0.1246 |
Marcel | 75 | 1.2238 | 0.0970 | 0.1254 |
Steamer/Razzball | 75 | 1.2608 | 0.0996 | 0.1271 |
Fans | 75 | 1.2030 | 0.0993 | 0.1271 |
RotoGuru | 75 | 1.2364 | 0.0967 | 0.1272 |
Davenport | 75 | 1.2487 | 0.0981 | 0.1284 |
RV Pre-Australia | 75 | 1.2276 | 0.0977 | 0.1286 |
RotoValue | 75 | 1.2275 | 0.0979 | 0.1288 |
Oliver | 75 | 1.2456 | 0.0978 | 0.1303 |
RotoChamp | 75 | 1.2090 | 0.1069 | 0.1335 |
MORPS | 75 | 1.2464 | 0.1047 | 0.1336 |
Razzball | 75 | 1.2012 | 0.1108 | 0.1388 |
AggPro | 75 | 1.2412 | 0.1170 | 0.1415 |
Bayesball | 75 | 1.2794 | 0.1134 | 0.1431 |
y2013 | 75 | 1.1895 | 0.1097 | 0.1469 |
CAIRO | 75 | 1.2603 | 0.1203 | 0.1495 |
This time the CBS Sportsline projections wind up with the lowest errors, followed by the consensus and Zips. Marcel actually beats most systems in this test. As a percentage of the projected average, the errors are smaller for WHIP than ERA, which makes sense since WHIP stabilizes more quickly, but the spread in errors of WHIP between systems is much wider, so systems vary more in their projections of WHIP than ERA for this sample of pitchers.
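To put rough numbers on that comparison, here is a small sketch that divides RMSE by the projected average for the best and worst projection systems in each 75-pitcher table (leaving out the y2013 baseline). The figures are copied from the tables above; the helper name is hypothetical.

```python
# (projected average, RMSE) copied from the two 75-pitcher tables
era = {"Larson": (3.7608, 0.7644), "Bayesball": (3.9120, 0.8876)}
whip = {"CBS": (1.2077, 0.1175), "CAIRO": (1.2603, 0.1495)}


def relative_rmse(avg, rmse):
    """RMSE expressed as a fraction of the projected average."""
    return rmse / avg


best_era = relative_rmse(*era["Larson"])
best_whip = relative_rmse(*whip["CBS"])
print(f"ERA:  {best_era:.1%}")   # about 20.3%
print(f"WHIP: {best_whip:.1%}")  # about 9.7%

# Spread between systems: worst RMSE divided by best RMSE.
era_spread = era["Bayesball"][1] / era["Larson"][1]
whip_spread = whip["CAIRO"][1] / whip["CBS"][1]
print(f"ERA spread:  {era_spread:.2f}")   # about 1.16
print(f"WHIP spread: {whip_spread:.2f}")  # about 1.27
```

So the best WHIP projections miss by roughly half as much, relative to the average, as the best ERA projections, while the gap between the best and worst systems is proportionally larger for WHIP.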
Finally, here’s the table using a WHIP of 0.10 worse than the projected league average for missing players:
Source | MLB | WHIP | StdDev | MAE | RMSE | Missing |
---|---|---|---|---|---|---|
Actual | 672 | 1.2746 | 0.2381 | 0.0000 | 0.0000 | 0 |
Davenport | 672 | 1.3088 | 0.0951 | 0.1457 | 0.2253 | 198 |
Steamer/Razzball | 672 | 1.3155 | 0.0955 | 0.1472 | 0.2256 | 215 |
Steamer | 672 | 1.3148 | 0.1027 | 0.1480 | 0.2264 | 37 |
Fans | 672 | 1.2716 | 0.0829 | 0.1462 | 0.2281 | 465 |
AggPro | 672 | 1.3036 | 0.0693 | 0.1478 | 0.2286 | 557 |
ESPN | 672 | 1.3082 | 0.1080 | 0.1467 | 0.2290 | 336 |
Razzball | 672 | 1.2770 | 0.0846 | 0.1495 | 0.2304 | 524 |
Marcel | 672 | 1.2838 | 0.0888 | 0.1504 | 0.2311 | 136 |
Bayesball | 672 | 1.3262 | 0.0928 | 0.1537 | 0.2317 | 222 |
All Consensus | 672 | 1.3163 | 0.1202 | 0.1487 | 0.2318 | 4 |
Oliver | 672 | 1.3324 | 0.1364 | 0.1507 | 0.2330 | 31 |
Larson | 672 | 1.3506 | 0.0857 | 0.1520 | 0.2332 | 463 |
RotoValue | 672 | 1.2871 | 0.0968 | 0.1525 | 0.2335 | 32 |
RV Pre-Australia | 672 | 1.2891 | 0.0974 | 0.1523 | 0.2336 | 42 |
Zips | 672 | 1.3289 | 0.1501 | 0.1562 | 0.2379 | 69 |
CBS | 672 | 1.3189 | 0.0962 | 0.1574 | 0.2380 | 462 |
RotoGuru | 672 | 1.3139 | 0.1231 | 0.1576 | 0.2405 | 150 |
MORPS | 672 | 1.3036 | 0.1243 | 0.1583 | 0.2406 | 143 |
CAIRO | 672 | 1.3809 | 0.1890 | 0.1805 | 0.2593 | 94 |
RotoChamp | 672 | 1.2640 | 0.1981 | 0.1757 | 0.2892 | 249 |
y2013 | 672 | 1.2950 | 0.2538 | 0.2035 | 0.3252 | 169 |
In this test Clay Davenport’s model edges out Steamer/Razzball and Steamer for lowest RMSE, with Fangraphs Fans, AggPro, and ESPN close behind. Marcel beats more than half the models, as in the test of the 75 players projected by all, but now for the first time in these tests it also beats out the consensus, which is usually among the best performing systems. But here, the wider spread among systems in errors might work against a crowd-sourcing approach which usually does quite well with other stats.
Projection systems vary much more on WHIP than they do on ERA (or wOBA), but in general they all perform much better than 2013 data. Yet while for wOBA most systems usually beat the benchmark of Tom Tango’s Marcel system, for these pitching stats Marcel is still quite good. Projecting pitching is harder than projecting hitting. Fantasy veterans know that already, of course, but these numbers support that conclusion as well.