Comparing 2014 Projections – ERA and WHIP

Yesterday I ran comparisons of several projection systems for an all-inclusive batting statistic, wOBA. Today I’m running the same tests, computing root mean square error (RMSE) and mean absolute error (MAE), for two commonly used fantasy statistics, ERA and WHIP. These tests are bias-adjusted: what matters is a player’s ERA or WHIP relative to the overall average of that system, compared with the player’s actual statistic relative to the actual overall average. The lower the RMSE or MAE, the better a projection system predicted the actual data.
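To make the bias adjustment concrete, here’s a minimal sketch of the calculation in Python. The function and variable names are mine for illustration, and it assumes the overall averages are weighted by actual innings pitched, as with the aggregate figures quoted below; my actual tooling differs in its details.

```python
import numpy as np

def projection_errors(proj, actual, ip, bias_adjust=True):
    """MAE and RMSE of projected vs. actual ERA (or WHIP).

    With bias_adjust=True, each projection is measured relative to the
    system's overall average and compared to the player's actual stat
    relative to the actual overall average; with False, raw errors.
    Weighting the overall averages by innings pitched is an assumption.
    """
    proj = np.asarray(proj, dtype=float)
    actual = np.asarray(actual, dtype=float)
    ip = np.asarray(ip, dtype=float)
    if bias_adjust:
        # Remove each source's overall level before comparing players.
        proj = proj - np.average(proj, weights=ip)
        actual = actual - np.average(actual, weights=ip)
    err = proj - actual
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))
```
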
I have data for these projection models:

  • AggPro – A projection aggregation method from Ross J. Gore, Cameron T. Snapp, and Timothy Highley.
  • Bayesball – Projections from Jonathan Adams.
  • CAIRO – from SG of the Replacement Level Yankees Weblog.
  • CBS – Projections from CBS Sportsline.
  • Davenport – Clay Davenport’s projections.
  • ESPN – Projections from ESPN.
  • Fans – Fans’ projections from Fangraphs.com.
  • Larson – Will Larson’s projections.
  • Marcel – the basic projection model from Tom Tango, coauthor of The Book. This year I’m using Marcel numbers generated by Jeff Zimmerman, using Jeff Sackmann’s Python code.
  • MORPS – A projection model by Tim Oberschlake.
  • Rosenheck – Projections by Dan Rosenheck.
  • Oliver – Brian Cartwright’s projection model.
  • Steamer – Projections by Jared Cross, Dash Davidson, and Peter Rosenbloom.
  • Steamer/Razzball – Steamer rate projections, but playing time projections from Rudy Gamble of Razzball.com.
  • RotoValue – my current model, based largely on Marcel, but with adjustments for pitching decision stats and assuming no pitcher skill in BABIP.
  • RV Pre-Australia – The RotoValue projections taken just before the first Australia games last year; I continued to tweak projections slightly before the rest of the regular season began.
  • ZiPS – projections from Dan Szymborski of Baseball Think Factory and ESPN.

First up is ERA, comparing the 75 pitchers projected by all systems:

Source              Num  Avg ERA     MAE    RMSE
Actual               75   3.4261  0.0000  0.0000
Larson               75   3.7608  0.5834  0.7644
All Consensus        75   3.6435  0.5780  0.7693
CBS                  75   3.5056  0.5723  0.7700
Steamer              75   3.8116  0.6015  0.7781
ZiPS                 75   3.5525  0.5873  0.7868
ESPN                 75   3.7014  0.5878  0.7882
RotoChamp            75   3.5329  0.5981  0.7886
Steamer/Razzball     75   3.8057  0.6164  0.7905
RotoGuru             75   3.8089  0.5713  0.8011
RV Pre-Australia     75   3.6518  0.6038  0.8037
Fans                 75   3.4440  0.6275  0.8047
RotoValue            75   3.6512  0.6052  0.8049
Marcel               75   3.5315  0.6179  0.8066
Oliver               75   3.5351  0.6159  0.8222
Razzball             75   3.4998  0.6469  0.8231
Davenport            75   3.5964  0.6320  0.8270
CAIRO                75   3.6321  0.6367  0.8300
AggPro               75   3.7284  0.6879  0.8411
MORPS                75   3.6513  0.6742  0.8525
Bayesball            75   3.9120  0.7358  0.8876
y2013                75   3.3790  0.7074  0.9554

Not surprisingly, the errors, even as a percentage of the average, are much higher here than for wOBA, as pitching performance is more volatile than batting performance. Will Larson’s projections did best here, followed by the Consensus, CBS Sportsline, and Steamer. All the models handily beat using 2013 data, albeit not as decisively as with wOBA, but seven lagged behind Tango’s Marcel system. The other notable thing to me is that every system’s average ERA for these players is higher than the actual figure. Indeed, the 2013 actual average was slightly lower still, which shows how dependent the systems are on older historical data: aggregate ERA has fallen sharply since 2012, and the projection models still largely reflect that higher run environment.
When you’re preparing for a fantasy auction, you care much more about how a system rates players relative to each other than about how close it comes to the actual data. For these 75 pitchers, Steamer projected an aggregate 3.81 ERA, while they actually posted a 3.43 ERA (weighted by actual 2014 innings pitched). Yet Steamer still shows lower errors than the Fangraphs Fans’ projections, whose 3.44 average came closest to the actual. This is the impact of bias adjustment: while the Fans did better at projecting the actual league average ERA, Steamer tended to be closer on more individual players once each system’s overall level was removed.
To underscore that point, here’s the same ERA comparison, but this time using raw errors, i.e. with no bias adjustment at all:

Source              Num  Avg ERA     MAE    RMSE
Actual               75   3.4261  0.0000  0.0000
CBS                  75   3.5056  0.5900  0.7739
ZiPS                 75   3.5525  0.6152  0.7820
Fans                 75   3.4440  0.5901  0.7823
Razzball             75   3.4998  0.5887  0.7834
RotoChamp            75   3.5329  0.6043  0.7923
All Consensus        75   3.6435  0.6250  0.7993
Marcel               75   3.5315  0.6184  0.8069
Oliver               75   3.5351  0.6375  0.8271
AggPro               75   3.7284  0.6699  0.8273
ESPN                 75   3.7014  0.6477  0.8278
RV Pre-Australia     75   3.6518  0.6385  0.8314
RotoValue            75   3.6512  0.6384  0.8317
Larson               75   3.7608  0.6769  0.8336
Davenport            75   3.5964  0.6460  0.8394
CAIRO                75   3.6321  0.6812  0.8546
Steamer/Razzball     75   3.8057  0.7031  0.8577
Steamer              75   3.8116  0.7054  0.8592
MORPS                75   3.6513  0.6995  0.8711
RotoGuru             75   3.8089  0.7216  0.8758
y2013                75   3.3790  0.7054  0.9537
Bayesball            75   3.9120  0.8315  0.9704

Now Fans ranks well above Steamer, which is near the bottom in this test. The lower errors here tend to come from systems that come closer to matching the actual league ERA. Even naively using 2013 data as a forecast no longer finishes dead last, while CBS, ZiPS, and Fans are the best performing systems. Comparing these two tables shows the impact of bias adjustment. What matters most for fantasy valuation is how players compare relative to each other, not how well a system predicts the actual run environment, so the first table is the better comparison for fantasy purposes.
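In terms of the sketch above, the raw table is simply the same calculation with the adjustment switched off:

```python
# Hypothetical toy data: three pitchers' projected ERAs, actual ERAs, and IP.
proj_era, actual_era, ip = [3.20, 3.80, 4.10], [2.95, 4.20, 3.60], [210, 180, 95]
mae_raw, rmse_raw = projection_errors(proj_era, actual_era, ip, bias_adjust=False)
mae_adj, rmse_adj = projection_errors(proj_era, actual_era, ip)  # bias-adjusted
```
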
Both these tables are “apples to apples”, comparing only those players that every system projected. And it’s a small set of generally better-than-average pitchers, a group that is easier to project than a deeper pool of MLB pitchers.
But of course if you’re in a fantasy league, it doesn’t help when a system doesn’t project someone you might care about. So this next table uses an ERA 0.50 worse than the system’s league average for anybody not projected, and compares against a set of almost 700 pitchers:

Source              Num     ERA  StdDev     MAE    RMSE  Missing
Actual              672  3.7395  1.4707  0.0000  0.0000        0
Steamer/Razzball    672  4.0077  0.5245  0.8759  1.4168      215
All Consensus       672  3.9776  0.5452  0.8773  1.4251        4
Davenport           672  3.8822  0.4985  0.8856  1.4270      198
Steamer             672  4.0087  0.5731  0.8883  1.4280       37
Oliver              672  3.9382  0.6424  0.9090  1.4418       31
Bayesball           672  4.0215  0.5219  0.9252  1.4427      222
Fans                672  3.7886  0.4485  0.9017  1.4473      465
AggPro              672  4.0252  0.3698  0.9071  1.4483      557
Razzball            672  3.8558  0.3791  0.9045  1.4485      524
RotoValue           672  3.8888  0.4810  0.9093  1.4488       32
RV Pre-Australia    672  3.9020  0.4852  0.9080  1.4505       42
Larson              672  4.2586  0.4348  0.9232  1.4540      463
ESPN                672  4.0331  0.5404  0.9056  1.4563      336
Marcel              672  3.7990  0.4901  0.9103  1.4587      136
ZiPS                672  4.0703  0.7450  0.9207  1.4640       69
CAIRO               672  4.0489  0.6991  0.9369  1.4758       94
CBS                 672  4.0622  0.4768  0.9404  1.4799      462
MORPS               672  3.8850  0.6547  0.9492  1.4907      143
RotoGuru            672  4.3407  0.6741  0.9486  1.5056      150
RotoChamp           672  3.8623  0.6814  0.9398  1.5199      249
y2013               672  3.8355  1.7932  1.2610  2.2525      169

Here systems get more credit for projecting more players, so long as those projections are better than the default of 0.50 worse than average. This shakes up the order quite a bit. Now Steamer/Razzball does best, followed by the Consensus, Clay Davenport, and Steamer. Will Larson, the winner of the test of fewer (and overall better) players, drops to the middle of the pack, while CBS Sportsline and ZiPS now slip behind Marcel. Bayesball, which ranked worst in the earlier test, improves markedly. The overall errors increase quite a bit, as we’re now comparing projections for a much deeper set of players, many of whom have much less of a track record.
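To make the fill-in rule concrete, here’s a rough sketch, again with hypothetical data structures; whether the “league average” here is innings-weighted is my assumption:

```python
import numpy as np

def fill_missing(proj, all_players, ip, penalty=0.50):
    """Give any pitcher a system didn't project that system's own
    (IP-weighted) average plus a fixed penalty: 0.50 for ERA here,
    0.10 for WHIP in the later table.

    proj: dict of player -> projected stat for one system (hypothetical).
    all_players: the full pool of ~700 pitchers being compared.
    ip: dict of player -> actual innings pitched.
    """
    covered = [p for p in all_players if p in proj]
    avg = np.average([proj[p] for p in covered],
                     weights=[ip[p] for p in covered])
    return {p: proj.get(p, avg + penalty) for p in all_players}
```
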
Finally, let’s take a look at WHIP. First, the “apples-to-apples” table comparing only the 75 pitchers projected by all systems:

Source              Num  Avg WHIP     MAE    RMSE
Actual               75    1.2036  0.0000  0.0000
CBS                  75    1.2077  0.0874  0.1175
All Consensus        75    1.2360  0.0899  0.1205
ZiPS                 75    1.2214  0.0943  0.1229
ESPN                 75    1.2397  0.0963  0.1231
Larson               75    1.2562  0.0954  0.1239
Steamer              75    1.2616  0.0963  0.1246
Marcel               75    1.2238  0.0970  0.1254
Steamer/Razzball     75    1.2608  0.0996  0.1271
Fans                 75    1.2030  0.0993  0.1271
RotoGuru             75    1.2364  0.0967  0.1272
Davenport            75    1.2487  0.0981  0.1284
RV Pre-Australia     75    1.2276  0.0977  0.1286
RotoValue            75    1.2275  0.0979  0.1288
Oliver               75    1.2456  0.0978  0.1303
RotoChamp            75    1.2090  0.1069  0.1335
MORPS                75    1.2464  0.1047  0.1336
Razzball             75    1.2012  0.1108  0.1388
AggPro               75    1.2412  0.1170  0.1415
Bayesball            75    1.2794  0.1134  0.1431
y2013                75    1.1895  0.1097  0.1469
CAIRO                75    1.2603  0.1203  0.1495

This time the CBS Sportsline projections wind up with the lowest errors, followed by the Consensus and ZiPS. Marcel actually beats most systems in this test. As a percentage of the projected average, the errors are smaller for WHIP than for ERA, which makes sense since WHIP stabilizes more quickly. But the spread in errors between systems is much wider for WHIP, so the systems differ from each other more in projecting WHIP than ERA for this sample of pitchers.
Finally, here’s the table using a WHIP of 0.10 worse than the projected league average for missing players:

Source              Num    WHIP  StdDev     MAE    RMSE  Missing
Actual              672  1.2746  0.2381  0.0000  0.0000        0
Davenport           672  1.3088  0.0951  0.1457  0.2253      198
Steamer/Razzball    672  1.3155  0.0955  0.1472  0.2256      215
Steamer             672  1.3148  0.1027  0.1480  0.2264       37
Fans                672  1.2716  0.0829  0.1462  0.2281      465
AggPro              672  1.3036  0.0693  0.1478  0.2286      557
ESPN                672  1.3082  0.1080  0.1467  0.2290      336
Razzball            672  1.2770  0.0846  0.1495  0.2304      524
Marcel              672  1.2838  0.0888  0.1504  0.2311      136
Bayesball           672  1.3262  0.0928  0.1537  0.2317      222
All Consensus       672  1.3163  0.1202  0.1487  0.2318        4
Oliver              672  1.3324  0.1364  0.1507  0.2330       31
Larson              672  1.3506  0.0857  0.1520  0.2332      463
RotoValue           672  1.2871  0.0968  0.1525  0.2335       32
RV Pre-Australia    672  1.2891  0.0974  0.1523  0.2336       42
ZiPS                672  1.3289  0.1501  0.1562  0.2379       69
CBS                 672  1.3189  0.0962  0.1574  0.2380      462
RotoGuru            672  1.3139  0.1231  0.1576  0.2405      150
MORPS               672  1.3036  0.1243  0.1583  0.2406      143
CAIRO               672  1.3809  0.1890  0.1805  0.2593       94
RotoChamp           672  1.2640  0.1981  0.1757  0.2892      249
y2013               672  1.2950  0.2538  0.2035  0.3252      169

In this test Clay Davenport’s model edges out Steamer/Razzball and Steamer for the lowest RMSE, with the Fangraphs Fans, AggPro, and ESPN close behind. Marcel beats more than half the models, as in the test of the 75 players projected by all systems, but now for the first time in these tests it also beats the Consensus, which is usually among the best performing systems. The wider spread in errors among systems here may work against a crowd-sourcing approach that usually does quite well with other stats.
Projection systems vary much more on WHIP than they do on ERA (or wOBA), but in general they all perform much better than raw 2013 data. And while most systems beat the benchmark of Tom Tango’s Marcel system for wOBA, for these pitching stats Marcel is still quite good. Projecting pitching is harder than projecting hitting. Fantasy veterans know that already, of course, but these numbers support that conclusion as well.