Reviewing Pitching Projections

Last week I posted some statistical results comparing five 2012 projection systems to actual 2012 numbers for wOBA, a good summary offensive rate statistic. Now I’d like to run a similar analysis using pitching data and three pitching rate statistics.
I’ll be running numbers for a total of 7 systems:

CAIRO – from S B of the Replacement Level Yankees Weblog.
Marcel – the basic projections from Tom Tango, coauthor of The Book.
Steamer – developed by Jared Cross, Dash Davidson, and Peter Rosenbloom.
ZiPS – projections from Dan Szymborski of Baseball Think Factory and ESPN.
RotoValue – my own old projection algorithm.
NewRV – a new algorithm, based closely on Marcel, that I’m tinkering with.
Babip – a second new algorithm, where I force BABIP to be the same for all players.

I’ll be comparing each model in earned run average (ERA), strikeouts per 9 innings (K/9), and WHIP (walks plus hits per inning pitched), statistics I suspect are of interest to fantasy league players. I’m also including actual 2012 data, as well as unadjusted 2011 data as a baseline “projection”. As before, I’m computing both mean absolute error (MAE) and root mean square error (RMSE) for the rate statistics. All these averages are weighted by innings pitched, and the tables show data only for those players projected by all systems.
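
For readers who want to try this kind of comparison themselves, here is a minimal sketch of innings-weighted MAE and RMSE. The function name and the three sample pitchers are made up for illustration; this is not the code behind the tables in this post.

```python
import math

def weighted_errors(projected, actual, innings):
    """Return (MAE, RMSE) of projected vs. actual rates, weighted by innings."""
    total_ip = sum(innings)
    # Each pitcher's error counts in proportion to his innings pitched.
    abs_sum = sum(ip * abs(p - a) for p, a, ip in zip(projected, actual, innings))
    sq_sum = sum(ip * (p - a) ** 2 for p, a, ip in zip(projected, actual, innings))
    return abs_sum / total_ip, math.sqrt(sq_sum / total_ip)

# Hypothetical sample: three pitchers with invented ERAs and innings.
projected_era = [3.50, 4.00, 3.00]
actual_era = [3.00, 4.50, 4.00]
innings_2012 = [200.0, 150.0, 50.0]

mae, rmse = weighted_errors(projected_era, actual_era, innings_2012)
print(f"MAE={mae:.4f} RMSE={rmse:.4f}")
```

Because RMSE squares each miss before averaging, one badly missed pitcher moves it more than it moves MAE, which is why the two measures can rank systems differently.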

First, earned run average:

Source	Num	Avg ERA	MAE	RMSE
Actual	358	3.9565	0.0000	0.0000
Steamer	358	4.0181	0.7989	1.1065
CAIRO	358	3.9118	0.7993	1.1544
Marcel	358	3.7616	0.8173	1.1565
Babip	358	3.8914	0.8333	1.1577
ZiPS	358	4.0269	0.8227	1.1655
NewRV	358	3.8923	0.8463	1.1974
RotoValue	358	3.7452	0.9471	1.3734
y2011	358	3.6955	1.0364	1.5152

Steamer had the lowest RMSE by far, and just barely edged CAIRO in MAE.
Next, WHIP:

Source	Num	Avg WHIP	MAE	RMSE
Actual	358	1.2922	0.0000	0.0000
Steamer	358	1.3640	0.1335	0.1853
Babip	358	1.3027	0.1395	0.1945
Marcel	358	1.2802	0.1374	0.1961
ZiPS	358	1.3313	0.1388	0.1988
CAIRO	358	1.3167	0.1420	0.2055
NewRV	358	1.2993	0.1474	0.2088
RotoValue	358	1.2826	0.1670	0.2340
y2011	358	1.2752	0.1819	0.2571

Again Steamer has the lowest RMSE and MAE, while the Babip-adjusted new model I’m testing had the second-lowest RMSE, and Tom Tango’s Marcel had the second-lowest MAE. This is a good time to make the point that in computing errors, I adjust for the difference between a projection system’s overall average and the actual MLB average. There are two reasons for this. First, the actual value of the league’s composite stats can vary due to external factors, so I don’t want to penalize a system simply for missing the league average. Second, from a fantasy perspective what matters isn’t the player’s raw statistics themselves, but how those statistics compare to other players. A 3.80 ERA is outstanding when the league average is 4.50, but it’s not so good when the league’s ERA is 3.50. So long as the projection tells me which players will be good and which won’t, I don’t care so much about matching actual stats.
One of the developers of Steamer, Jared Cross, e-mailed me to say that they found a bug in their WHIP computations, causing their projected WHIPs to be too high across the board. In this error analysis, however, there is no penalty for being off on the league average, and indeed Steamer still performed the best here in 2012.
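
To make the league-average adjustment concrete, here is a rough sketch. It assumes the adjustment is a simple additive shift that lines up the innings-weighted averages before errors are computed; the post does not spell out the exact form, so treat the shift, the helper names, and the sample numbers as illustrative assumptions.

```python
def weighted_mean(values, weights):
    """Average of values, weighted by the given weights (e.g. innings)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def recenter(projected, actual, innings):
    """Shift every projection by the same amount so the weighted averages match.

    Assumption: an additive shift. A multiplicative rescaling would be
    another reasonable choice.
    """
    shift = weighted_mean(actual, innings) - weighted_mean(projected, innings)
    return [p + shift for p in projected]

# Hypothetical system whose WHIP is 0.10 too high for every pitcher.
projected_whip = [1.20, 1.40, 1.30]
actual_whip = [1.10, 1.30, 1.20]
innings = [200.0, 100.0, 100.0]

adjusted = recenter(projected_whip, actual_whip, innings)
print([round(x, 2) for x in adjusted])
```

After the shift, a system that is uniformly too high lands right on the actual values, so it is judged only on how well it separates good pitchers from bad ones.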
Finally, K/9 IP:

Source	Num	Avg K/9	MAE	RMSE
Actual	358	7.5941	0.0000	0.0000
Steamer	358	7.2041	0.9693	1.2879
Babip	358	7.2999	1.0047	1.3401
NewRV	358	7.2999	1.0047	1.3401
ZiPS	358	7.2601	1.0225	1.3518
CAIRO	358	7.2463	1.0080	1.3589
Marcel	358	7.3254	1.0452	1.3916
RotoValue	358	7.2044	1.1301	1.5325
y2011	358	7.2778	1.1463	1.5671

Again Steamer had the lowest RMSE and MAE of all the systems I tested. My two new methods tied for second with identical numbers, which is by design. Essentially my Babip model is the same as the NewRV model, just with one additional adjustment for hits, so that each pitcher’s BABIP matches last year’s league average BABIP. Changing the hits does have a small impact on runs, and thus ERA, but both systems project exactly the same strikeout values.
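
As an illustration of what forcing BABIP to the league average does to a pitcher's line, here is a small sketch. It assumes the common definition BABIP = (H - HR) / (AB - K - HR + SF) and simply solves it for hits; the actual Babip model may do this differently, and the function names and the sample pitcher are invented.

```python
def hits_at_league_babip(at_bats, strikeouts, home_runs, sac_flies, league_babip):
    """Projected hits allowed if this pitcher's BABIP equals the league's.

    Assumes BABIP = (H - HR) / (AB - K - HR + SF), solved for H.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return home_runs + league_babip * balls_in_play

def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings

# Hypothetical pitcher: 700 AB against, 180 K, 20 HR, no sac flies.
hits = hits_at_league_babip(at_bats=700, strikeouts=180, home_runs=20,
                            sac_flies=0, league_babip=0.300)
adjusted_whip = whip(walks=50, hits=hits, innings=180.0)
print(f"hits={hits:.1f} WHIP={adjusted_whip:.3f}")
```

Strikeouts are an input here and are never changed, which is consistent with the two models showing identical K/9 numbers while differing in WHIP and ERA.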
To summarize across all the stats I tested, Steamer had the lowest errors in all three statistics, but the other systems (aside from my old RotoValue model) all had different relative strengths. CAIRO, Marcel, and my new Babip-adjusted model did well in ERA. My Babip model, Marcel, and ZiPS did well in WHIP. And my new models, ZiPS, and CAIRO did well in K/9. Comparing new models I’m back-testing with projections made before the 2012 season is not entirely fair to the projections, but it is certainly helpful as a benchmark.
When I compared my new models for batting statistics, I found that assuming BABIP was the same for all batters made the projections worse. But making that assumption for pitchers looks like it improves accuracy.
The old RotoValue model is again clearly inferior to everything else I tested at projecting rate statistics. It’s still better than just using last year’s data, but it’s not competitive with other systems. And so for 2013, RotoValue will be using a new method for its own baseball statistics projections. I’d also be happy to include projection data from any sources that want to share it.
This analysis is still simply looking at rate statistics, only using playing time to weight the players in the averaging (a system gets more credit for being accurate for a 200-inning starter than for a 40-inning reliever). But it’s only weighting by actual 2012 innings, so systems get little or no credit (or blame) for how well they project actual playing time. For fantasy purposes, getting playing time right is indeed important. That’s a topic for a different post.
