Last week I posted some statistical results comparing five 2012 projection systems’ statistics with actual 2012 numbers for wOBA, a good summary rate statistic for offense. Now I’d like to run a similar analysis using pitching data, looking at 3 pitching rate statistics.
I’ll be running numbers for a total of 7 systems:
- CAIRO – from S B of the Replacement Level Yankees Weblog.
- Marcel – the basic projections from Tom Tango, coauthor of The Book.
- Steamer – developed by Jared Cross, Dash Davidson, and Peter Rosenbloom.
- ZiPS – projections from Dan Szymborski of Baseball Think Factory and ESPN.
- RotoValue – my own old projection algorithm.
- NewRV – a new algorithm, based closely on Marcel, that I’m tinkering with.
- Babip – a second new algorithm, where I force BABIP to be the same for all players.
I’ll be comparing each model in earned run average (ERA), strikeouts per 9 innings (K/9), and WHIP (walks plus hits per inning pitched), statistics I suspect are of interest to fantasy league players. I’m also including actual 2012 data, as well as unadjusted 2011 data as a baseline “projection”. As before, I’m computing both mean absolute error (MAE) and root mean square error (RMSE) for each rate statistic. All these averages are weighted by innings pitched, and I show data only for players projected by all systems.
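As a rough illustration of the rate statistics and the innings-weighted error measures described above, here is a minimal Python sketch. The function names (`era`, `k_per_9`, `whip`, `common_players`, `weighted_errors`) are hypothetical and not taken from any of the systems’ code, and it assumes innings are stored as true decimals (outs divided by 3) rather than box-score notation.

```python
import math

# Rate statistics. Assumption: ip is true innings (outs / 3), not box-score
# notation where 200.1 means 200 1/3 innings.
def era(er, ip):
    """Earned run average: earned runs per 9 innings."""
    return 9.0 * er / ip


def k_per_9(k, ip):
    """Strikeouts per 9 innings."""
    return 9.0 * k / ip


def whip(bb, h, ip):
    """Walks plus hits per inning pitched."""
    return (bb + h) / ip


def common_players(projections):
    """Players present in every system (projections: system -> {player: value})."""
    return set.intersection(*(set(p) for p in projections.values()))


def weighted_errors(projected, actual, weights):
    """Mean absolute error and root mean square error, each player weighted by innings."""
    total = sum(weights)
    mae = sum(w * abs(p - a) for p, a, w in zip(projected, actual, weights)) / total
    mse = sum(w * (p - a) ** 2 for p, a, w in zip(projected, actual, weights)) / total
    return mae, math.sqrt(mse)
```

For example, with three hypothetical pitchers projected at ERAs of 3.50, 4.20 and 5.00 who actually posted 3.80, 4.00 and 5.60 over 200, 150 and 40 innings, `weighted_errors([3.50, 4.20, 5.00], [3.80, 4.00, 5.60], [200, 150, 40])` gives an MAE of about 0.29 and an RMSE of about 0.31. The 40-inning reliever’s large miss counts for much less than it would in an unweighted average.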
First, earned run average:
Steamer had the lowest RMSE by far, and just barely edged CAIRO in MAE.
Next, WHIP: again Steamer had the lowest RMSE and MAE, while the Babip-adjusted new model I’m testing had the second-lowest RMSE, and Tom Tango’s Marcel had the second-lowest MAE. This is a good time to point out that in computing errors, I adjust for the difference between each projection’s average and the actual MLB average. There are two reasons for this: first, the league’s composite stats can vary due to external factors, so I don’t want to penalize a system simply for missing the league average; second, from a fantasy perspective what matters isn’t a player’s raw statistics, but how those statistics compare to other players’. A 3.80 ERA is outstanding when the league average is 4.50, but it’s not so good when the league’s ERA is 3.50. So long as a projection tells me which players will be good and which won’t, I don’t care so much about matching actual stats.
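The exact form of the league-average adjustment isn’t spelled out here. One plausible version, sketched below purely as an assumption, shifts every projection by the same amount so that the innings-weighted projected average equals the actual average, and then computes innings-weighted MAE and RMSE. The function names are hypothetical, and a multiplicative (ratio) adjustment would be an equally reasonable choice.

```python
import math


def weighted_mean(values, weights):
    """Weighted average of values (here, weights are innings pitched)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def mean_adjusted_errors(projected, actual, weights):
    """Innings-weighted MAE and RMSE after removing any league-level bias.

    Assumption: the adjustment is an additive shift that makes the weighted
    mean of the projections equal the weighted mean of the actual values.
    """
    shift = weighted_mean(actual, weights) - weighted_mean(projected, weights)
    adjusted = [p + shift for p in projected]
    total = sum(weights)
    mae = sum(w * abs(p - a) for p, a, w in zip(adjusted, actual, weights)) / total
    mse = sum(w * (p - a) ** 2 for p, a, w in zip(adjusted, actual, weights)) / total
    return mae, math.sqrt(mse)
```

Under this adjustment, a system that projects every pitcher exactly 0.50 runs too high scores perfectly, while a system that nails the league average but ranks pitchers poorly still shows large errors.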
Jared Cross, one of the developers of Steamer, had e-mailed me to say that they found a bug in their WHIP computations that made their WHIP projections too high across the board. In this error analysis, however, there is no penalty for missing the league average, and indeed Steamer still performed best here in 2012.
Finally, K/9 IP:
Again Steamer had the lowest RMSE and MAE of all the systems I tested. My two new models tied for second with exactly the same numbers, which is by design. Essentially, my Babip model is the NewRV model with one additional adjustment to hits, so that each pitcher’s BABIP matches last year’s league average BABIP. Changing hits affects WHIP directly and has a small impact on runs, and thus ERA, but both systems project exactly the same strikeout totals.
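The BABIP adjustment can be sketched as solving the standard BABIP formula, (H − HR) / (AB − K − HR + SF), for hits. The helper names, the sample pitcher, and the choice to hold at-bats, strikeouts, home runs, sacrifice flies and innings fixed are simplifying assumptions for illustration, not necessarily how the Babip model works internally. The sketch does show why the two models share identical K/9 projections: only the hit total changes.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play allowed: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


def hits_at_league_babip(hr, ab, k, sf, league_babip):
    """Hits that make a pitcher's BABIP equal league_babip.

    Solves (H - HR) / (AB - K - HR + SF) = league_babip for H,
    treating HR, AB, K and SF as fixed.
    """
    return league_babip * (ab - k - hr + sf) + hr


def adjust_projection(proj, league_babip):
    """Return a copy of a projection dict with hits reset to the league-average BABIP.

    Assumption: only 'h' changes. Strikeouts and innings are untouched, so K/9
    matches the unadjusted projection, while WHIP moves with the new hit total.
    Re-estimating runs (and hence ERA) from the new hits is left out of this sketch.
    """
    adjusted = dict(proj)  # shallow copy; the input projection is not modified
    adjusted["h"] = hits_at_league_babip(
        proj["hr"], proj["ab"], proj["k"], proj["sf"], league_babip
    )
    return adjusted


# Hypothetical base projection for one pitcher.
base = {"ip": 200.0, "ab": 700, "h": 180, "hr": 20, "k": 180, "bb": 50, "sf": 5}
babip_model = adjust_projection(base, league_babip=0.295)

for label, p in (("base", base), ("babip", babip_model)):
    k9 = 9.0 * p["k"] / p["ip"]
    whip = (p["bb"] + p["h"]) / p["ip"]
    print(f"{label}: K/9 = {k9:.2f}, WHIP = {whip:.3f}")
```

In this made-up case the base projection implies a BABIP of about .317, so forcing it down to .295 lowers the projected hits and WHIP, while K/9 prints identically for both rows.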
To summarize, Steamer had the lowest errors in all three statistics I tested, but the other systems (aside from my old RotoValue model) each had different relative strengths. CAIRO, Marcel, and my new Babip-adjusted model did well in ERA; my Babip model, Marcel, and ZiPS did well in WHIP; and my new models, ZiPS, and CAIRO did well in K/9. Comparing new models I’m back-testing against projections made before the 2012 season is not entirely fair to those projections, but it is certainly helpful as a benchmark.
When I tested my new models on batting statistics, I found that assuming the same BABIP for all batters made the projections worse. For pitchers, however, making that assumption appears to improve accuracy.
The old RotoValue model is again clearly worse at projecting rate statistics than anything else I tested. It’s still better than simply using last year’s data, but it’s not competitive with the other systems. So for 2013, RotoValue will be using a new method for its own baseball statistics projections. I’d also be happy to include projection data from any source willing to share it.
This analysis still looks only at rate statistics, using playing time solely to weight players in the averages (a system gets more credit for being accurate about a 200-inning starter than about a 40-inning reliever). But it weights only by actual 2012 innings, so systems get little or no credit (or blame) for how well they project playing time. For fantasy purposes, projecting playing time is indeed important, but that’s a topic for a different post.