Comparing 2014 Projections – wOBA

For the past three years I’ve reviewed baseball projection systems against actual results, covering every system for which I could get data. Will Larson maintains a valuable site of projections from many different sources, and most of the sources I’m comparing come from it.
As in the past, I’m computing root mean square error (RMSE) and mean absolute error (MAE) for each source compared to actual data. For these tests I apply a bias adjustment, so the errors are measured relative to each source’s own average. I care more about how a system projects players relative to its own projected averages than about how well it projects the overall league average.
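As a minimal sketch of the two error measures, assuming the bias adjustment amounts to subtracting a source’s mean signed error before measuring its misses (the exact calculation isn’t spelled out here, and the function name and sample values below are made up for illustration):

```python
import math

def bias_adjusted_errors(projected, actual):
    """Return (MAE, RMSE) after removing the source's average bias.

    Assumption: the bias is the unweighted mean of (projected - actual)
    over the players being compared.
    """
    if len(projected) != len(actual) or not projected:
        raise ValueError("need two non-empty lists of equal length")
    n = len(projected)
    # Mean signed error: how far the source's average sits from the actual average.
    bias = sum(p - a for p, a in zip(projected, actual)) / n
    # Residuals once that constant offset has been removed.
    residuals = [(p - bias) - a for p, a in zip(projected, actual)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return mae, rmse

# Made-up wOBA values for three players, purely illustrative.
projected = [0.350, 0.330, 0.310]
actual = [0.340, 0.300, 0.320]
mae, rmse = bias_adjusted_errors(projected, actual)
print(f"MAE={mae:.4f} RMSE={rmse:.4f}")  # roughly MAE=0.0133 RMSE=0.0163
```

Under this assumption, a source that is too high by the same amount for every player would score zero error, which is the point of the adjustment: only the relative ordering and spacing of players is being judged.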
I have data from these systems:

AggPro – A projection aggregation method from Ross J. Gore, Cameron T. Snapp, and Timothy Highley.
Bayesball – Projections from Jonathan Adams.
CAIRO – from S B of the Replacement Level Yankees Weblog.
CBS – Projections from CBS Sportsline.
Davenport – Clay Davenport’s projections.
ESPN – Projections from ESPN.
Fans – Fans’ projections from Fangraphs.com.
Larson – Will Larson’s projections.
Marcel – the basic projection model from Tom Tango, coauthor of The Book. This year I’m using Marcel numbers generated by Jeff Zimmerman, using Jeff Sackmann’s Python code.
MORPS – A projection model by Tim Oberschlake.
Rosenheck – Projections by Dan Rosenheck.
Oliver – Brian Cartwright’s projection model.
Steamer – Projections by Jared Cross, Dash Davidson, and Peter Rosenbloom.
Steamer/Razzball – Steamer rate projections, but playing time projections from Rudy Gamble of Razzball.com.
RotoValue – my current model, based largely on Marcel, but with adjustments for pitching decision stats and assuming no pitcher skill in BABIP.
RV Pre-Australia – The RotoValue projections as they stood just before the first Australia games last year. I continued to tweak the projections slightly before the rest of the regular season began.
ZiPS – projections from Dan Szymborski of Baseball Think Factory and ESPN.

In addition, I’ve computed a source “All Consensus”, which is a simple average of all of the above (ignoring a source if it doesn’t project some particular category).
Not all the models had enough data to compute wOBA, so the tables (below the jump) only include those sources which do. The other sources do affect the All Consensus values for those stats where they do have data.
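A rough sketch of that kind of consensus, assuming each source’s projection for a player is a dictionary of stat categories and that a missing or empty category simply drops that source from the average for that stat (the source names, categories, and numbers are hypothetical):

```python
def consensus_line(lines_by_source):
    """Average each stat across only the sources that project it."""
    totals, counts = {}, {}
    for line in lines_by_source.values():
        for stat, value in line.items():
            if value is None:  # this source does not project the category
                continue
            totals[stat] = totals.get(stat, 0.0) + value
            counts[stat] = counts.get(stat, 0) + 1
    return {stat: totals[stat] / counts[stat] for stat in totals}

# Hypothetical projections for a single player.
player = {
    "SourceA": {"HR": 20, "BB": 50, "HBP": 4},
    "SourceB": {"HR": 24, "BB": 60},              # no HBP category at all
    "SourceC": {"HR": 22, "BB": None, "HBP": 6},  # BB left blank
}
avg = consensus_line(player)
print(avg)
```

Here HR is averaged over three sources while BB and HBP are each averaged over two, so a source with partial data still contributes wherever it has something to say.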
First, as an “apples-to-apples” comparison, I’m comparing only those players projected by each system (279 total):

Source	Num	Avg wOBA	MAE	RMSE
Actual	279	0.3270	0.0000	0.0000
All Consensus	279	0.3387	0.0236	0.0303
Steamer	279	0.3354	0.0240	0.0307
Steamer/Razzball	279	0.3364	0.0242	0.0308
Zips	279	0.3386	0.0245	0.0309
RotoValue	279	0.3353	0.0242	0.0313
RV Pre-Australia	279	0.3351	0.0241	0.0313
CAIRO	279	0.3361	0.0247	0.0315
Davenport	279	0.3343	0.0243	0.0316
MORPS	279	0.3432	0.0250	0.0318
Marcel	279	0.3200	0.0247	0.0318
Oliver	279	0.3394	0.0250	0.0325
CBS	279	0.3410	0.0253	0.0326
Fans	279	0.3470	0.0257	0.0329
y2013	279	0.3374	0.0314	0.0427

The lowest errors came from the Consensus, so averaging multiple sources may bring some marginal improvement, but the spread among systems was rather small. Steamer did best among the individual systems, and every system did markedly better than my simple benchmark of 2013 data. ZiPS came next after the two Steamer variants, followed by the two RotoValue models (yay!). Marcel remains quite competitive here, though, which shows that a basic model can still do quite well.
Next I’m rerunning the analysis, assigning a wOBA 20 points (0.020) below league average to any player a system did not project, and now comparing the 643 players projected by at least one system:

Source	MLB	wOBA	StdDev	MAE	RMSE	Missing
Actual	643	0.3186	0.0427	0.0000	0.0000	0
Steamer	643	0.3271	0.0260	0.0270	0.0364	21
Zips	643	0.3291	0.0274	0.0275	0.0365	49
Steamer/Razzball	643	0.3288	0.0251	0.0272	0.0366	166
All Consensus	643	0.3305	0.0250	0.0272	0.0366	2
Davenport	643	0.3264	0.0250	0.0276	0.0376	158
CAIRO	643	0.3267	0.0284	0.0283	0.0377	32
Oliver	643	0.3294	0.0315	0.0282	0.0379	25
MORPS	643	0.3353	0.0257	0.0285	0.0380	111
Fans	643	0.3403	0.0255	0.0285	0.0384	320
RV Pre-Australia	643	0.3285	0.0247	0.0285	0.0385	11
CBS	643	0.3339	0.0261	0.0287	0.0387	295
RotoValue	643	0.3288	0.0246	0.0286	0.0387	7
Marcel	643	0.3121	0.0234	0.0288	0.0387	104
y2013	643	0.3255	0.0474	0.0364	0.0503	137
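
The fill-in rule behind this larger table could look something like the sketch below. It assumes a single league-average wOBA is passed in (which average is used isn’t stated here), and the function name, player labels, and values are invented for the example:

```python
def fill_missing(projections, players, league_avg_woba, penalty=0.020):
    """Return (filled projections, missing count) for one source.

    Any player the source did not project gets league average minus
    `penalty`, i.e. 20 points of wOBA below average by default.
    """
    fallback = league_avg_woba - penalty
    filled = {}
    missing = 0
    for player in players:
        if player in projections:
            filled[player] = projections[player]
        else:
            filled[player] = fallback
            missing += 1
    return filled, missing

# Hypothetical source that skipped player "B".
players = ["A", "B", "C"]
source = {"A": 0.340, "C": 0.310}
filled, missing = fill_missing(source, players, league_avg_woba=0.320)
print(filled, missing)
```

The missing count returned here corresponds to the Missing column: the more players a source skipped, the more of its error comes from the fallback value rather than from its own projections.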

The errors are a bit bigger, as this set includes more players, among them those who play less (and are thus less likely to perform close to their true talent). Steamer is again the best single system, this time edging out ZiPS slightly, with the Consensus now level with Steamer/Razzball just behind them. Oliver, CBS, and the Fangraphs Fans, which all lagged Marcel in the smaller set, now match or beat it: no system has higher errors than Tango’s monkey system. My model, however, dropped back relative to the other systems, which suggests my projections for weaker players may lag theirs.
The RMSE spread between the best and worst systems is just 0.0023, even smaller than last year’s, while the gap from the weakest system to the raw 2013 data is about five times as large. So using projections is clearly better than simply relying on last year’s data. Steamer also came out on top in the comparison I did last year, but which projections you use matters far less than that you use projections at all.
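The arithmetic behind that comparison can be checked directly from the 643-player table. This is only a quick sanity check on the rounded values shown there, so the ratio is approximate:

```python
# RMSE values copied from the 643-player table above (rounded to 4 places).
best_rmse = 0.0364      # Steamer
worst_rmse = 0.0387     # Marcel (CBS and RotoValue show the same rounded value)
baseline_rmse = 0.0503  # y2013, i.e. last year's actual data used as a projection

spread = worst_rmse - best_rmse
gap_to_baseline = baseline_rmse - worst_rmse
ratio = gap_to_baseline / spread
print(f"spread={spread:.4f} gap={gap_to_baseline:.4f} ratio={ratio:.2f}")
# spread=0.0023 gap=0.0116 ratio=5.04
```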
Update: Rudy Gamble of Razzball.com asked if I could rerun the analysis for players with 500 or more PA. So here’s the table:

Source	MLB	wOBA	StdDev	MAE	RMSE	Missing
Actual	149	0.3352	0.0316	0.0000	0.0000	0
All Consensus	149	0.3422	0.0221	0.0222	0.0276	0
Steamer/Razzball	149	0.3400	0.0241	0.0226	0.0280	0
Steamer	149	0.3391	0.0237	0.0227	0.0281	0
Davenport	149	0.3382	0.0229	0.0229	0.0284	2
Zips	149	0.3428	0.0240	0.0233	0.0287	0
MORPS	149	0.3470	0.0242	0.0229	0.0288	1
RV Pre-Australia	149	0.3387	0.0239	0.0229	0.0292	0
CAIRO	149	0.3401	0.0252	0.0237	0.0294	1
RotoValue	149	0.3388	0.0239	0.0232	0.0294	0
Marcel	149	0.3225	0.0230	0.0233	0.0302	2
CBS	149	0.3444	0.0268	0.0241	0.0305	4
Fans	149	0.3516	0.0256	0.0241	0.0307	6
Oliver	149	0.3441	0.0283	0.0241	0.0313	0
y2013	149	0.3429	0.0411	0.0307	0.0425	3

This looks very much like the apples-to-apples table above, since few of these players were missing from any system. It is a smaller set of better players, so the overall errors are lower, but the ordering remains about the same.

2 comments

Josh says:

February 14, 2015 at 9:19 pm

It’s interesting that a simple equally weighted average of the various projections seems to do best, but it isn’t that surprising, in a “view of the masses” phenomenon. Do you have the ability to optimize the weights of the projections to create an optimal consensus? For example, maybe an optimal consensus uses 40% Steamer and 10% of six other projections. Basically, solve for the optimal projection weights to get the lowest RMSE.
Dan M. says:

February 15, 2015 at 10:51 pm

Glad to see your numbers are pretty much in line with what I found when I did this a couple months ago. Hat tip to you for including so many projection systems, I know those merges are a pain! Great stuff!
