Monster 2013 Projection Review

Update 7 February 2014: I’ve updated the post below to add an additional source, Rosenheck, and to correct a bug in my code generating the table for missing players. The original table was incorrectly computing RMSE and MAE over only the players each system projected, without including missing players at all, which made for a very much apples-to-oranges comparison: systems projecting fewer players tended to project only better players, who are more predictable, and hence showed much lower RMSE. Now that I actually am including the missing players, all systems have higher errors than in the earlier table, so the biggest apparent bias came from my own bug. I’ve also removed blank projections from BaseballGuru, which had caused it to show higher errors in the missing player case, since blank “projections” amount to a 0.00 ERA or a .000 batting average.
Tom Tango kindly highlighted my previous posts reviewing the 2013 forecast data that was available on the RotoValue site last year, and in the comments, he pointed me to a data set put online by Will Larson that has 2013 projection data from a dozen sources.
Dr. Larson kindly allows people to write articles using his data so long as they let him know and cite him as the source. I’ve downloaded projections from 12 sources from his site, http://www.bbprojectionproject.com, and am including these additional sources in the comparisons for this article:

AggPro – A projection aggregation method from Ross J. Gore, Cameron T. Snapp, and Timothy Highley.
Steamer – A system from Jared Cross, Dash Davidson, and Peter Rosenbloom.[ref]I’ve separately received Steamer/Razzball data from Jared Cross, using the same rate projections, but playing time projections from Razzball’s Rudy Gamble. I’ve left both sources in because they’re snapshots from different times, and they include different sets of players. But in theory, a test of just rate stats should find these the same.[/ref]
Will Larson – Will Larson has projections in his own name also.
ZiPS – A projection model from Dan Szymborski.
CBS Sportsline
ESPN
Razzball – Projections from Grey Albright of Razzball.com, distinct from the Steamer/Razzball collaboration I have evaluated previously.
RotoChamp
Fangraphs Fans
BaseballGURU
Sports Illustrated
Dan Rosenheck’s projections.

In addition, Brian Cartwright, the designer of the Oliver projections, shared his 2013 data with me. Thanks, Brian!
Many of these sources have fewer data fields than the other sources I have, with some not providing the raw data from which to compute average, ERA, or WHIP. If a source didn’t provide AB or IP, I used the AB or IP from the consensus of the five sources I had last year. I then computed hits from average for batters, ER from ERA for pitchers, and hits/walks from WHIP (if a source had neither, I simply assumed 3 hits per walk, and derived both from WHIP).
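The back-filling described above can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the tables: the function names are hypothetical, and the 3-hits-per-walk split is the assumption stated in the text for sources that give WHIP but neither hits nor walks.

```python
def hits_from_average(avg, at_bats):
    """Back out hits from a projected batting average and AB."""
    return avg * at_bats


def earned_runs_from_era(era, innings):
    """Back out earned runs from a projected ERA and IP (ERA is per 9 innings)."""
    return era * innings / 9.0


def hits_walks_from_whip(whip, innings, hits_per_walk=3.0):
    """Split WHIP * IP baserunners into hits and walks at an assumed ratio."""
    baserunners = whip * innings
    walks = baserunners / (hits_per_walk + 1.0)
    hits = baserunners - walks
    return hits, walks


def playing_time(source_value, consensus_value):
    """Use the source's own AB or IP when present, else the consensus figure."""
    return consensus_value if source_value is None else source_value
```

For example, a pitcher projected for a 1.20 WHIP over 200 innings has 240 baserunners, which this split treats as 180 hits and 60 walks.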
I’m also including the same sources I had before:

CAIRO – from S B of the Replacement Level Yankees Weblog.
Marcel – the basic projection model from Tom Tango, coauthor of The Book. This year I’m using Marcel numbers generated by Jeff Zimmerman, using Jeff Sackmann’s Python code.
MORPS – A projection model by Tim Oberschlake.
Steamer/Razzball – Rate projections by Jared Cross, Dash Davidson, and Peter Rosenbloom, and playing time projections from Rudy Gamble of Razzball.com.
RotoValue – my current model, based largely on Marcel, but with adjustments for pitching decision stats and assuming no pitcher skill in BABIP.

In addition, I’m keeping the same Consensus forecast I had previously (a simple average of the 5 sources above), and adding a new AllConsensus, that is a simple average of all 17 systems I’m evaluating here.
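A simple-average consensus of this kind might look like the sketch below. The post does not say whether rates or counting stats are averaged, or how a player missing from some systems is handled; this sketch assumes rate stats are averaged over whichever systems project the player. The function name and data layout are hypothetical.

```python
from statistics import mean


def simple_consensus(projections):
    """Average each player's projected rate over the systems that project him.

    projections: dict mapping system name -> dict of player -> projected rate.
    Assumption: a player enters the consensus if any system projects him.
    """
    players = set()
    for by_player in projections.values():
        players.update(by_player)
    return {
        player: mean(
            by_player[player]
            for by_player in projections.values()
            if player in by_player
        )
        for player in sorted(players)
    }
```

Under this assumption a consensus covers the union of all players projected by its component systems, so it will list more players than any single source.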
As before I’ll be showing two tables per statistic, one comparing only those players projected by all systems, and another assuming a rate value for players not projected by a system relative to that system’s average. Since there’s not enough data to compute wOBA for each source (or SLG, or OBP), I’m just going to do batting average on offense here, and I’ll subtract 0.020 from the source’s overall average for any missing players.
For pitching, I’ll be able to do ERA and WHIP for each source, adding 0.50 to average ERA and 0.10 to average WHIP for missing players.
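To make the missing-player rule concrete, here is a minimal Python sketch using the offsets given above (0.020 below average for batting average, 0.50 above for ERA, 0.10 above for WHIP). It assumes the system's "average" is an unweighted mean of its own projections, which may not match how the tables were built; the names are hypothetical.

```python
from statistics import mean

# Offsets from the text: missing players are assumed worse than the
# system's own average by these amounts.
MISSING_OFFSET = {"avg": -0.020, "era": 0.50, "whip": 0.10}


def fill_missing(projected, actual_players, stat):
    """Return a projection for every player who actually played.

    projected: dict of player -> projected rate for one system.
    actual_players: iterable of players with actual stats.
    stat: one of "avg", "era", "whip".
    """
    # Assumption: unweighted mean over the system's projected players.
    fill_value = mean(projected.values()) + MISSING_OFFSET[stat]
    return {p: projected.get(p, fill_value) for p in actual_players}
```

A system that projects only two hitters, at .300 and .260, would have every other player filled in at .260 under this rule.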
I’m still computing MAE and RMSE for each source, with bias-adjusted errors. A source is graded by how well it projects players relative to its own average, not by how well its average winds up matching the actual average. For people using projections in fantasy analysis, this is what you’d want: the player’s ranking depends on their relative, not absolute, statistics.
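One way to read "bias-adjusted" is to subtract the gap between a source's mean projection and the mean actual value before measuring errors. The sketch below does that with every player weighted equally; whether the tables weight by playing time is not stated, so equal weighting is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean


def bias_adjusted_errors(projected, actual):
    """Return (MAE, RMSE) after removing the source's overall bias.

    projected, actual: dicts of player -> rate stat. Only players present
    in both are compared, each with equal weight (an assumption).
    """
    players = [p for p in actual if p in projected]
    # Bias: how far the source's average sits from the actual average.
    bias = mean(projected[p] for p in players) - mean(actual[p] for p in players)
    errors = [projected[p] - bias - actual[p] for p in players]
    mae = mean(abs(e) for e in errors)
    rmse = sqrt(mean(e * e for e in errors))
    return mae, rmse
```

With this adjustment, a source that projects every player exactly 10 points too high scores the same as a perfect one, since only the relative ordering and spacing of players matter.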
So let’s get to the numbers!
Now there are only 220 players projected by all systems:

Source	Num	Avg Avg	MAE	RMSE
Actual	220	0.2654	0.0000	0.0000
AllConsensus	220	0.2688	0.0203	0.0257
Aggpro	220	0.2684	0.0202	0.0257
Oliver	220	0.2665	0.0202	0.0257
Steamer/Razzball	220	0.2670	0.0204	0.0258
Steamer	220	0.2670	0.0204	0.0258
Rosenheck	220	0.2698	0.0206	0.0259
FangraphsFans	220	0.2745	0.0205	0.0260
Consensus	220	0.2683	0.0206	0.0260
ZiPS	220	0.2643	0.0201	0.0260
RotoValue	220	0.2668	0.0207	0.0264
BaseballGuru	220	0.2571	0.0205	0.0264
CAIRO	220	0.2653	0.0211	0.0264
SI	220	0.2723	0.0208	0.0266
Marcel	220	0.2712	0.0210	0.0266
ESPN	220	0.2717	0.0213	0.0269
MORPS	220	0.2713	0.0216	0.0272
Razzball	220	0.2701	0.0214	0.0274
CBS	220	0.2733	0.0222	0.0281
RotoChamp	220	0.2781	0.0217	0.0283
Will Larson	220	0.2637	0.0218	0.0286
y2012	220	0.2706	0.0245	0.0315

Here AllConsensus, AggPro, and Oliver are in a virtual tie for the lowest errors, with Steamer next (it’s nice to see Steamer and Steamer/Razzball so close). Even the highest errors are still just under .003 more than the lowest, so all these systems do a good job of projecting the top batters. I’m glad to see my own model now performing a little better than the median!
Now let’s assume missing players hit 20 points worse than a system’s average:

Source	Num	Avg	MLB	Avg	StdDev	MAE	RMSE	Missing
Actual	633	0.2570	633	0.2570	0.0364	0.0000	0.0000	0
Steamer	608	0.2621	633	0.2609	0.0193	0.0231	0.0308	110
AllConsensus	1514	0.2392	633	0.2612	0.0205	0.0230	0.0309	15
Steamer/Razzball	504	0.2633	633	0.2608	0.0193	0.0232	0.0310	162
Oliver	1445	0.2388	633	0.2588	0.0211	0.0230	0.0310	23
Consensus	786	0.2612	633	0.2621	0.0192	0.0234	0.0314	80
FangraphsFans	331	0.2706	633	0.2671	0.0192	0.0234	0.0314	310
ZiPS	1000	0.2452	633	0.2575	0.0205	0.0235	0.0315	29
CAIRO	507	0.2606	633	0.2587	0.0207	0.0240	0.0316	177
Aggpro	302	0.2658	633	0.2608	0.0186	0.0238	0.0317	336
Marcel	750	0.2596	633	0.2640	0.0203	0.0241	0.0322	105
SI	352	0.2690	633	0.2654	0.0194	0.0239	0.0322	290
BaseballGuru	698	0.2476	633	0.2487	0.0218	0.0241	0.0324	109
Razzball	251	0.2697	633	0.2631	0.0198	0.0243	0.0324	390
MORPS	539	0.2633	633	0.2643	0.0206	0.0246	0.0326	163
Will Larson	294	0.2617	633	0.2561	0.0208	0.0249	0.0334	342
RotoChamp	434	0.2737	633	0.2716	0.0219	0.0250	0.0338	230
Rosenheck	414	0.2664	633	0.2626	0.0228	0.0243	0.0338	236
RotoValue	751	0.2600	633	0.2601	0.0236	0.0244	0.0354	105
CBS	751	0.2626	633	0.2619	0.0333	0.0273	0.0389	88
ESPN	569	0.2594	633	0.2596	0.0377	0.0266	0.0417	126
y2012	611	0.2592	633	0.2585	0.0414	0.0318	0.0456	129

This time the error spreads are much wider, ~~with Razzball, one of the weaker sources in the prior table, now leading the pack~~. As a general rule, sources that project fewer players rank higher here than in the prior table, and indeed the Steamer data highlights this: the Steamer file on Larson’s site projected more players who played in 2013 than the one Jared Cross sent me that I label Steamer/Razzball, and it’s the latter version that ranks higher in this test. AllConsensus, which tied for best before, now drops far back, well behind the smaller consensus which has projections for fewer players.
Having fixed my bug and updated my table, I now see Steamer with the lowest RMSE, closely followed by AllConsensus, Steamer/Razzball, and Oliver, but all the top systems are quite close in this test.
~~That’s interesting, perhaps suggesting that simply projecting 20 points worse than league average may be better than doing an individual projection for many weaker players~~. (I assume that players projected by all systems tend to be much better than those not projected by some, and indeed the cumulative batting averages of the different data sets support that assumption.) While each system’s errors rise compared to the subset of players projected by all systems, some systems fare much worse here than in the other test, which suggests that most systems can do reasonably well with the top hitters, but what separates a good from a mediocre system is how they do with weaker players.
Now on to pitching. First up, ERA:
84 players projected by all systems

Source	Num	Avg ERA	MAE	RMSE
Actual	84	3.6870	0.0000	0.0000
Steamer/Razzball	84	3.8968	0.5718	0.7862
Steamer	84	3.8961	0.5783	0.7879
Aggpro	84	3.9394	0.5568	0.7949
Rosenheck	84	3.8880	0.5585	0.8027
Will Larson	84	3.9940	0.5611	0.8108
AllConsensus	84	3.7522	0.5832	0.8286
Consensus	84	3.7651	0.6055	0.8352
FangraphsFans	84	3.5878	0.6015	0.8414
RotoValue	84	3.7182	0.6053	0.8492
Razzball	84	3.6381	0.6217	0.8572
ESPN	84	3.6788	0.6024	0.8710
CAIRO	84	3.7388	0.6269	0.8738
Marcel	84	3.6714	0.6388	0.8750
RotoChamp	84	3.6447	0.6366	0.8763
Oliver	84	3.5956	0.6343	0.8842
MORPS	84	3.7761	0.6654	0.9016
ZiPS	84	3.7967	0.6512	0.9159
SI	84	3.6444	0.6630	0.9319
CBS	84	3.5727	0.6663	0.9494
BaseballGuru	84	3.8959	0.6936	0.9719
y2012	84	3.6606	0.8278	1.0753

Steamer and Aggpro top this chart, followed by Dan Rosenheck. Will Larson, who ranked last in the parallel batting average table, now places fourth. But there are only 84 pitchers ranked by all systems, and in digging deeper, I found that three sources from Larson (his own projections, SI, and CBS) did not provide projections for top closers at all (or perhaps many relievers generally).
Now let’s see what happens comparing all players who actually played, but adding 0.50 to the league average ERA projection if you didn’t project a player:

Source	Num	ERA	MLB	ERA	StdDev	MAE	RMSE	Missing
Actual	664	3.8648	664	3.8636	1.3836	0.0000	0.0000	0
Steamer	635	3.9522	664	3.9916	0.5025	0.8835	1.3108	142
Steamer/Razzball	515	3.9583	664	4.0113	0.5030	0.8886	1.3120	198
Consensus	872	3.9740	664	3.9859	0.5066	0.9144	1.3389	113
AllConsensus	1310	4.6060	664	4.0786	0.5994	0.9109	1.3417	31
Aggpro	203	3.9761	664	4.2058	0.4474	0.9082	1.3422	469
FangraphsFans	232	3.6821	664	3.8801	0.4680	0.9150	1.3468	444
RotoChamp	477	3.9016	664	3.9377	0.5894	0.9366	1.3499	244
Razzball	151	3.6199	664	3.9057	0.3891	0.9165	1.3513	522
MORPS	499	3.8988	664	3.9432	0.6057	0.9326	1.3587	226
Marcel	841	3.9947	664	3.9466	0.5032	0.9440	1.3650	136
Oliver	1122	4.4497	664	3.9867	0.6314	0.9523	1.3719	82
CAIRO	489	4.0033	664	4.0743	0.6217	0.9470	1.3774	234
Will Larson	128	4.1284	664	4.4087	0.3672	0.9404	1.3803	538
ESPN	500	3.9256	664	4.0417	0.6351	0.9552	1.3919	233
ZiPS	998	4.7479	664	4.2787	0.7708	0.9533	1.3929	48
Rosenheck	172	4.0618	664	4.2862	0.3898	0.9584	1.4003	498
RotoValue	836	3.9983	664	3.9953	0.6849	0.9681	1.4044	133
SI	211	3.8986	664	4.1051	0.4609	0.9701	1.4087	469
CBS	308	3.9767	664	4.1821	1.1338	1.0573	1.7149	424
BaseballGuru	738	4.5729	664	4.5452	1.7816	1.1384	2.1694	152
y2012	630	3.9905	664	4.0920	1.8927	1.3148	2.1943	168

~~Now Razzball winds up on top, just ahead of Will Larson, with AggPro a little further back. Even more starkly than with batting average, though, I see that systems projecting fewer players tend to rank much higher in this comparison. Yet again Steamer/Razzball moves well ahead of Steamer, and AllConsensus, which had outperformed Consensus in the prior test, now falls much further behind.~~ Steamer is still on top, just ahead of Steamer/Razzball, followed by the aggregates: the 5-system Consensus, the consensus of all sources, and Aggpro. Pitching is indeed harder to predict, as we see a much wider spread of errors for ERA than for Avg.
Finally, let’s take a look at WHIP, again first looking at only the 84 pitchers projected by all sources:

Source	Num	Avg WHIP	MAE	RMSE
Actual	84	1.2373	0.0000	0.0000
Rosenheck	84	1.2746	0.1002	0.1358
FangraphsFans	84	1.2257	0.1035	0.1364
Steamer	84	1.2741	0.1038	0.1374
Steamer/Razzball	84	1.2738	0.1045	0.1374
Aggpro	84	1.2933	0.1040	0.1386
RotoValue	84	1.2511	0.1029	0.1387
AllConsensus	84	1.2547	0.1074	0.1408
Consensus	84	1.2630	0.1084	0.1422
Will Larson	84	1.2609	0.1035	0.1434
Marcel	84	1.2366	0.1126	0.1442
ESPN	84	1.2492	0.1089	0.1448
MORPS	84	1.2677	0.1100	0.1452
RotoChamp	84	1.2298	0.1135	0.1466
Oliver	84	1.2550	0.1104	0.1504
Razzball	84	1.2304	0.1121	0.1510
BaseballGuru	84	1.2426	0.1169	0.1516
ZiPS	84	1.2603	0.1165	0.1530
SI	84	1.2471	0.1177	0.1536
CBS	84	1.2259	0.1187	0.1582
y2012	84	1.2165	0.1324	0.1665
CAIRO	84	1.2790	0.1355	0.1774

Here ~~Fangraphs’ Fans~~ Dan Rosenheck’s projections take the top spot, followed by Fangraphs Fans and Steamer, with Aggpro and my own RotoValue projections (yay!) also bettering the two consensus forecasts.
When I fill in league average plus 0.10 for missing players, ~~we again find sources that projected few players tending to rise to the top~~ Rosenheck drops significantly, but the top systems remain the same:

Source	Num	WHIP	MLB	WHIP	StdDev	MAE	RMSE	Missing
Actual	664	1.2995	664	1.2993	0.2354	0.0000	0.0000	0
FangraphsFans	232	1.2488	664	1.2877	0.0868	0.1535	0.2174	444
Steamer/Razzball	515	1.3139	664	1.3250	0.0895	0.1522	0.2176	198
Steamer	635	1.3163	664	1.3239	0.0895	0.1518	0.2183	142
Aggpro	203	1.3124	664	1.3582	0.0804	0.1558	0.2207	469
AllConsensus	1310	1.4425	664	1.3365	0.1264	0.1581	0.2228	31
Marcel	841	1.3055	664	1.2952	0.0911	0.1589	0.2232	136
Consensus	872	1.3231	664	1.3280	0.1101	0.1579	0.2233	113
RotoValue	836	1.3136	664	1.3097	0.1100	0.1592	0.2246	133
Razzball	151	1.2292	664	1.2868	0.0804	0.1580	0.2252	522
Oliver	1122	1.4335	664	1.3435	0.1327	0.1643	0.2278	82
Will Larson	128	1.2967	664	1.3531	0.0895	0.1617	0.2307	538
MORPS	499	1.3129	664	1.3226	0.1335	0.1626	0.2307	226
Rosenheck	172	1.3099	664	1.3549	0.0808	0.1615	0.2310	498
RotoChamp	477	1.2978	664	1.3029	0.1378	0.1691	0.2312	244
SI	211	1.2976	664	1.3400	0.0923	0.1658	0.2320	469
ESPN	500	1.3046	664	1.3257	0.1321	0.1665	0.2323	233
BaseballGuru	738	1.2990	664	1.3151	0.1236	0.1697	0.2359	152
ZiPS	998	1.4692	664	1.3706	0.1587	0.1710	0.2360	48
CAIRO	489	1.3720	664	1.3874	0.1733	0.1864	0.2529	234
CBS	308	1.3011	664	1.3474	0.2360	0.1842	0.3102	424
y2012	630	1.3049	664	1.3219	0.3179	0.2175	0.3630	168

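~~Once again Razzball and Will Larson rise to the top, the Steamer/Razzball ranks higher than Steamer, Consensus bests AllConsensus, and in general it seems sources that project fewer players tend to rank higher in this comparison.~~ With the corrected table, Fangraphs Fans, Steamer/Razzball, and Steamer are in a virtual tie at the top, followed by Aggpro, and AllConsensus edges just ahead of Consensus.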
I’m still trying to think about what the results mean, but it does seem that for weaker players, projection systems may not provide much value added compared to simply guessing something a little worse than league average. Or maybe projections that revert to an overall league average in a statistic pull the projected values too high systematically for weaker players. For now I want to present the data and find out what others think about this, but in terms of evaluating forecasters, I think the comparison of players projected by all systems is probably more informative than the other one (disclaimer: my own system does rank higher that way, but so do the consensus forecasts, and I do sincerely think missing quite a few players may be a relative advantage somehow).
As has been seen before, projections are more accurate at forecasting batting stats than pitching stats, and virtually every system out there does much better than simply using 2012 data. Comments and feedback welcome, in comments below, or you can e-mail me: geoff at rotovalue dot com.
While I did try my best to be accurate, it is possible that I made some mistakes in uploading the data, trying to match names in the files with players in my own database. I conclude by thanking Dr. Larson for compiling and making so much data available, Brian Cartwright, for sharing Oliver with me, and Tom Tango, for pointing me to Dr. Larson’s site and his feedback on the methodology.