Matt Swartz tested several statistics projections against actual 2011 numbers here. He follows the guidelines Tom Tango outlined in this post, computing weighted on-base average (wOBA) for each player and then comparing the actual wOBA to the projection. From the errors on individual players he computed both mean absolute error (MAE) and root mean square error (RMSE).

One of Matt’s assumptions was to run the calculations only for players with 200 or more plate appearances in 2011. In this post I’ll show results from a similar analysis without that cutoff, computing the same MAE and RMSE values for the systems for which I have data.
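As a minimal sketch of the two error measures, here is one way to compute them in Python. It assumes each player's error is weighted by actual 2011 plate appearances, which is my reading of the remark later in this post that systems get more credit for being close on players who play a lot. The function name `weighted_errors` and the sample numbers are hypothetical, for illustration only.

```python
import math

def weighted_errors(rows):
    """Return (MAE, RMSE) for rows of (projected_woba, actual_woba, actual_pa).

    Weighting each player's error by actual plate appearances is an
    assumption of this sketch, not a confirmed detail of the analysis.
    """
    rows = list(rows)
    total_pa = sum(pa for _, _, pa in rows)
    # Mean absolute error: PA-weighted average of |projection - actual|.
    mae = sum(pa * abs(proj - act) for proj, act, pa in rows) / total_pa
    # Root mean square error: square root of the PA-weighted mean squared error.
    mse = sum(pa * (proj - act) ** 2 for proj, act, pa in rows) / total_pa
    return mae, math.sqrt(mse)

# Made-up numbers, not taken from any projection system.
sample = [(0.340, 0.320, 600), (0.300, 0.330, 200)]
mae, rmse = weighted_errors(sample)
```

Because RMSE squares each error before averaging, a system with a few large misses is penalized more heavily by RMSE than by MAE.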

Not all projection systems made estimates for every player who actually played. I considered three different ways of addressing this:

- Compare only those players who have projections from all systems. This way you’re comparing the exact same players from each system, but you’re limited to fewer total players in the averages.
- Compare only the players projected by a given system. This uses all the projection data, but a system which projects more players, particularly young ones with little or no MLB experience, would fare worse in this comparison relative to a system which projects fewer, more established players.
- Fill in data for “missing” players if a system hasn’t projected a given player. This has the advantage of being able to test all players, but it implies that for some systems, you use an arbitrary number for any players with no data. I’m using the cumulative forecast wOBA of all projected players.
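The three approaches above can be sketched in Python as set operations plus a fill value. This is only an illustration under assumed data structures: each system's projections are a dict mapping a player id to a `(forecast_woba, forecast_pa)` pair, and `actual` maps player ids to 2011 results. All function names and sample values here are hypothetical.

```python
def common_players(projections, actual):
    """Method 1: players who played and were projected by every system."""
    ids = set(actual)
    for proj in projections.values():
        ids &= set(proj)
    return ids

def own_players(proj, actual):
    """Method 2: players who played and were projected by this one system."""
    return set(actual) & set(proj)

def forecast_mean_woba(proj):
    """Forecast wOBA over all projected players, weighted by forecast PA."""
    total_pa = sum(pa for _, pa in proj.values())
    return sum(woba * pa for woba, pa in proj.values()) / total_pa

def filled_projection(proj, actual):
    """Method 3: every player who played; missing ones get the fill value."""
    fill = forecast_mean_woba(proj)
    return {pid: proj[pid][0] if pid in proj else fill for pid in actual}

# Made-up data: two systems, three players who actually played.
projections = {
    "A": {"x": (0.350, 600), "y": (0.300, 200)},
    "B": {"x": (0.340, 500), "z": (0.310, 300)},
}
actual = {"x": (0.330, 650), "y": (0.310, 150), "z": (0.300, 100)}

shared = common_players(projections, actual)
filled_a = filled_projection(projections["A"], actual)
```

In this toy example only player `x` survives the first method, while the third method keeps all three players and assigns system `A` its own forecast average for the unprojected player `z`.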

In his analysis, Matt only followed the third method. I’ll be reporting data for all three.

I currently have data for five different projection systems:

- Marcel – This is Tom Tango’s system, “that uses as little intelligence as possible.”
- Oliver – The Hardball Times’s system, developed by Brian Cartwright.
- PECOTA – Baseball Prospectus’s system, now maintained by Colin Wyers.
- Steamer – A system developed by Jared Cross, Dash Davidson, and Peter Rosenbloom.
- RotoValue – A basic projection model I’ve developed and applied to both MLB and NBA basketball statistics, which I described previously here.

In addition, I’m using 2010 actual MLB data as a projection for 2011. This is a simple control: if your projections are less accurate than last year’s numbers, you’re not adding value.

First, the results only averaging players projected by all systems:

Source | Num | Avg wOBA | MAE | RMSE |
---|---|---|---|---|
Actual | 404 | 0.3263 | 0.0000 | 0.0000 |
Marcel | 404 | 0.3333 | 0.0272 | 0.0354 |
Oliver | 404 | 0.3315 | 0.0261 | 0.0345 |
PECOTA | 404 | 0.3292 | 0.0265 | 0.0346 |
RotoValue | 404 | 0.3225 | 0.0323 | 0.0435 |
Steamer | 404 | 0.3351 | 0.0265 | 0.0349 |
2010 | 404 | 0.3311 | 0.0356 | 0.0478 |

For this set, Oliver edged out PECOTA and Steamer for the lowest MAE and RMSE, with Marcel next, RotoValue much further back, and 2010 the worst of all.

Next, I averaged all players in the system, computing errors for those who actually played in MLB in 2011:

Source | Num | wOBA (all) | MLB | wOBA (MLB) | StdDev | MAE | RMSE |
---|---|---|---|---|---|---|---|
Actual | 640 | 0.3224 | 640 | 0.3224 | 0.0446 | 0.0000 | 0.0000 |
Marcel | 828 | 0.3246 | 525 | 0.3319 | 0.0235 | 0.0278 | 0.0367 |
Oliver | 2090 | 0.2895 | 639 | 0.3272 | 0.0309 | 0.0270 | 0.0361 |
PECOTA | 943 | 0.3089 | 601 | 0.3260 | 0.0277 | 0.0271 | 0.0359 |
RotoValue | 504 | 0.3213 | 420 | 0.3220 | 0.0412 | 0.0324 | 0.0436 |
Steamer | 619 | 0.3332 | 534 | 0.3325 | 0.0291 | 0.0269 | 0.0358 |
2010 | 616 | 0.3270 | 491 | 0.3260 | 0.0505 | 0.0384 | 0.0529 |

Num refers to the total number of players projected by the system, and MLB is the number of those projected players who played in 2011. The first wOBA column is the cumulative wOBA of all players in the sample, and the second is the average (weighted by 2011 plate appearances) of those who played in 2011. So systems like Oliver and PECOTA, which projected many more players than actually played in 2011, see their overall wOBA pulled down, but when restricted to those who actually played in 2011, their averages are in line with the other systems.

In this analysis, Steamer edged out Oliver and PECOTA for the lowest MAE and RMSE, with Marcel close behind, RotoValue further back, and again 2010 much further back.

Now for the third and final table, this time filling in unforecast players with the average forecast wOBA:

Source | Num | wOBA (all) | MLB | wOBA (MLB) | StdDev | MAE | RMSE |
---|---|---|---|---|---|---|---|
Actual | 640 | 0.3224 | 640 | 0.3224 | 0.0446 | 0.0000 | 0.0000 |
Marcel | 828 | 0.3246 | 640 | 0.3315 | 0.0229 | 0.0285 | 0.0377 |
Oliver | 2090 | 0.2895 | 640 | 0.3272 | 0.0309 | 0.0270 | 0.0361 |
PECOTA | 943 | 0.3089 | 640 | 0.3258 | 0.0276 | 0.0274 | 0.0365 |
RotoValue | 504 | 0.3213 | 640 | 0.3219 | 0.0380 | 0.0330 | 0.0447 |
Steamer | 619 | 0.3332 | 640 | 0.3325 | 0.0284 | 0.0277 | 0.0371 |
2010 | 616 | 0.3270 | 640 | 0.3261 | 0.0484 | 0.0388 | 0.0531 |

Oliver has the lowest RMSE and MAE this time, closely followed by PECOTA, Steamer, and Marcel in that order. Once again RotoValue is further back, with 2010 data even further behind.

I hesitate to crown a “champion” as the best projection system, as the differences between the top 4 are quite small, enough so that the particular assumptions you use may change the ordering among them. I can confidently say that my own system is not as good as the others, but is still a significant improvement over using 2010 data. It would be interesting to see how systems do over several years.

This analysis focused on projecting a rate statistic, measuring offensive performance. So systems get no credit for predicting playing time well, although they do get more credit for being closer on players who play a lot than on those who hardly play. Perhaps I’ll try a different method, which should take into account playing time as well.

Please offer suggestions and feedback in the comments, or you can e-mail me at geoff at rotovalue dot com. If I can get data for other projection systems, I’ll rerun the analysis with them included. And if you want to do your own analysis of the data I have, I’ve posted a comma-delimited text file here.

One other difference between the analysis I did and the one Matt did: when Matt had missing players, he used wOBA – 0.020 for systems other than Marcel (which had defined wOBA as the missing value). And Matt defined his missing value in terms of the average of players who played in 2011, weighted by actual PA.

I used the forecast average wOBA across all forecast players, weighted by forecast PA. For systems with very many players (and thus a lower average wOBA) like PECOTA and Oliver, this meant using a very low “missing” value, but those systems had very few missing players. The errors for systems like mine and Steamer, which had relatively many missing players, would be a trifle lower with Matt’s method, but the differences really are tiny.

Tom Tango asked me to use a different computation for missing players. Rather than use the cumulative wOBA of all forecast players (which for some systems is much lower, due to projecting players not at all expected to play in the majors), he suggested using wOBA weighted by actual plate appearances, and then subtracting 0.020 for systems other than Marcel. I coded that up and tested it. This makes very little difference:

Oliver is unchanged, and PECOTA improves just an iota. RotoValue, Steamer, and the 2010 season improve a bit more, but still not much at all. And interestingly, Marcel gets ever so slightly worse.
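For readers who want to try the alternative fill value themselves, here is a rough Python sketch. It assumes the average is taken over a system's forecast wOBA for the players who actually played, weighted by their actual 2011 plate appearances, and that the 0.020 penalty is skipped for Marcel. The function name, its parameters, and the sample data are hypothetical and not the code I actually ran.

```python
def alternative_fill_value(proj_woba, actual_pa, is_marcel=False, penalty=0.020):
    """Fill value for unprojected players under the alternative method.

    proj_woba: dict of player id -> forecast wOBA for one system.
    actual_pa: dict of player id -> actual 2011 plate appearances.
    Averaging forecast wOBA over projected players who played, weighted
    by actual PA, is an assumption of this sketch.
    """
    played = [pid for pid in proj_woba if pid in actual_pa]
    total_pa = sum(actual_pa[pid] for pid in played)
    mean = sum(proj_woba[pid] * actual_pa[pid] for pid in played) / total_pa
    # Marcel keeps the plain average; other systems subtract the penalty.
    return mean if is_marcel else mean - penalty

# Made-up data: "q" was projected but never played, "z" played unprojected.
proj_woba = {"x": 0.350, "y": 0.300, "q": 0.250}
actual_pa = {"x": 600, "y": 200, "z": 100}

fill_other = alternative_fill_value(proj_woba, actual_pa)
fill_marcel = alternative_fill_value(proj_woba, actual_pa, is_marcel=True)
```

Note that the projected player who never played (`q` above) no longer drags the fill value down, which is the main practical difference from weighting by forecast plate appearances across all projected players.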