Monkeying Around With Projections

Tom Tango has tossed out the Marcel “Monkey” system to project baseball statistics, and in recent years he has also hosted a forecasters’ challenge, which compares projections from professional forecasters with each other as well as with Marcel. Tango describes Marcel as “the most basic forecasting system you can have, that uses as little intelligence as possible.” Tango’s point isn’t that Marcel should be particularly good; if anything, he seems bemused at how often it holds its own against other systems. The winner of the official competition was Ask Rotoman. Tango also ran three other competitions, using the same projections but different rules for ranking them. Those three were won by RotoWorld, KFFL, and “Consensus,” an average of all the forecasts entered. Congratulations to all the winners!
This year I was fortunate enough to be in the competition, using a model I’d previously only used privately. I don’t claim to have particular talent, but I was curious how my home-brewed model would stack up against other forecasters. And while I’d love to say that I bested the Monkey, this year my system finished behind Marcel in all four competitions Tango ran, making it among the weakest systems for 2011.
I was not alone in that: of the 22 systems he tested, five finished worse than Marcel in all four contests, and while there’s a small chance that’s just bad luck, it’s more likely that my methodology simply isn’t as good as Marcel’s.
My current model implementation does the following to compute “raw” projections:

  1. I project a “rate” of the given statistic per “context” (IP for pitchers, and either AB or “outs” for batters).
  2. Each player has some inherent “skill” for a given statistic, and his rate performance in any given season is a function of his age and his implied skill.
  3. To determine the aging curve for a statistic, I use data from all the players in my database above a certain context threshold in a season, weight each season’s rate by its context, and fit a regression curve to that data.
  4. Once I have coefficients for the aging curve for a statistic, I then do a second regression of the player’s historical performance in that statistic to infer the player’s skill level.
  5. Using the estimated skill level and the statistic’s overall aging-curve coefficients, I then compute the player’s projected rate in the statistic at his next year’s age.
  6. I project context (playing time), and multiply the rate by that.

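Here’s a minimal sketch, in Python, of what those steps might look like in code. The data layout (seasons as age/context/stat tuples), the quadratic aging curve, and the use of a context-weighted average residual as the “skill” estimate are simplifying assumptions for illustration, not my exact implementation.

```python
import numpy as np

def fit_aging_curve(seasons, min_context=200, degree=2):
    """Step 3: fit one league-wide aging curve for a statistic.

    `seasons` is a list of (age, context, stat) tuples across all players;
    each season's rate is weighted by its context (AB, outs, or IP).
    The minimum-context threshold and polynomial degree are illustrative.
    """
    ages, contexts, stats = np.array(
        [s for s in seasons if s[1] >= min_context], dtype=float).T
    rates = stats / contexts
    return np.polyfit(ages, rates, degree, w=contexts)  # aging-curve coefficients

def infer_skill(player_seasons, curve):
    """Step 4: estimate the player's "skill" as his context-weighted
    average deviation from the league aging curve."""
    ages, contexts, stats = np.array(player_seasons, dtype=float).T
    rates = stats / contexts
    return np.average(rates - np.polyval(curve, ages), weights=contexts)

def project(player_seasons, curve, next_age, projected_context):
    """Steps 5-6: rate at next year's age, scaled by projected playing time."""
    skill = infer_skill(player_seasons, curve)
    rate = max(np.polyval(curve, next_age) + skill, 0.0)
    return rate * projected_context
```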
Once I have raw projections, I then “normalize” them. I sum projected AB and IP across each team, and when those numbers are too high, I reduce playing time for weaker players until they come within range; if they’re too low, I increase playing time for better players (subject to reasonable individual limits, so I don’t wind up projecting a batter to get 700 AB). My “normalization” step also had a manual fudge factor based on injuries known before the start of the season. For example, my raw projections expected a very nice year from Adam Wainwright, but since it was known he would be out for the season, I essentially zeroed out his stats in this step.
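The team-level adjustment works something like the sketch below; the target range for team AB, the per-player cap, and the ordering of players by projected rate are illustrative choices rather than exact figures from my implementation.

```python
def normalize_team_ab(players, team_ab_range=(5400, 5700), ab_cap=650):
    """`players` is a list of dicts with 'ab' (projected at-bats) and
    'rate' (a quality measure used to decide who gains or loses time)."""
    total = sum(p['ab'] for p in players)
    lo, hi = team_ab_range
    if total > hi:
        # Too many team AB: trim the weakest players first.
        for p in sorted(players, key=lambda p: p['rate']):
            cut = min(p['ab'], total - hi)
            p['ab'] -= cut
            total -= cut
            if total <= hi:
                break
    elif total < lo:
        # Too few team AB: give more to the best players, up to the cap.
        for p in sorted(players, key=lambda p: p['rate'], reverse=True):
            add = min(max(ab_cap - p['ab'], 0), lo - total)
            p['ab'] += add
            total += add
            if total >= lo:
                break
    return players
```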
Marcel starts from a simpler place: it uses a weighted average of the most recent three years of data, where my model first computes an aging curve and then fits that curve to a player’s past data. Marcel also considers age, but perhaps the biggest difference is that Marcel regresses toward the league’s mean performance by blending in a fixed number of league-average plate appearances. The net effect is that players with less history are pulled much more strongly to league average, to the extreme that a player with no history at all at the MLB level is assumed to have league-average performance (albeit in a more limited context).
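To make the contrast concrete, here’s a simplified illustration of Marcel-style regression to the mean, using the 5/4/3 season weights and roughly 1200 PA of league-average regression from Tango’s published description (the age adjustment is omitted):

```python
def marcel_rate(last3_stats, last3_pa, league_rate,
                weights=(5, 4, 3), regression_pa=1200):
    """Most-recent season first; returns a projected rate per PA."""
    wstat = sum(w * s for w, s in zip(weights, last3_stats))
    wpa = sum(w * pa for w, pa in zip(weights, last3_pa))
    # Blend in `regression_pa` worth of league-average performance.
    return (wstat + regression_pa * league_rate) / (wpa + regression_pa)

# A player with no MLB history (wstat == wpa == 0) comes out exactly at
# the league rate, which is the behavior described above.
```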
And that regression is, I think, a major reason why its projections are better. My model is quite bad on rookies: either I have no historical data and thus make no projection at all, or I extrapolate from what little data I have, putting way too much weight on a tiny sample.
There are other theoretical weaknesses of my model compared to other systems: I don’t use major-league equivalencies of minor-league data; I don’t take park effects or league changes into consideration; I don’t look at similar players at comparable ages to see how a player might progress; and I only use a statistic to project itself, even when other statistics correlate better with it (this is most egregious with wins and especially saves, which depend heavily on factors outside the player’s control). Yet Marcel does quite well against other forecasters without making any of those adjustments either.
Another difference is that Marcel uses a crowd-sourced playing time forecast, where my model simply takes its raw projections and then adjusts only if/when a team total is out of range. This was also a weakness for me: in mid-season data Tango released, I found that my model had greatly undervalued Jacoby Ellsbury relative to other systems, basically because it projected very little playing time for him (no doubt overreacting to his recent injury history).
So two things that might improve my model for the future would be adding mean reversion and using an outside source for playing time estimates, rather than relying on my internal algorithm.
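One simple shape that mean reversion could take in my model is shrinking the raw projected rate toward the league mean, weighted by how much history backs it. The pseudo-context constant below is just a placeholder that would need tuning:

```python
def shrink_toward_league(raw_rate, career_context, league_rate, k=1200):
    """Pull the raw projected rate toward the league mean; small samples
    move a lot, large samples barely move. `k` is a pseudo-context
    constant to be tuned, not a value from my current model."""
    return (raw_rate * career_context + league_rate * k) / (career_context + k)
```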
The most consistently strong projections across all four competitions came from the consensus, a vote of confidence for crowd-sourcing. I hope to improve my model and do better in next year’s challenge.
