Reviewing the FiveThirtyEight 2018 Election Forecasts

Nate Silver’s fivethirtyeight.com published forecasts of the 2018 midterm elections, predicting the outcome of races for House, Senate, and Governor. The site also made it easy to download its projection data, so I’d like to look at the forecasts and see how the models did. Silver made three variants of his model, dubbed “Lite”, “Classic”, and “Deluxe”. Lite basically uses only polling (and, where polling is scarce or non-existent, comparisons to similar districts which have polling). Classic adds in other fundamental data, like candidate fundraising and historical voting patterns. Finally, Deluxe factors in expert ratings from the Cook Political Report, Inside Elections, and University of Virginia political scientist Larry Sabato’s Crystal Ball. Silver’s expectation was that while all three models should be good, each additional layer of data should improve accuracy.

The top-line results are quite good, as all three models came very close to the number of seats each party actually won:

Office    Party  Won    Lite  Deluxe Classic   Party  Won    Lite  Deluxe Classic   Party  Won   Lite  Deluxe Classic
Governor  R       20    18.3    18.8    19.0   D       16    17.7    17.2    17.0
House     R      203   202.1   203.6   200.6   D      232   232.8   231.4   234.3
Senate    R       11     9.7     9.5     9.5   D       22    23.3    23.5    23.5    I       2    2.0     2.0     2.0

This table allocates the 9 uncalled races to the current leader in the vote count, and so counts the Mississippi Senate special election, which is headed to a runoff, as a Republican seat. To compute the expected number of seats for a party, I added up the chances the model gave that party’s candidates of winning their races.
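
The computation is a simple sum. Here’s a minimal Perl sketch, where the comma-separated layout and field order are illustrative rather than the actual FiveThirtyEight file format:

    use strict;
    use warnings;

    # Expected seats for a party: the sum of its candidates' win
    # probabilities, accumulated per office.
    my %expected;
    while (<>) {
        chomp;
        my ($office, $party, $win_prob) = split /,/;
        $expected{$office}{$party} += $win_prob;
    }
    for my $office (sort keys %expected) {
        printf "%-8s %s %6.1f\n", $office, $_, $expected{$office}{$_}
            for sort keys %{ $expected{$office} };
    }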

The models gave probabilistic forecasts for each race, showing not only each candidate’s chance of winning, but also an expected vote share, along with 10th and 90th percentile vote shares to give a sense of the possible range of outcomes. So next I thought I’d check how well the models did at setting these percentiles:

Model    Total   Below 10th pct   Below median   Above 90th pct
Lite      1233     100   8.1%      639  51.8%      102   8.3%
Classic   1233      77   6.2%      634  51.4%       74   6.0%
Deluxe    1233      84   6.8%      641  52.0%       81   6.6%

So of the 1233 candidates for whom they made projections, just over half in each model received a vote share below their projected median, about what I’d expect. Interestingly, the results contained fewer surprises than the models expected: just over 8% of Lite’s projections fell on each side of its 80% confidence interval (10% on each side would indicate perfect calibration), and for Deluxe and Classic those shares were under 7%. So the models were projecting more uncertainty than we actually saw this year.

Another way to look at this is to count how often the model’s favorite did not win, and compare that with the model’s expected number of upsets. Expected upsets is the sum of the odds of the non-favorite candidate winning each race. If the model were perfectly calibrated, and results normally distributed around expectations, we should see expected upsets match actual upsets.
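
In code, the comparison looks something like this sketch, with a hypothetical race structure standing in for the real data:

    use strict;
    use warnings;

    # Hypothetical structure: race id => list of { win_prob, won } candidates.
    my %races = (
        'race-1' => [ { win_prob => 0.61, won => 1 }, { win_prob => 0.39, won => 0 } ],
        'race-2' => [ { win_prob => 0.80, won => 0 }, { win_prob => 0.20, won => 1 } ],
    );

    my ($upsets, $expected) = (0, 0);
    for my $cands (values %races) {
        my ($fav) = sort { $b->{win_prob} <=> $a->{win_prob} } @$cands;
        $expected += 1 - $fav->{win_prob};   # chance of a non-favorite winning
        $upsets++  unless $fav->{won};       # the favorite actually lost
    }
    printf "Actual upsets: %d   Expected: %.2f\n", $upsets, $expected;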

Model     Upsets        Expected      Total
Lite      25   4.9%     40.5   8.0%    506
Deluxe    17   3.4%     29.9   5.9%    506
Classic   20   4.0%     34.6   6.8%    506

All three models predicted more upsets than we actually saw, which is consistent with the models not being confident enough. As Silver expected, the frequency of both projected and actual upsets decreases as model complexity increases. In simpler English, adding more information to a model improved its predictions: Lite had the most upsets, Deluxe the fewest, and Classic was in the middle.

I also broke down the above table into the four categories the site was showing on election night: Toss Up (no candidate has a 60% or better chance to win), Lean (favorite is 60-75% likely), Likely (favorite is 75-95% to win), and Solid (favorite expected to win more than 95%):

                 Toss Up                      Lean  
Model     Upsets   Expected Total   Upsets   Expected Total 
Lite     16 53.3% 13.6 45.3%  30    5 13.2% 11.9 31.4%  38 
Deluxe    6 40.0%  6.8 45.0%  15    7 21.2% 11.1 33.6%  33 
Classic   9 36.0% 11.1 44.4%  25    7 23.3%  9.8 32.6%  30 

                  Likely                     Solid  
Model     Upsets   Expected Total   Upsets   Expected Total 
Lite      4  4.1% 12.7 12.9%  98    0  0.0%  2.3  0.7% 340 
Deluxe    4  5.0% 10.5 13.2%  80    0  0.0%  1.5  0.4% 378 
Classic   4  4.7% 12.0 14.1%  85    0  0.0%  1.7  0.5% 366 

None of the solid favorites lost in any of the models, although with well over 300 races, the given probabilities would have suggested 1-2 longshot upsets. The Toss Up category looked good: the favorite was expected to lose about 45% of the time, and, albeit in small samples, did lose about that often, or sometimes more. It’s the Lean and especially Likely categories where we see the big gap in upset races. Rather than winning about 1 in 7 or 1 in 8 Likely races, the underdog won only about 1 in 20, a third as often as expected. And for Lean races we’d expect about 1 in 3 to be an upset, but the favorites won 3 in 4 or better.

So in terms of both extremity of vote share and frequency of upsets, I found that the models predicted much more uncertainty than the election results showed.

Does that mean they were poorly calibrated? You can’t really tell from a single election. In retrospect, polling often underestimates one party or the other across the board (which one it favors is basically a coin flip), and in such environments you’d see more upsets. Before the election, Silver talked about the downside risk for Republicans being much larger than for Democrats: more GOP-held seats were rated Likely or Lean, so if results proved more Blue, we would have seen many more seat gains for the Democrats, while if results were more Red, Democrats were still likely to gain House seats, just not so many.

I tweaked my analysis code to allow a parallel shift in vote share – that is, I take the actual results and move, say, 2 points from the Democrat to the Republican in each race, an effective 4-point swing toward Republicans.
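
The shift itself is trivial. Here’s a sketch, with hypothetical candidate records:

    use strict;
    use warnings;

    # Move vote share (in points) from Democrats to Republicans.
    sub shift_results {
        my ($shift, @candidates) = @_;
        for my $c (@candidates) {
            $c->{vote_share} -= $shift if $c->{party} eq 'D';
            $c->{vote_share} += $shift if $c->{party} eq 'R';
        }
    }

    my @race = ( { party => 'D', vote_share => 52.0 },
                 { party => 'R', vote_share => 48.0 } );
    shift_results(2.0, @race);   # D 52/R 48 becomes 50/50: a 4-point swing

Here’s what would have happened in that scenario: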

Model     Upsets        Expected      Total
Lite      29   5.7%     40.5   8.0%    506
Deluxe    31   6.1%     29.9   5.9%    506
Classic   30   5.9%     34.6   6.8%    506

Office    Party  Won    Lite  Deluxe Classic   Party  Won    Lite  Deluxe Classic   Party  Won   Lite  Deluxe Classic
Governor  R       22    18.3    18.8    19.0   D       14    17.7    17.2    17.0
House     R      220   202.1   203.6   200.6   D      215   232.8   231.4   234.3
Senate    R       14     9.7     9.5     9.5   D       19    23.3    23.5    23.5    I       2    2.0     2.0     2.0

Model    Total   Below 10th pct   Below median   Above 90th pct
Lite      1233     157  12.7%      644  52.2%      153  12.4%
Deluxe    1233     144  11.7%      643  52.1%      136  11.0%
Classic   1233     139  11.3%      643  52.1%      142  11.5%

Now the Deluxe model almost exactly nails the number of upsets, Classic is close, and only Lite remains markedly below its expectation. In this redder environment, the GOP narrowly holds the House with 220 seats, and it wins 14 Senate races, for a 4 seat gain there. For all three models, the number of candidates with vote shares outside the 80% range is a little more than you’d expect in each direction.

If the error were in the opposite direction, and we saw Democrats do 2 points better across the board and Republicans 2 points worse, this is what I get:

Model     Upsets        Expected      Total
Lite      33   6.5%     40.5   8.0%    506
Deluxe    27   5.3%     29.9   5.9%    506
Classic   32   6.3%     34.6   6.8%    506

Office    Party  Won    Lite  Deluxe Classic   Party  Won    Lite  Deluxe Classic   Party  Won   Lite  Deluxe Classic
Governor  R       16    18.3    18.8    19.0   D       20    17.7    17.2    17.0
House     R      184   202.1   203.6   200.6   D      251   232.8   231.4   234.3
Senate    R        8     9.7     9.5     9.5   D       25    23.3    23.5    23.5    I       2    2.0     2.0     2.0

Model    Total   Below 10th pct   Below median   Above 90th pct
Lite      1233      86   7.0%      643  52.1%       89   7.2%
Deluxe    1233      80   6.5%      651  52.8%       89   7.2%
Classic   1233      75   6.1%      648  52.6%       76   6.2%

This environment also produces more upsets, although each of the models still expected a few more upsets than occurred in this scenario.

Now instead of Republicans narrowly holding the House, Democrats flip more than 50 House seats, and also barely take control of the Senate by 1 seat. Interestingly, I still see fewer than 10% of candidates getting above the 90th percentile or below the 10th percentile expected vote share.

These scenarios also show how seat total projections would be impacted by a systematic error. Instead of virtually nailing the number of House seats, the models would be off by 15-20 or so, and they would do worse at calling Governor and Senate races.

This year was quite good overall for polling accuracy, and so the models overstated the uncertainty in this year’s election. Even in a shifted environment, the models only get to about the number of upsets they predicted. To start seeing more upsets than the models predicted, I’d need to shift each race at least 3-4 points toward one party or the other, and that would be a *very* large polling miss. So while the models did quite well in projecting the actual outcome, I think they were likely overstating uncertainty.

Posted in Politics, Projections

Reverse Platoon Splits – Before and After

Tom Tango asked for someone to see what players with at least a 30-point wOBA reverse platoon split in their first 2500 plate appearances did for the rest of their careers, and Sean Foreman kindly provided split data from his database.

So I hacked together a Perl script to read his data and analyze it. The script lets you tweak the filters, setting a minimum reverse split over minimum numbers of plate appearances against both left- and right-handed pitchers; the core of the filter looks something like the sketch below.
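
Here the PA thresholds are my guesses at reasonable cutoffs, the field names are illustrative, and for simplicity the check assumes right-handed batters (for a lefty, a reverse split would mean hitting lefties better):

    use strict;
    use warnings;

    my @players = (
        { id => 'glanvdo01', woba_lhp => 0.313, pa_lhp => 710,
          woba_rhp => 0.346, pa_rhp => 2005 },
    );

    # Keep batters with enough PA on both sides whose wOBA against
    # same-handed pitching beats their wOBA against opposite-handed
    # pitching by at least the minimum split.
    my ($min_split, $min_pa_lhp, $min_pa_rhp) = (0.030, 500, 2000);

    my @matches = grep {
            $_->{pa_lhp} >= $min_pa_lhp
         && $_->{pa_rhp} >= $min_pa_rhp
         && $_->{woba_rhp} - $_->{woba_lhp} >= $min_split
    } @players;

    print "$_->{id}\n" for @matches;

Taking Tango’s suggested criteria, I found just 7 matching players, with an aggregate 40-point reverse wOBA split: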

                             Before                                   After
                        vs LHP     vs RHP                      vs LHP       vs RHP
Player     years     wOBA    PA  wOBA   PA  Split   years    wOBA   PA    wOBA   PA  Split
glanvdo01 1996-2000 0.313   710 0.346  2005 0.033 2001-2004 0.321   431 0.280  1136 -0.041
batisto01 1996-2002 0.316   872 0.351  2454 0.035 2003-2007 0.291   462 0.310  1171  0.019
bergewa01 1930-1935 0.365   538 0.407  2101 0.041 1936-1940 0.385   289 0.370  1335 -0.015
jonesad01 2006-2012 0.316   870 0.358  2246 0.043 2013-2018 0.334  1021 0.344  2851  0.010
samueju01 1983-1987 0.320   840 0.351  2036 0.030 1988-1998 0.344  1556 0.310  2232 -0.034
leeca01   1999-2003 0.329   642 0.367  2347 0.038 2004-2012 0.373  1396 0.357  4402 -0.015
jacksra01 1950-1956 0.311   599 0.359  2174 0.048 1957-1959 0.277    74 0.276   326 -0.002
Total               0.322  5071 0.362 15363 0.040           0.345  5229 0.336 13453 -0.009

Tango expected the group would be neutral, and in aggregate they showed a 9-point “normal” split in wOBA. Only two of the 7 still showed a reverse platoon split, and in both cases it was much narrower than their start of career split.

But just 7 batters making his cutoff is a pretty small sample. So I reran, this time allowing batters with at least a 20-point wOBA reverse split, and now I get 25 players, whose aggregate split is 30 points:

                             Before                                   After
                        vs LHP     vs RHP                      vs LHP       vs RHP
Player     years     wOBA    PA  wOBA   PA  Split   years    wOBA   PA    wOBA   PA  Split
glanvdo01 1996-2000 0.313   710 0.346  2005 0.033 2001-2004 0.321   431 0.280  1136 -0.041
casege01  1937-1945 0.313   705 0.339  2216 0.027 1946-1947 0.298   129 0.270   348 -0.028
heganji01 1941-1952 0.286  1076 0.306  2123 0.020 1953-1960 0.291   876 0.296  1146  0.005
batisto01 1996-2002 0.316   872 0.351  2454 0.035 2003-2007 0.291   462 0.310  1171  0.019
spencda01 1952-1959 0.304   662 0.326  2088 0.022 1960-1963 0.366   449 0.337   913 -0.029
perezto01 1964-1970 0.352  1071 0.377  2261 0.026 1971-1986 0.388  2358 0.334  5166 -0.054
bressed01 1956-1964 0.316   827 0.342  2378 0.026 1965-1967 0.308   294 0.294   574 -0.015
mondera01 1993-1998 0.349   719 0.376  2366 0.027 1999-2005 0.383   740 0.338  2544 -0.045
wilsopr01 1998-2003 0.343   791 0.367  2242 0.023 2004-2007 0.375   367 0.320  1036 -0.054
disarga01 1990-1997 0.275   846 0.299  2251 0.024 1998-2000 0.311   213 0.310   722 -0.001
jonesad01 2006-2012 0.316   870 0.358  2246 0.043 2013-2018 0.334  1021 0.344  2851  0.010
zimmedo01 1954-1962 0.284   566 0.305  2053 0.021 1963-1965 0.294   300 0.316   674  0.022
hemslro01 1928-1939 0.296   721 0.323  2068 0.027 1940-1947 0.322   434 0.288  1108 -0.034
cimolgi01 1956-1963 0.307  1008 0.327  2258 0.020 1964-1965 0.141    53 0.287    23  0.146
ugglada01 2006-2009 0.340   645 0.370  2053 0.030 2010-2015 0.332   760 0.333  2051  0.001
suzukku01 2007-2012 0.292   780 0.321  2171 0.029 2013-2018 0.343   640 0.309  1728 -0.034
samueju01 1983-1987 0.320   840 0.351  2036 0.030 1988-1998 0.344  1556 0.310  2232 -0.034
fiskca01  1969-1977 0.358   778 0.380  2047 0.021 1978-1993 0.349  2292 0.341  4736 -0.008
jacksra01 1950-1956 0.311   599 0.359  2174 0.048 1957-1959 0.277    74 0.276   326 -0.002
bergewa01 1930-1935 0.365   538 0.407  2101 0.041 1936-1940 0.385   289 0.370  1335 -0.015
ludwiry01 2002-2011 0.328   947 0.356  2027 0.027 2012-2014 0.376   267 0.323   745 -0.053
boonebo01 1972-1978 0.301   820 0.325  2090 0.024 1979-1990 0.293  1629 0.295  3609  0.001
hollima01 2004-2008 0.388   660 0.415  2308 0.027 2009-2018 0.391  1309 0.373  3704 -0.018
hornebo01 1978-1983 0.365   638 0.389  2078 0.024 1984-1988 0.392   479 0.347  1018 -0.045
leeca01   1999-2003 0.329   642 0.367  2347 0.038 2004-2012 0.373  1396 0.357  4402 -0.015
Total               0.321 19331 0.352 54441 0.030           0.347 18818 0.331 45298 -0.017

Now we finally find a player, Don Zimmer (yes, the former Red Sox manager and Yankees coach!), who showed a greater reverse split over the rest of his career than at the start, but again just barely, increasing to 22 points from 21. (Gino Cimoli’s rest-of-career split is far larger, but it comes in a mere 76 PA.) Just 6 other players showed any reverse split at all over the rest of their careers, two of them by just a single wOBA point.

When I set the filter back to 30 wOBA points, but require fewer PA (250 against LHP, and 1000 against RHP), I find a total of 57 players, with this aggregate:

                             Before                                   After
                        vs LHP     vs RHP                      vs LHP       vs RHP
Player     years     wOBA    PA  wOBA   PA  Split   years    wOBA   PA    wOBA   PA  Split
Total               0.308 25543 0.348 67831 0.040           0.349 37399 0.331 93823 -0.018

Interestingly, this group in aggregate shows an 18-point normal platoon split over the rest of their careers, after a 40 point reverse split from the start of their careers.

So overall, it seems that even players who show a pronounced reverse platoon split tend to show a more normal platoon split over the rest of their careers.

If you want to tinker with different filters, you can download the raw data Sean Foreman made available, and pass it to my Perl script with different arguments.

Posted in Major League Baseball, Sabermetrics

Professional Standings Now Available on RotoValue

RotoValue now also shows professional standings for both MLB and NBA. And, like the fantasy standings pages, you can customize them to any date range you like. So if you want to see how your team has done since a particular date, or over the past month, or since they called up Gleyber Torres or Ronald Acuna, RotoValue’s ProStandings page can show you. The Braves were two games behind the Nationals between Acuna’s call-up and his going on the DL on May 28th, while the Yankees have gained 7 games on the Red Sox since Torres was recalled through yesterday, June 9th.

Like the rest of RotoValue, this page is advertising-free, and it also adds some data that may be useful to fantasy sports players. In addition to wins, losses, and winning percentage, the page shows runs (points for the NBA) scored and allowed per game. It’s a nice quick way to see how teams are doing, whether your pitcher is facing a tough offense next week, or whether your player is going up against tougher defenses.

In addition, I show the “Pythagorean” records for each team, computing the winning percentages estimated from scored and allowed data. The basic model is:

    Win% = RS^X / (RS^X + RA^X)

where X varies depending on the sport. When Bill James first proposed the concept, he used X=2 for MLB, but subsequent research has found a better fit with a somewhat smaller exponent. I’m using 1.83 for MLB, and 13.91 for NBA, as proposed by Daryl Morey. Teams whose records are much better (or worse) than their expected records have typically been lucky (or unlucky), and are more likely to revert closer to their expected winning percentage. That can give you insight into which teams may be more (or less) likely to contend for a playoff spot or (often more important for single-league fantasy purposes) trade away talent.
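
The computation itself is tiny; here’s a sketch (per-game or total scoring works equally well, since only the ratio of scored to allowed matters):

    use strict;
    use warnings;

    # Pythagorean winning percentage: RS^X / (RS^X + RA^X).
    sub pythag {
        my ($rs, $ra, $x) = @_;
        return $rs ** $x / ($rs ** $x + $ra ** $x);
    }

    # Exponents from above: 1.83 for MLB, 13.91 for NBA.
    printf "MLB: %.3f\n", pythag(5.1, 4.3, 1.83);      # runs scored/allowed per game
    printf "NBA: %.3f\n", pythag(112.5, 108.0, 13.91); # points scored/allowed per game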

A link to this standings page now appears on the toolbar under the, uh, Standings menu:

[Image: ProStandings page]

In addition to ProStandings, there is also a Scoreboard page, which shows games, and, where appropriate, scores, for a given date.

Posted in Major League Baseball, NBA Basketball

One Player, Two Kinds of Stats: Handling Shohei Ohtani

This year the Angels signed Shohei Ohtani, the young Japanese star who played both as a pitcher and outfielder for the Hokkaido Nippon Ham Fighters of the Japanese Pacific League. As in Japan, Ohtani is expected to continue to play extensively both as a pitcher and hitter.

Some players have played significantly at the major league level in both capacities in different seasons (most recently Rick Ankiel, and more famously Babe Ruth), but in the modern game it is unheard of for one person to contribute both as a batter and a pitcher in the same season. Until, probably, 2018, when Ohtani seems poised to do so.

This raises the question of how to handle Ohtani for fantasy purposes.

Yahoo! is handling Ohtani by turning him into two different players, a batter and a pitcher, and they’re allowing different fantasy teams to own each version’s stats. While I can see how that might be an easier technical fix on some platforms, it’s ugly, and it loses the flexibility a two-way player gives a real MLB team. Allowing different fantasy teams to own Ohtani’s batting and pitching stats is simply wrong, and making one team use two roster spots to get all of Ohtani’s stats doesn’t match the true flexibility the Angels will have with a single player capable of contributing both in the batter’s box and on the pitching mound.

CBS and ESPN are a little better: they let you put Ohtani either at P or in a batting slot, but they only count his pitching stats when he is in your lineup as a pitcher, and only his batting stats when he’s in your lineup as a batter. While that’s not terrible for a daily transaction league (you can switch Ohtani’s slot based on where he’ll play for the Angels to capture most of his stats), you’ll still lose out on his offensive stats on days he pitches in an NL park and has to bat for himself. In a league with weekly transactions, though, this solution forces you to decide whether you want Ohtani’s batting or pitching numbers for an entire week.

RotoValue lets you have the best of both worlds: by default, it will count all Ohtani’s stats, no matter where you play him.

Because RotoValue already supports an option to count pitchers’ batting stats (and/or batters’ pitching stats), it was actually quite easy to enhance it to count both types of statistics for a player who qualifies at both an offensive and a pitching position, while only counting the primary statistics for players who qualify at just one type. So by default, RotoValue will count both batting and pitching stats for Ohtani whenever he plays, but will ignore other pitchers’ batting statistics and other batters’ pitching statistics. Strictly speaking, this is not special treatment for Ohtani: it would apply to any player who qualifies both as a pitcher and as a batter. It’s just that for now Ohtani is likely to be the only one to do so. RotoValue is listing him as SP/DH, although it’s possible we may give him OF eligibility instead of (or in addition to) DH eligibility.
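
The logic is roughly this sketch, with illustrative field names rather than RotoValue’s actual internals:

    use strict;
    use warnings;

    # Count a stat type if the player qualifies on that side, or if the
    # league counts cross-over stats for everyone.
    sub counted_stats {
        my ($player, $league) = @_;
        return {
            batting  => $player->{qualifies_batting}  || $league->{count_pitcher_batting},
            pitching => $player->{qualifies_pitching} || $league->{count_batter_pitching},
        };
    }

    my $ohtani = { qualifies_batting => 1, qualifies_pitching => 1 };  # SP/DH
    my $league = { count_pitcher_batting => 0, count_batter_pitching => 0 };
    my $use = counted_stats($ohtani, $league);
    print "batting: $use->{batting}, pitching: $use->{pitching}\n";    # both count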

Suppose you really want Ohtani to count only as a pitcher. You can simply set his custom position to be SP only, and his batting statistics will no longer count. Or you could make him eligible at DH or OF only, and then only his batting statistics would count. Such changes would take effect immediately and retroactively: reload a standings or team stats page after changing the setting, and the full season totals will reflect the new settings.

So – how do you change a player’s position?

If you have administrator rights for your league, on the Player Detail page there’s a menu item under Settings: “Set custom position for…”.

[Image: Player Detail page for Shohei Ohtani]

That menu option gives you a page where you can override the position for a player:

[Image: Custom Position page for Shohei Ohtani]

If you simply want all of Ohtani’s stats to count, you won’t have to make any of these changes, but if you want him to be a pitcher only (or, less likely, a batter only), you’d need to override his default position. So RotoValue gives you the choice to handle Ohtani how you think it makes better sense for your league.

Alternatively, if you want to count all pitchers’ batting (and/or batters’ pitching) stats, you can choose those options directly from the bottom of the main league Settings page:

[Image: Update League Settings page]

Posted in Fantasy Strategy, Major League Baseball

Fighting the Last War

This week’s Riddler at FiveThirtyEight reruns a puzzle initially run back in February. The game is a “war” between two warlords fighting over 10 castles. Each warlord has 100 soldiers, and the 10 castles are worth from 1 to 10 points each. If you send more soldiers to a given castle than your opponent, you win it and its point value; if you both send exactly the same number of soldiers, the point value for the castle is shared between the two. The winner of the war is the one with more total points, and so ties are possible overall only if you’ve shared at least one castle with an odd point value. The goal of the first Riddler challenge was to have the most success against all other entries in the contest.

They’ve made the entries from the first contest public, which gives one a chance to see not only the winners but all the entrants, and to play around with the data. I quickly hacked up a Perl script to take their data and compute the results of the first contest. That ran slowly (head-to-head battles among all of the nearly 1400 entrants took about 12 seconds on my laptop) but accurately. I then rewrote the code in C, and saw the same thing run in about a tenth of a second, two orders of magnitude faster.

This made me wonder whether brute force might be useful to attack the problem, so first I wrote a C program to count all the possible allocations of the 100 soldiers across 10 castles. This took, uh, a while to run – about 8 hours – but I got my answer: a little over 4.25 trillion combinations.
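
In hindsight, the count also has a closed form: by stars and bars, distributing 100 indistinguishable soldiers among 10 castles gives C(100+9, 9) combinations, which a few lines of Perl confirm:

    use strict;
    use warnings;
    use bigint;

    # C(109,9): allocations of 100 soldiers across 10 castles.
    my ($n, $k) = (109, 9);
    my $c = 1;
    $c = $c * ($n - $_) / ($_ + 1) for 0 .. $k - 1;
    print "$c\n";   # 4263421511271, a little over 4.25 trillion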

That didn’t sound too good, but I quickly modified the code so I could start trying combinations in order against the prior entries, printing out any that did as well as or better than the best combination tried so far. Last Sunday night I kicked that program off on my laptop. A week later, the program is still running, and has only gone through about 6 billion combinations, but it did find a few that did better than the best-performing entry last time, so I’m entering the best of those today.

I wanted to see if I could speed things up, and one way to do that was to test not against all of the nearly 1400 entries, but against a subset. I figured that people might look at which entries did well last time and try something similar, so my subset was the 179 entries that, by my calculations, won 70% or more of their head-to-head matchups in the prior contest. I found many combinations that beat all 179 of those, and I then compared many of them against the larger dataset. So far that has not given me a better result against the whole set from the last contest, but I’m entering one of those also, on the theory that this week’s entries will look more like the very good ones from last time than like the whole set.

So my first entry is:

0 0 2 2 11 21 3 31 26 4

I’ll update this post later with links to the code used to generate these.

Update, Wednesday 31 May 2017:

I’ve created a github repository for the code I was using here:

https://github.com/GeoffBuchan/Riddler/tree/master/Castle

Also, I realized that when I first built the C programs, I did not add any optimization switches. The count.c program, which took about 8 hours to run before, runs in under 10 minutes on my laptop now that I’ve added -O2 optimization with gcc. So if I’d done this in the first place, I’d have been able to search a lot further!

Posted in Riddler, Software

#FixTheWin

A few years back Brian Kenny introduced the hashtag #KillTheWin, which still lives on: baseball fans use it to point out egregious cases of a pitcher getting a win in a game despite pitching poorly. While the win might have been a useful metric for pitchers in the dead-ball era, when starters in aggregate completed more than half their starts, as baseball has evolved and bullpens have risen in prominence, the definition certainly has its flaws.

But it also has its history. Baseball fans still marvel at Cy Young’s total of 511 wins, a seemingly impossible goal. While it’s become less common in the 21st century, a 20-game winner was once a standard of excellence among pitchers. Killing the win loses touch with that history. So perhaps we should try to improve it instead. In 2014, Tom Tango proposed a complete redefinition, a simple points system that would usually determine the pitcher most deserving of a win (or loss) in a given game. I wrote about the idea, and created a page to compute wins and losses under that proposal. Recently Tango e-mailed me with a different idea: keep the basic structure of the current rule, but instead of giving the win to the pitcher who recorded the last out before his team takes the lead for the final time, give it to the first pitcher whose line for the game closes with his team in a position to win, so long as that pitcher’s team still wins.


Posted in Sabermetrics

Comparing Projected HR Leaders to Actual

Tom Tango asked an interesting question on Twitter yesterday:
[Image: Tom Tango’s tweet]

The odds of the projected HR leader actually leading the league make an interesting question. I’ve been doing projections since 2011, so I thought I’d sweep my database of RotoValue projections and see what that history was. That gives me just five years, but it turns out my projection model did correctly name the MLB home run leader once in those five years, or 20% of the time: Chris Davis hit 47 HR in 2015 to lead MLB, while my model had projected him for 35. Note that when you lead the league, you’re not only very likely beating your own projection, you’re probably beating all the projections, because a projected total can be considered a weighted average of all possible outcomes for that player, and the possibility of a bad year or injury pulls that average down from what the player would produce in a healthy peak season. It’s also not unusual for the league leader to be a player having a breakout year, well above what his past performance suggested.

Last year, Mark Trumbo’s career-best 47 HR topped MLB, despite my model projecting him for just 21.3 HR, the 46th best total. My projected 2016 leader was Chris Davis again, now projected to hit 37.9, and he actually slightly edged that out, with 38 HR. But I was not projecting the overall home run surge, and Davis’s 38 HR ranked only 12th best.

The actual MLB HR leader has regularly surprised my projection model. In only one other year, 2012, when Miguel Cabrera’s 44 HR led, was the actual leader among my top preseason projections (my model projected 34.5 HR for Cabrera, the 5th-best projection). In the other years, the leader had been projected 36th (Nelson Cruz, moving to Baltimore in 2014), 90th (Chris Davis in 2013), and 73rd (Jose Bautista in 2011) by my model.

Over the 6 years, of the 60 players projected to finish in the top 10, 22 of them did so. Also, 24 of the 60 equaled or bettered their projected total, while 36 failed to reach the projected value.

Below the jump I’ve put tables for each year showing the players my model projected to finish in the top 10 in HR, plus any players who actually finished in the top 10, along with their projected values.


Posted in Major League Baseball, Projections, Sabermetrics

Count von Count Riddler

I periodically attempt the FiveThirtyEight Riddler, edited by Oliver Roeder. This week he’s actually presenting two, a shorter “Riddler Express”, and a more time consuming “Riddler Classic”.

The Sesame Street character Count von Count likes to, well, count, and he now has his own twitter feed! For those who don’t recall the character from the show, he’s a purple muppet dressed in black, as a sort of kindly Dracula, who would count up in an eastern European accent.

Well, the twitter feed is simply the Count counting, albeit with each number written out in words. As I type this, his latest tweet is “Eight Hundred Seventy Nine!”

So the Riddler Express is to find out how high Count von Count can count on twitter in this way before hitting its 140-character limit. Because he is enthusiastic, all his tweets must end with an exclamation mark.
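
One way to probe the cutoff is with a number-to-words module. This sketch uses Lingua::EN::Numbers, though its output (“eight hundred and seventy-nine”) doesn’t exactly match the Count’s capitalized, and-less style, so the precise answer depends on reproducing his formatting faithfully:

    use strict;
    use warnings;
    use Lingua::EN::Numbers qw(num2en);

    # Count upward until the next tweet (number in words, plus "!") won't fit.
    my $n = 1;
    $n++ while length( num2en($n + 1) . '!' ) <= 140;
    print "The Count's last tweetable number: $n\n";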


Posted in Riddler

Pirate Riddler

This week’s FiveThirtyEight Riddler is a logic puzzle. Assume 10 “Perfectly Rational Pirate Logicians”. The pirates have found 10 gold pieces, and the puzzle is to figure out how they will allocate the loot among themselves. There are several constraints.

First, the pirates themselves are ranked in a strict hierarchy, with the Captain at the top, and then the others ordered beneath them, so if for any reason the Captain is no longer able to fulfill his duties, the second in command becomes new Captain, and everyone else moves up one step on the hierarchy.

Second, the Pirates practice a form of democracy. While the Captain, due to rank, gets to propose an allocation, the whole crew, with one vote per pirate, votes on the proposal. If that proposal gets half or more of the vote, it carries, but if more than half of the pirates vote against it, they will mutiny, killing the old Captain, and leaving it to the new captain to propose an allocation.

So our perfectly rational pirate logicians have three constraints in determining how they will vote on a proposed allocation:

  1. They value life above all, so they will not vote in a way to put their own lives at risk of mutiny if they can at all avoid doing so.
  2. They are greedy, so as long as their life is not at stake, they will vote for something which maximizes their own personal share of the loot.
  3. They are bloodthirsty, so if they have two choices in which they’d remain alive but get the same booty, they will prefer to mutiny and kill the captain.

Given the above constraints, how will the 10 pirates allocate their 10 newly found gold pieces?
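
Here’s my own backward-induction sketch of the standard analysis (not necessarily the solution behind the jump): a pirate votes yes only if his share strictly beats what he’d get after a mutiny, since bloodthirst breaks ties toward mutiny, so the captain buys the cheapest possible majority. It assumes the gold suffices for the bribes, which holds for 10 pirates and 10 pieces:

    use strict;
    use warnings;

    sub allocate {
        my ($n, $gold) = @_;                  # ranks 0 (captain) .. $n-1
        return ($gold) if $n == 1;
        my @next = allocate($n - 1, $gold);   # shares by rank if the captain dies
        my @mine = (0) x $n;
        # Yes votes needed beyond the captain's own: ceil($n/2) - 1.
        my $need = int(($n + 1) / 2) - 1;
        # Bribe the pirates who'd fare worst after a mutiny; each costs one
        # piece more than his next-round share, since a tie votes to mutiny.
        my @cheapest = (sort { $next[$a] <=> $next[$b] } 0 .. $#next)[0 .. $need - 1];
        $mine[$_ + 1] = $next[$_] + 1 for @cheapest;
        my $spent = 0;
        $spent += $_ for @mine;
        $mine[0] = $gold - $spent;            # the captain keeps the rest
        return @mine;
    }

    print join(' ', allocate(10, 10)), "\n";  # 6 0 1 0 1 0 1 0 1 0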


Posted in Riddler

FiveThirtyEight Baseball Division Champs Puzzle

Update: I’ve added a link to the Perl program I used to do these simulations.

Oliver Roeder presents a weekly puzzler on FiveThirtyEight, and this week it was a baseball-themed puzzle. Assume a sport (say, “baseball”) in which each team plays 162 games in a season. Also assume a “division” (e.g. the “AL East”) containing 5 teams, each of exactly equal skill. In other words, each team has exactly a 50% chance of winning any given game. The puzzle is to compute the expected value of wins for the division-winner.

Interestingly, the problem is open to interpretation, and the result I get depends on what assumptions I make. My initial assumption was to treat each game for each team as a simple coin-flip. I ran 100,000 simulated “seasons”, getting an average of 88.4 wins for the division leader. But games have two teams, and who the opponent is could matter to this problem. In an extreme situation, the “coin flip” model could result in winning the division with 0 wins, in the highly improbable event that each team lost every game.
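
Here’s a sketch of that coin-flip interpretation (a simplified illustration, not necessarily the linked program):

    use strict;
    use warnings;

    # Coin-flip model: 5 independent teams, 162 games each; average the
    # division leader's win total over many simulated seasons.
    my ($seasons, $teams, $games) = (100_000, 5, 162);
    my $total = 0;
    for (1 .. $seasons) {
        my $best = 0;
        for (1 .. $teams) {
            my $wins = 0;
            $wins += int(rand 2) for 1 .. $games;
            $best = $wins if $wins > $best;
        }
        $total += $best;
    }
    printf "Average division-winner wins: %.1f\n", $total / $seasons;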

Since I happened to have the 2016 MLB schedule available, I used it for each game. This adds the constraint that in games involving two teams in the same division, one team winning implies its opponent must lose. Doing this, I got an average of 88.8 wins for the division winner.

The third variant I tested produced the toughest constraint: I assumed the five teams only played games among themselves (at least 40 against each opponent). Thus every win for one team always means a loss for its opponent. This gave me an average of 89.3 wins for the division winner.


Posted in Major League Baseball, Riddler, Sabermetrics