More on Instant Runoff HOF Voting

Yesterday I talked about simulating Tom Tango’s idea for doing instant runoff voting for the Hall of Fame, and using prior year public ballot data to simulate his ranked ballots.
In the comments on his blog, he asked about assuming all voters draw the line at 4 players (so long as they have at least 4 votes on their ballot), randomly picking which players stay above the line. I implemented that change: I can now set the line at X for any given X. The random selection is weighted by actual election vote totals, so the more real votes a player got, the more likely he was to be picked (either when adding extra names to bring a ballot up to 10, or when determining which of a voter’s choices stayed above the line). I also modified the program to run multiple trials and summarize the results. Here are 10,000 trials using the 2012 ballot data, assuming each voter draws the line at 4 players:
114 total votes. 86 needed for election. 10000 simulations.

| Player | Mod Votes | Actual Votes | Elected | Above Votes |
| --- | --- | --- | --- | --- |
| Barry Larkin | 112.43 | 102 | 10000 | 95.44 |
| Jack Morris | 103.25 | 67 | 10000 | 55.67 |
| Jeff Bagwell | 99.98 | 68 | 10000 | 52.60 |
| Tim Raines | 97.26 | 65 | 9997 | 47.42 |
| Lee Smith | 91.33 | 51 | 9536 | 39.22 |
| Alan Trammell | 84.42 | 41 | 3941 | 23.84 |
| Edgar Martinez | 82.52 | 38 | 2442 | 21.82 |
| Fred McGriff | 67.53 | 29 | 0 | 15.41 |
| Larry Walker | 62.24 | 22 | 0 | 9.58 |
| Mark McGwire | 57.45 | 21 | 0 | 10.63 |
| Don Mattingly | 46.59 | 11 | 0 | 6.56 |
| Dale Murphy | 43.56 | 14 | 0 | 6.78 |
| Rafael Palmeiro | 41.43 | 15 | 0 | 8.14 |
| Bernie Williams | 25.73 | 3 | 0 | 1.09 |
| Juan Gonzalez | 12.00 | 2 | 0 | 0.85 |
| Vinny Castilla | 5.54 | 3 | 0 | 2.33 |
| Bill Mueller | 2.80 | 1 | 0 | 0.62 |

“Above Votes” is the average number of ballots per simulation on which the player was above the voter’s line. So if a voter had more than 4 players, I randomly pick some (weighted so players with fewer overall votes are more likely chosen) and move them below the line. Those demoted players still rank ahead of any players not on the voter’s actual ballot who get randomly added to bring the ballot up to 10 players.
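The ballot construction described above can be sketched roughly as follows. This is only an illustration, not the actual program: the function names, the sample ballot, and the seed are made up, the order within each group is assumed to follow the weighted draw, and the public-ballot "Actual Votes" column stands in for the full election totals used as weights.

```python
import random

# "Actual Votes" from the 2012 public ballots, used here as stand-in weights
# (assumption: the real program weights by full election totals).
PUBLIC_VOTES_2012 = {
    "Barry Larkin": 102, "Jack Morris": 67, "Jeff Bagwell": 68, "Tim Raines": 65,
    "Lee Smith": 51, "Alan Trammell": 41, "Edgar Martinez": 38, "Fred McGriff": 29,
    "Larry Walker": 22, "Mark McGwire": 21, "Don Mattingly": 11, "Dale Murphy": 14,
    "Rafael Palmeiro": 15, "Bernie Williams": 3, "Juan Gonzalez": 2,
    "Vinny Castilla": 3, "Bill Mueller": 1,
}

def weighted_order(players, totals, rng):
    """Draw players one at a time without replacement, weighted by vote total.

    Assumes every weight is positive."""
    pool, ordered = list(players), []
    while pool:
        pick = rng.choices(pool, weights=[totals[p] for p in pool], k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered

def build_ranked_ballot(voted, totals, line, rng, size=10):
    """Return (above, below) for one voter.

    above: up to `line` of the voter's own picks (popular players stay more often).
    below: the voter's demoted picks first, then weighted random fillers
           so the whole ballot has `size` names.
    """
    ranked = weighted_order(voted, totals, rng)
    above, demoted = ranked[:line], ranked[line:]
    unlisted = [p for p in totals if p not in voted]
    fillers = weighted_order(unlisted, totals, rng)[: max(0, size - len(ranked))]
    return above, demoted + fillers

# Hypothetical six-name ballot, line drawn at 4.
rng = random.Random(2012)
voted = ["Barry Larkin", "Jack Morris", "Jeff Bagwell",
         "Tim Raines", "Lee Smith", "Alan Trammell"]
above, below = build_ranked_ballot(voted, PUBLIC_VOTES_2012, line=4, rng=rng)
print("above:", above)
print("below:", below)
```

A voter with fewer than 4 names keeps all of them above the line, since the slice simply returns the whole list.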
For comparison, here’s 10,000 trials using my original algorithm (assuming each voter’s line is right below the players (s)he actually voted for):

| Player | Mod Votes | Actual Votes | Elected | Above Votes |
| --- | --- | --- | --- | --- |
| Barry Larkin | 111.98 | 102 | 10000 | 102.00 |
| Jack Morris | 102.47 | 67 | 10000 | 67.00 |
| Jeff Bagwell | 98.04 | 68 | 10000 | 68.00 |
| Tim Raines | 95.59 | 65 | 9998 | 65.00 |
| Lee Smith | 89.60 | 51 | 8899 | 51.00 |
| Alan Trammell | 82.44 | 41 | 2328 | 41.00 |
| Edgar Martinez | 80.71 | 38 | 1285 | 38.00 |
| Fred McGriff | 66.24 | 29 | 0 | 29.00 |
| Larry Walker | 60.90 | 22 | 0 | 22.00 |
| Mark McGwire | 56.55 | 21 | 0 | 21.00 |
| Don Mattingly | 45.60 | 11 | 0 | 11.00 |
| Dale Murphy | 42.82 | 14 | 0 | 14.00 |
| Rafael Palmeiro | 40.84 | 15 | 0 | 15.00 |
| Bernie Williams | 25.22 | 3 | 0 | 3.00 |
| Juan Gonzalez | 11.86 | 2 | 0 | 2.00 |
| Vinny Castilla | 5.49 | 3 | 0 | 3.00 |
| Bill Mueller | 2.79 | 1 | 0 | 1.00 |

As a sanity check, the “Above Votes” column should equal the Actual Votes the player received, and it does. That’s good. But seeing these in summary is interesting: compared with the line-at-4 run, the voters here effectively want a larger hall, yet the chances of players getting in actually went down, by quite a bit for borderline cases like Alan Trammell (3941 elections down to 2328) and Edgar Martinez (2442 down to 1285). Average votes are down across the board, too. I would have naively thought that lowering the bar (i.e. increasing the number of players you want in the hall) should make it easier, but the opposite is happening. Then I realized that with the higher bar, voters “trade” votes to runoff candidates more quickly. In this case the 114 ballots averaged 4.85 names per ballot, with some ballots going much deeper than 4. When I set the cutoff to 4 explicitly, ballots with more than 4 names start contributing to runoff totals sooner than they otherwise would, and since I’m allocating those runoff votes based largely on the overall vote totals, that apparently makes it easier to get votes to other candidates!
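The sanity check on “Above Votes” can be illustrated with a toy version. This is a sketch under assumptions: the three ballots are invented, the helper names are hypothetical, and weights are tallied from the toy ballots themselves rather than from real election totals. With no fixed line, every name a voter listed stays above the line, so the averaged count must match the actual tally exactly; with a fixed line it can only shrink.

```python
import random
from collections import Counter

def keep_above(ballot, totals, line, rng):
    """Names on one ballot that stay above the line (weighted by vote total)."""
    if line is None or len(ballot) <= line:
        return list(ballot)  # line sits below the whole ballot
    pool, kept = list(ballot), []
    while len(kept) < line:
        pick = rng.choices(pool, weights=[totals[p] for p in pool], k=1)[0]
        pool.remove(pick)
        kept.append(pick)
    return kept

def mean_above_votes(ballots, line, trials, seed=0):
    """Average per-trial count of ballots with each player above the line."""
    rng = random.Random(seed)
    totals = Counter(name for ballot in ballots for name in ballot)
    counts = Counter()
    for _ in range(trials):
        for ballot in ballots:
            counts.update(keep_above(ballot, totals, line, rng))
    return {name: counts[name] / trials for name in totals}, totals

# Made-up ballots, for illustration only.
ballots = [
    ["Larkin", "Morris", "Bagwell", "Raines", "Smith"],
    ["Larkin", "Bagwell", "Raines"],
    ["Larkin", "Morris"],
]
full, actual = mean_above_votes(ballots, line=None, trials=200)
capped, _ = mean_above_votes(ballots, line=2, trials=200)
print(full)    # matches the actual tallies
print(capped)  # never exceeds the actual tallies
```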
So now here’s a run using 2013 data, with the bar set at 4:
168 total votes. 126 needed for election. 10000 simulations.

| Player | Mod Votes | Actual Votes | Elected | Above Votes |
| --- | --- | --- | --- | --- |
| Craig Biggio | 146.54 | 117 | 10000 | 85.00 |
| Jeff Bagwell | 140.60 | 108 | 9999 | 70.32 |
| Mike Piazza | 135.54 | 98 | 9964 | 61.11 |
| Tim Raines | 134.73 | 106 | 9947 | 65.34 |
| Jack Morris | 130.97 | 95 | 9287 | 70.71 |
| Barry Bonds | 113.67 | 80 | 51 | 39.44 |
| Roger Clemens | 113.12 | 77 | 28 | 37.44 |
| Curt Schilling | 104.09 | 69 | 0 | 34.75 |
| Lee Smith | 97.71 | 55 | 0 | 33.63 |
| Alan Trammell | 95.82 | 61 | 0 | 28.94 |
| Edgar Martinez | 92.78 | 56 | 0 | 26.97 |
| Larry Walker | 57.34 | 28 | 0 | 10.27 |
| Fred McGriff | 54.11 | 28 | 0 | 13.55 |
| Dale Murphy | 51.29 | 26 | 0 | 10.05 |
| Mark McGwire | 47.31 | 23 | 0 | 7.88 |
| Sammy Sosa | 39.86 | 21 | 0 | 6.07 |
| Don Mattingly | 33.57 | 15 | 0 | 8.42 |
| Rafael Palmeiro | 32.51 | 19 | 0 | 5.61 |
| Kenny Lofton | 11.07 | 6 | 0 | 1.95 |
| Bernie Williams | 8.40 | 3 | 0 | 1.04 |
| Julio Franco | 2.73 | 1 | 0 | 0.22 |
| David Wells | 2.43 | 1 | 0 | 0.26 |
| Sandy Alomar Jr | 2.00 | 2 | 0 | 1.44 |
| Shawn Green | 1.59 | 1 | 0 | 0.59 |
| Steve Finley | 1.16 | 0 | 0 | 0.00 |
| Aaron Sele | 0.28 | 0 | 0 | 0.00 |

And here’s the same 2013 data, now using my original algorithm (the bar is below the full ballot):
168 total votes. 126 needed for election. 10000 simulations.

| Player | Mod Votes | Actual Votes | Elected | Above Votes |
| --- | --- | --- | --- | --- |
| Craig Biggio | 146.18 | 117 | 10000 | 117.00 |
| Jeff Bagwell | 140.06 | 108 | 10000 | 108.00 |
| Mike Piazza | 134.38 | 98 | 9881 | 98.00 |
| Tim Raines | 133.56 | 106 | 9877 | 106.00 |
| Jack Morris | 130.15 | 95 | 8947 | 95.00 |
| Barry Bonds | 112.59 | 80 | 16 | 80.00 |
| Roger Clemens | 112.06 | 77 | 21 | 77.00 |
| Curt Schilling | 103.05 | 69 | 0 | 69.00 |
| Lee Smith | 96.44 | 55 | 0 | 55.00 |
| Alan Trammell | 94.93 | 61 | 0 | 61.00 |
| Edgar Martinez | 91.78 | 56 | 0 | 56.00 |
| Larry Walker | 56.69 | 28 | 0 | 28.00 |
| Fred McGriff | 53.45 | 28 | 0 | 28.00 |
| Dale Murphy | 50.69 | 26 | 0 | 26.00 |
| Mark McGwire | 46.67 | 23 | 0 | 23.00 |
| Sammy Sosa | 39.44 | 21 | 0 | 21.00 |
| Don Mattingly | 33.10 | 15 | 0 | 15.00 |
| Rafael Palmeiro | 32.21 | 19 | 0 | 19.00 |
| Kenny Lofton | 10.99 | 6 | 0 | 6.00 |
| Bernie Williams | 8.24 | 3 | 0 | 3.00 |
| Julio Franco | 2.68 | 1 | 0 | 1.00 |
| David Wells | 2.40 | 1 | 0 | 1.00 |
| Sandy Alomar Jr | 2.00 | 2 | 0 | 2.00 |
| Shawn Green | 1.56 | 1 | 0 | 1.00 |
| Steve Finley | 1.13 | 0 | 0 | 0.00 |
| Aaron Sele | 0.29 | 0 | 0 | 0.00 |

Here the voting has a sharper break between viable and non-viable candidates, but I still see that setting the line at 4 results in slightly higher average vote totals for top candidates, and in a few more simulations where candidates reach the 75% needed for election. I should mention again that I’m assuming the order of overall votes can be used as a probability weighting for putting players on the ballot below the voter’s line as needed. This is obviously not a realistic assumption: in actuality, voters leaving off Bonds and Clemens are surely doing so over PED concerns, so those players would not get added below the line. Other players whose votes were suppressed by PED suspicion (like Bagwell, Piazza, or maybe even Biggio) would likely also be affected, albeit not as much as Bonds, Clemens, McGwire, or Palmeiro.
I then figured why not run the program against 2014 ballot data? Okay, I don’t currently have actual vote totals from all voters, but I can use vote totals in the public ballot subset instead when I’m weighting my random choices. So I did this. It does amplify the problem of sampling error (the voters who released ballots are not a representative sample of all voters), but you can only analyze the data you have, not the data you wish you had. Here’s a sample run, assuming the writer’s line is below their full ballot (the original simulation):
92 total votes. 69 needed for election. 10000 simulations.

| Player | Mod Votes | Actual Votes | Elected | Above Votes |
| --- | --- | --- | --- | --- |
| Greg Maddux | 91.00 | 91 | 10000 | 91.00 |
| Tom Glavine | 89.00 | 89 | 10000 | 89.00 |
| Frank Thomas | 85.14 | 83 | 10000 | 83.00 |
| Craig Biggio | 77.80 | 74 | 10000 | 74.00 |
| Mike Piazza | 75.93 | 70 | 10000 | 70.00 |
| Jeff Bagwell | 66.66 | 61 | 1494 | 61.00 |
| Tim Raines | 60.16 | 55 | 0 | 55.00 |
| Jack Morris | 58.57 | 57 | 0 | 57.00 |
| Barry Bonds | 48.55 | 44 | 0 | 44.00 |
| Roger Clemens | 47.47 | 43 | 0 | 43.00 |
| Curt Schilling | 42.97 | 39 | 0 | 39.00 |
| Mike Mussina | 32.23 | 29 | 0 | 29.00 |
| Edgar Martinez | 24.80 | 22 | 0 | 22.00 |
| Alan Trammell | 24.57 | 22 | 0 | 22.00 |
| Lee Smith | 19.73 | 18 | 0 | 18.00 |
| Fred McGriff | 14.72 | 13 | 0 | 13.00 |
| Larry Walker | 12.60 | 11 | 0 | 11.00 |
| Jeff Kent | 12.46 | 11 | 0 | 11.00 |
| Mark McGwire | 10.25 | 9 | 0 | 9.00 |
| Sammy Sosa | 10.23 | 9 | 0 | 9.00 |
| Rafael Palmeiro | 6.87 | 6 | 0 | 6.00 |
| Don Mattingly | 2.30 | 2 | 0 | 2.00 |

This subset of voters is already on pace to elect 5 players under the current rules: Greg Maddux, Tom Glavine, Frank Thomas, Craig Biggio, and Mike Piazza. Adding the instant runoff votes rarely affects who is elected: Bagwell’s votes rise, but he still only reaches 75% about 15% of the time, and nobody else comes close.
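As a quick worked check of the arithmetic, here is a small sketch of the 75% threshold and the "already elected" list. The function name is mine; the vote counts and the 1494-of-10000 figure for Bagwell are taken from the results in this post.

```python
def votes_needed(ballots):
    """Smallest whole number of votes that is at least 75% of the ballots."""
    return -(-ballots * 3 // 4)  # ceiling division using integers only

# Thresholds for the three public-ballot samples.
print(votes_needed(114), votes_needed(168), votes_needed(92))  # 86 126 69

# Top of the 2014 public-ballot tally (actual votes among 92 ballots).
public_2014 = {
    "Greg Maddux": 91, "Tom Glavine": 89, "Frank Thomas": 83,
    "Craig Biggio": 74, "Mike Piazza": 70, "Jeff Bagwell": 61,
    "Tim Raines": 55, "Jack Morris": 57,
}
need = votes_needed(92)
elected_now = [p for p, v in public_2014.items() if v >= need]
print(elected_now)  # the five players already at or above 69 votes

# Share of simulations in which Bagwell reached the threshold.
bagwell_rate = 1494 / 10000
print(f"{bagwell_rate:.1%}")  # 14.9%
```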
Interestingly, now when I use Tango’s suggestion of drawing the line after 4 players, things get a little harder:
92 total votes. 69 needed for election. 10000 simulations.

| Player | Mod Votes | Actual Votes | Elected | Above Votes |
| --- | --- | --- | --- | --- |
| Greg Maddux | 90.99 | 91 | 10000 | 87.99 |
| Tom Glavine | 88.98 | 89 | 10000 | 80.35 |
| Frank Thomas | 85.05 | 83 | 10000 | 59.84 |
| Craig Biggio | 77.45 | 74 | 10000 | 36.08 |
| Mike Piazza | 75.04 | 70 | 9905 | 27.69 |
| Jeff Bagwell | 65.81 | 61 | 1380 | 15.94 |
| Tim Raines | 59.62 | 55 | 0 | 11.44 |
| Jack Morris | 58.20 | 57 | 0 | 16.42 |
| Barry Bonds | 48.34 | 44 | 0 | 7.13 |
| Roger Clemens | 47.28 | 43 | 0 | 6.80 |
| Curt Schilling | 42.77 | 39 | 0 | 4.89 |
| Mike Mussina | 32.20 | 29 | 0 | 2.63 |
| Edgar Martinez | 24.75 | 22 | 0 | 1.78 |
| Alan Trammell | 24.56 | 22 | 0 | 1.77 |
| Lee Smith | 19.78 | 18 | 0 | 2.31 |
| Fred McGriff | 14.67 | 13 | 0 | 0.81 |
| Larry Walker | 12.61 | 11 | 0 | 0.76 |
| Jeff Kent | 12.49 | 11 | 0 | 0.64 |
| Sammy Sosa | 10.26 | 9 | 0 | 0.68 |
| Mark McGwire | 10.25 | 9 | 0 | 0.51 |
| Rafael Palmeiro | 6.89 | 6 | 0 | 0.46 |
| Don Mattingly | 2.30 | 2 | 0 | 0.07 |

Now, rather than having 5 locks as before (Piazza’s 70 known votes would guarantee him election in this 92-voter electorate), only 4 players always get in, and while Piazza usually does make it, he’s no longer a sure thing. Compared to 2013 and 2012, there are more candidates close to or above pace for induction, and with such broad agreement it seems that it doesn’t take as long for a given voter’s 4 picks to all be elected, at which point that voter’s ballot is no longer used to add votes to other players. So in this case the fixed level runoff of 4 doesn’t redistribute as many votes as using the overall ballot totals would, resulting in a somewhat reduced chance of reaching 75%.
So the public ballot sample (which is not representative of the full electorate, so please don’t think I’m expecting 5 players to be elected!) has shown a sharp increase in the number of players per ballot:

| Year | Ballots | Players per Ballot |
| --- | --- | --- |
| 2012 | 114 | 4.85 |
| 2013 | 168 | 6.52 |
| 2014 | 92 | 9.34 |

In 2012, the average ballot had fewer than 5 names, but now it has more than 9. So when I run my original model (which assumes the line falls after all of the voter’s choices), the overall bar is effectively lower now than it was in 2013 or 2012.
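The players-per-ballot figure is just the total number of names checked divided by the number of ballots. As a rough check, here is a sketch that recomputes the 2012 and 2013 values from the "Actual Votes" columns in the results above; the function and variable names are my own.

```python
def players_per_ballot(vote_totals, ballots):
    """Average number of names per ballot: total votes cast / number of ballots."""
    return sum(vote_totals) / ballots

# "Actual Votes" columns from the 2012 and 2013 results.
actual_2012 = [102, 67, 68, 65, 51, 41, 38, 29, 22, 21, 11, 14, 15, 3, 2, 3, 1]
actual_2013 = [117, 108, 98, 106, 95, 80, 77, 69, 55, 61, 56, 28, 28, 26,
               23, 21, 15, 19, 6, 3, 1, 1, 2, 1, 0, 0]

print(round(players_per_ballot(actual_2012, 114), 2))  # 4.85
print(round(players_per_ballot(actual_2013, 168), 2))  # 6.52
```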
Instant runoff voting looks like it would help accelerate getting players into the Hall of Fame, and using that process in years where there are not particularly strong candidates would de facto lower standards. Consider that Lee Smith very likely gets in using this model on 2012 data, and both Alan Trammell and Edgar Martinez sometimes make it, yet none of them come close in 2013 or 2014 as the ballot gets more crowded.
I do reiterate that the public ballots are a small, non-representative subset of the overall Hall of Fame electorate, but for 2012 and 2013, where actual vote percentages are available, I used them instead of sample percentages when weighting whether to use a player to fill out a ballot, or when deciding whether he goes above or below the line.
It is an interesting variant on the voting process, and it might be good to apply it in years when the BBWAA does not elect anybody, as an attempt to ensure at least one living player gets inducted each summer. I’m sure Cooperstown’s hotel managers would appreciate that!