More on Instant Runoff HOF Voting

Yesterday I talked about simulating Tom Tango’s idea for doing instant runoff voting for the Hall of Fame, and using prior year public ballot data to simulate his ranked ballots.
In the comments on his blog, he asked about assuming all voters draw the line at 4 players (so long as they have at least 4 votes on their ballot), randomly picking which players stay above the line. I implemented that change: I can now set the line at any X. The random selection weights choices by actual election vote totals, so the more real votes a player got, the more likely he was to be picked (either when adding extra names to bring a ballot up to 10, or when determining which of a voter’s choices stay above the line). I also modified the program to run multiple trials and summarize the results. Here are 10,000 trials using the 2012 ballot data, assuming each voter draws the line at 4 players:
114 total votes. 86 needed for election. 10000 simulations.

Player	Mod Votes	Actual Votes	Elected	Above Votes
Barry Larkin	112.43	102	10000	95.44
Jack Morris	103.25	67	10000	55.67
Jeff Bagwell	99.98	68	10000	52.60
Tim Raines	97.26	65	9997	47.42
Lee Smith	91.33	51	9536	39.22
Alan Trammell	84.42	41	3941	23.84
Edgar Martinez	82.52	38	2442	21.82
Fred McGriff	67.53	29	0	15.41
Larry Walker	62.24	22	0	9.58
Mark McGwire	57.45	21	0	10.63
Don Mattingly	46.59	11	0	6.56
Dale Murphy	43.56	14	0	6.78
Rafael Palmeiro	41.43	15	0	8.14
Bernie Williams	25.73	3	0	1.09
Juan Gonzalez	12.00	2	0	0.85
Vinny Castilla	5.54	3	0	2.33
Bill Mueller	2.80	1	0	0.62

“Above Votes” is the average number of ballots per simulation on which the player ended up above the voter’s line. If a voter had more than 4 players, I randomly pick some (weighted so players with fewer overall votes are more likely to be chosen) and move them below the line. Those players go ahead of any players who were not on the voter’s actual ballot and were randomly added to bring the ballot up to 10 players.
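The ballot-building step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the program that produced these tables: the names `weighted_order` and `build_ballot` are made up, the 0.01 weight floor is an assumption, and the weights plugged in are the Actual Votes column from the 2012 table rather than the full election totals.

```python
import random

def weighted_order(players, totals, rng):
    """Order players by repeated weighted draws without replacement,
    so players with more votes tend to come out earlier."""
    remaining = list(players)
    ordered = []
    while remaining:
        # Assumption: a small floor keeps zero-vote players drawable.
        weights = [max(totals.get(p, 0), 0.01) for p in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        ordered.append(pick)
        remaining.remove(pick)
    return ordered

def build_ballot(voted, totals, line=None, size=10, rng=random):
    """Return (above, below) for one voter.

    line=None puts the line right below everything the voter picked.
    """
    if line is None or len(voted) <= line:
        above, demoted = list(voted), []
    else:
        # Players drawn first stay above the line; the rest are demoted.
        ranked = weighted_order(voted, totals, rng)
        above, demoted = ranked[:line], ranked[line:]
    # Fill the rest of the ballot from players the voter left off.
    others = [p for p in totals if p not in voted]
    fill = weighted_order(others, totals, rng)[:max(size - len(voted), 0)]
    # Demoted picks rank ahead of the randomly added names.
    return above, demoted + fill

# Stand-in weights: Actual Votes from the 2012 table (subset of players).
totals = {
    "Barry Larkin": 102, "Jack Morris": 67, "Jeff Bagwell": 68,
    "Tim Raines": 65, "Lee Smith": 51, "Alan Trammell": 41,
    "Edgar Martinez": 38, "Fred McGriff": 29, "Larry Walker": 22,
    "Mark McGwire": 21, "Dale Murphy": 14, "Rafael Palmeiro": 15,
}
# A hypothetical six-name ballot.
voted = ["Barry Larkin", "Jack Morris", "Jeff Bagwell",
         "Tim Raines", "Lee Smith", "Alan Trammell"]

above, below = build_ballot(voted, totals, line=4, rng=random.Random(2012))
print("above the line:", above)
print("below the line:", below)
```

Passing `line=None` reproduces the original setup, where everything the voter actually picked stays above the line and only the randomly added names sit below it.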
For comparison, here’s 10,000 trials using my original algorithm (assuming each voter’s line is right below the players (s)he actually voted for):

Player	Mod Votes	Actual Votes	Elected	Above Votes
Barry Larkin	111.98	102	10000	102.00
Jack Morris	102.47	67	10000	67.00
Jeff Bagwell	98.04	68	10000	68.00
Tim Raines	95.59	65	9998	65.00
Lee Smith	89.60	51	8899	51.00
Alan Trammell	82.44	41	2328	41.00
Edgar Martinez	80.71	38	1285	38.00
Fred McGriff	66.24	29	0	29.00
Larry Walker	60.90	22	0	22.00
Mark McGwire	56.55	21	0	21.00
Don Mattingly	45.60	11	0	11.00
Dale Murphy	42.82	14	0	14.00
Rafael Palmeiro	40.84	15	0	15.00
Bernie Williams	25.22	3	0	3.00
Juan Gonzalez	11.86	2	0	2.00
Vinny Castilla	5.49	3	0	3.00
Bill Mueller	2.79	1	0	1.00

As a sanity check, the “Above Votes” column should equal the Actual Votes the player received, and it does. That’s good. But seeing these in summary is interesting: even though voters in this version are, in effect, asking for a larger Hall, the chances of players getting in actually went down compared with the line-at-4 run, quite a bit for borderline cases like Alan Trammell and Edgar Martinez. Average votes are down across the board, too. I would have naively thought that lowering the bar (i.e. increasing the number of players you want in the Hall) should make it easier, but the opposite is happening. Then I realized that with the higher bar, people “trade” votes to runoff candidates more quickly. The 114 ballots averaged 4.85 names per ballot, with some ballots going much deeper than 4. When I set the cutoff to 4 explicitly, ballots with more than 4 names start contributing to runoff totals sooner than they otherwise would, and since I’m allocating those votes based largely on the total vote, that apparently makes it easier to get votes to other candidates!
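To put numbers on the borderline cases, here is a small sketch that turns the Elected counts from the two 2012 tables into percentage-point differences. The counts are copied from the tables; the function name `rate_change` is just for illustration.

```python
TRIALS = 10_000

# Times elected out of 10,000, copied from the two 2012 tables.
elected_line_at_4 = {"Lee Smith": 9536, "Alan Trammell": 3941, "Edgar Martinez": 2442}
elected_full_ballot = {"Lee Smith": 8899, "Alan Trammell": 2328, "Edgar Martinez": 1285}

def rate_change(player):
    """Election rate with the line below the full ballot, minus the rate
    with the line at 4, in percentage points."""
    diff = elected_full_ballot[player] - elected_line_at_4[player]
    return 100 * diff / TRIALS

for player in elected_line_at_4:
    print(f"{player}: {rate_change(player):+.2f} points")
# Lee Smith: -6.37 points
# Alan Trammell: -16.13 points
# Edgar Martinez: -11.57 points
```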
So now here’s a run using 2013 data, with the bar set at 4:
168 total votes. 126 needed for election. 10000 simulations.

Player	Mod Votes	Actual Votes	Elected	Above Votes
Craig Biggio	146.54	117	10000	85.00
Jeff Bagwell	140.60	108	9999	70.32
Mike Piazza	135.54	98	9964	61.11
Tim Raines	134.73	106	9947	65.34
Jack Morris	130.97	95	9287	70.71
Barry Bonds	113.67	80	51	39.44
Roger Clemens	113.12	77	28	37.44
Curt Schilling	104.09	69	0	34.75
Lee Smith	97.71	55	0	33.63
Alan Trammell	95.82	61	0	28.94
Edgar Martinez	92.78	56	0	26.97
Larry Walker	57.34	28	0	10.27
Fred McGriff	54.11	28	0	13.55
Dale Murphy	51.29	26	0	10.05
Mark McGwire	47.31	23	0	7.88
Sammy Sosa	39.86	21	0	6.07
Don Mattingly	33.57	15	0	8.42
Rafael Palmeiro	32.51	19	0	5.61
Kenny Lofton	11.07	6	0	1.95
Bernie Williams	8.40	3	0	1.04
Julio Franco	2.73	1	0	0.22
David Wells	2.43	1	0	0.26
Sandy Alomar Jr	2.00	2	0	1.44
Shawn Green	1.59	1	0	0.59
Steve Finley	1.16	0	0	0.00
Aaron Sele	0.28	0	0	0.00

And here’s the same 2013 data, now using my original algorithm (the bar is below the full ballot):
168 total votes. 126 needed for election. 10000 simulations.

Player	Mod Votes	Actual Votes	Elected	Above Votes
Craig Biggio	146.18	117	10000	117.00
Jeff Bagwell	140.06	108	10000	108.00
Mike Piazza	134.38	98	9881	98.00
Tim Raines	133.56	106	9877	106.00
Jack Morris	130.15	95	8947	95.00
Barry Bonds	112.59	80	16	80.00
Roger Clemens	112.06	77	21	77.00
Curt Schilling	103.05	69	0	69.00
Lee Smith	96.44	55	0	55.00
Alan Trammell	94.93	61	0	61.00
Edgar Martinez	91.78	56	0	56.00
Larry Walker	56.69	28	0	28.00
Fred McGriff	53.45	28	0	28.00
Dale Murphy	50.69	26	0	26.00
Mark McGwire	46.67	23	0	23.00
Sammy Sosa	39.44	21	0	21.00
Don Mattingly	33.10	15	0	15.00
Rafael Palmeiro	32.21	19	0	19.00
Kenny Lofton	10.99	6	0	6.00
Bernie Williams	8.24	3	0	3.00
Julio Franco	2.68	1	0	1.00
David Wells	2.40	1	0	1.00
Sandy Alomar Jr	2.00	2	0	2.00
Shawn Green	1.56	1	0	1.00
Steve Finley	1.13	0	0	0.00
Aaron Sele	0.29	0	0	0.00

Here the voting has a sharper break between viable and non-viable candidates, but I still see that drawing the line at 4 results in slightly higher average vote totals for the top candidates, and in a few more simulations where candidates reach the 75% needed for election. I should mention again that I’m assuming the order of overall votes can be used as a probability weighting for putting players on the ballot below the voter’s line as needed. This is obviously not a real-world assumption: in actuality, those voters leaving off Bonds and Clemens are surely doing so over PED concerns, so those two would not get added below the line. Other players whose votes were suppressed by PED suspicion (like Bagwell, Piazza, or maybe even Biggio) would likely also be affected, albeit not as much as Bonds, Clemens, McGwire, or Palmeiro.
I then figured, why not run the program against 2014 ballot data? Okay, I don’t currently have actual vote totals from all voters, but I can use vote totals from the public ballot subset instead when weighting my random choices. So I did this. It does amplify the sampling problem (the voters who released ballots are not a representative sample of all voters), but you can only analyze the data you have, not the data you wish you had. Here’s a sample run, assuming each writer’s line is below their full ballot (the original simulation):
92 total votes. 69 needed for election. 10000 simulations.

Player	Mod Votes	Actual Votes	Elected	Above Votes
Greg Maddux	91.00	91	10000	91.00
Tom Glavine	89.00	89	10000	89.00
Frank Thomas	85.14	83	10000	83.00
Craig Biggio	77.80	74	10000	74.00
Mike Piazza	75.93	70	10000	70.00
Jeff Bagwell	66.66	61	1494	61.00
Tim Raines	60.16	55	0	55.00
Jack Morris	58.57	57	0	57.00
Barry Bonds	48.55	44	0	44.00
Roger Clemens	47.47	43	0	43.00
Curt Schilling	42.97	39	0	39.00
Mike Mussina	32.23	29	0	29.00
Edgar Martinez	24.80	22	0	22.00
Alan Trammell	24.57	22	0	22.00
Lee Smith	19.73	18	0	18.00
Fred McGriff	14.72	13	0	13.00
Larry Walker	12.60	11	0	11.00
Jeff Kent	12.46	11	0	11.00
Mark McGwire	10.25	9	0	9.00
Sammy Sosa	10.23	9	0	9.00
Rafael Palmeiro	6.87	6	0	6.00
Don Mattingly	2.30	2	0	2.00

This subset of voters is already on pace to elect 5 players under the current rules: Greg Maddux, Tom Glavine, Frank Thomas, Craig Biggio, and Mike Piazza. Adding the instant runoff votes rarely affects who is elected: Bagwell’s votes rise, but he still only reaches 75% about 15% of the time, and nobody else comes close.
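The "needed for election" figures and the Bagwell percentage are simple arithmetic, sketched below. The rounding-up rule is an assumption, though it matches all three ballot counts reported in this post; `votes_needed` is an illustrative name.

```python
def votes_needed(ballots):
    """Smallest whole number of votes that is at least 75% of the ballots."""
    return -(-3 * ballots // 4)  # ceiling of 3/4 * ballots, in integer math

for year, ballots in [(2012, 114), (2013, 168), (2014, 92)]:
    print(year, ballots, votes_needed(ballots))
# 2012 114 86
# 2013 168 126
# 2014 92 69

# Bagwell was elected in 1,494 of 10,000 simulations in the table above.
bagwell_rate = 1494 / 10_000
print(f"{bagwell_rate:.1%}")  # 14.9%
```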
Interestingly, now when I use Tango’s suggestion of drawing the line after 4 players, things get a little harder:
92 total votes. 69 needed for election. 10000 simulations.

Player	Mod Votes	Actual Votes	Elected	Above Votes
Greg Maddux	90.99	91	10000	87.99
Tom Glavine	88.98	89	10000	80.35
Frank Thomas	85.05	83	10000	59.84
Craig Biggio	77.45	74	10000	36.08
Mike Piazza	75.04	70	9905	27.69
Jeff Bagwell	65.81	61	1380	15.94
Tim Raines	59.62	55	0	11.44
Jack Morris	58.20	57	0	16.42
Barry Bonds	48.34	44	0	7.13
Roger Clemens	47.28	43	0	6.80
Curt Schilling	42.77	39	0	4.89
Mike Mussina	32.20	29	0	2.63
Edgar Martinez	24.75	22	0	1.78
Alan Trammell	24.56	22	0	1.77
Lee Smith	19.78	18	0	2.31
Fred McGriff	14.67	13	0	0.81
Larry Walker	12.61	11	0	0.76
Jeff Kent	12.49	11	0	0.64
Sammy Sosa	10.26	9	0	0.68
Mark McGwire	10.25	9	0	0.51
Rafael Palmeiro	6.89	6	0	0.46
Don Mattingly	2.30	2	0	0.07

Now, rather than having 5 locks as before (Piazza’s 70 known votes already clear the 69 needed in this 92-voter electorate), only 4 players always get in, and while Piazza usually does make it, he’s no longer a sure thing. Compared to 2013 and 2012, there are more candidates close to or above pace for induction, and with such broad agreement it seems that it doesn’t take as long for a given voter’s 4 picks to all be elected, at which point that voter’s ballot is no longer used to add votes to other players. So in this case the fixed line at 4 doesn’t redistribute as many votes as drawing the line below each voter’s full ballot would, resulting in a somewhat reduced chance of reaching 75%.
So the public ballot sample (which is not representative of the full electorate, so please don’t think I’m expecting 5 players to be elected!) has shown a sharp increase in the number of players per ballot:

Year	Ballots	Players per Ballot
2012	114	4.85
2013	168	6.52
2014	92	9.34

In 2012, the average ballot had fewer than 5 names, but now it has more than 9. So under my original model (which assumes the line falls right below the last of the voter’s actual choices), the overall bar is effectively lower now than it was in 2013 or 2012.
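As a quick cross-check, the 2012 players-per-ballot figure can be recovered from the Actual Votes column of the 2012 table, since every name on a public ballot counts once in that column. A minimal sketch, with `players_per_ballot` as an illustrative name:

```python
# Actual Votes column from the 2012 table, top to bottom.
actual_votes_2012 = [102, 67, 68, 65, 51, 41, 38, 29, 22,
                     21, 11, 14, 15, 3, 2, 3, 1]
ballots_2012 = 114

def players_per_ballot(votes, ballots):
    """Average number of names per ballot."""
    return sum(votes) / ballots

print(sum(actual_votes_2012))  # 553
print(round(players_per_ballot(actual_votes_2012, ballots_2012), 2))  # 4.85
```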
Instant runoff voting looks like it would help accelerate getting players into the Hall of Fame, and using that process in years where there are not particularly strong candidates would de facto lower standards. Consider that Lee Smith very likely gets in using this model on 2012 data, and both Alan Trammell and Edgar Martinez sometimes make it, yet none of them come close in 2013 or 2014 as the ballot gets more crowded.
I do reiterate that the public ballots are a small, non-representative subset of the overall Hall of Fame electorate, but for 2012 and 2013, where there are actual vote percentages, I did use them instead of sample percentages in weighting whether to use a player to fill out a ballot, or deciding whether he goes above or below the line.
It is an interesting variant on the voting process, and it might be good to apply it in years when the BBWAA does not elect anybody as an attempt to ensure at least one living player gets inducted each summer. I’m sure Cooperstown’s hotel managers would appreciate that!