A discussion on Tom Tango’s blog raised an interesting question: how much does effectiveness differ between a replacement-level starter and a replacement-level reliever?
To answer this, you need some way of estimating replacement level for starters and for relievers. I have a database of box score data going back to 2010, so I set out to do exactly that. First, I compiled player statistics by role, either starter or reliever (the same player might appear in both roles), and sorted players by some effectiveness statistic (runs per 9 innings, ERA, FIP, whatever). Then I split the sorted players into ten buckets (the top 10%, the next 10%, and so on) and took the average of the effectiveness statistic for each bucket.
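Here is a minimal Python sketch of that bucketing step. The function name `bucket_means` and the `(name, stat)` input layout are made up for illustration; this is not the code behind the tables. It assumes lower is better and takes a plain unweighted average within each bucket (whether the table averages are IP-weighted isn't stated).

```python
from collections import defaultdict


def bucket_means(players, n_buckets=10):
    """Split players into roughly equal buckets by rank and average each one.

    players: list of (name, stat) pairs, where a lower stat is better.
    Returns a list of (bucket, count, mean_stat), best bucket first.
    """
    ranked = sorted(players, key=lambda p: p[1])
    n = len(ranked)
    groups = defaultdict(list)
    for i, (_, stat) in enumerate(ranked):
        # Rank i out of n lands in bucket floor(i * n_buckets / n).
        groups[i * n_buckets // n].append(stat)
    return [(b, len(groups[b]), sum(groups[b]) / len(groups[b]))
            for b in sorted(groups)]


if __name__ == "__main__":
    # Hypothetical data: 296 pitchers with made-up stat values.
    fake = [(f"p{i}", 2.5 + 0.02 * i) for i in range(296)]
    for bucket, count, mean_stat in bucket_means(fake):
        print(bucket, count, round(mean_stat, 3))
```

With 296 players this index rule gives bucket sizes of 30, 30, 29, 30, 29, 30, 30, 29, 30, 29, so buckets differ in size by at most one when the pool doesn't divide evenly by ten.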
I figured the bottom 10% by effectiveness should be a good proxy for replacement level: if you’re performing in the bottom 10%, you’re replaceable. Sounds good. Here’s a table from 2013 data using FIP (bucket 0 is the best 10%, bucket 9 the worst, and Diff is starter FIP minus reliever FIP):
Bucket | # Start | FIP Start | # Relief | FIP Relief | Diff | IP Start | IP Relief |
---|---|---|---|---|---|---|---|
0 | 30 | 2.688 | 53 | 1.705 | 0.983 | 2928.34 | 919.67 |
1 | 30 | 3.313 | 52 | 2.506 | 0.807 | 4933.33 | 1967.33 |
2 | 29 | 3.584 | 53 | 2.962 | 0.622 | 4331.67 | 1959.68 |
3 | 30 | 3.890 | 52 | 3.251 | 0.639 | 3424.66 | 2416.67 |
4 | 29 | 4.090 | 53 | 3.589 | 0.501 | 3641.68 | 2464.00 |
5 | 30 | 4.359 | 52 | 3.924 | 0.435 | 3153.33 | 1644.34 |
6 | 30 | 4.629 | 53 | 4.280 | 0.349 | 2924.67 | 1473.32 |
7 | 29 | 4.999 | 52 | 4.973 | 0.026 | 1944.33 | 1245.66 |
8 | 30 | 5.691 | 53 | 6.221 | -0.531 | 1058.33 | 659.00 |
9 | 29 | 7.635 | 52 | 9.597 | -1.963 | 361.00 | 233.33 |
Small sample size is immediately a big problem here: the 52 relievers in the worst bucket average under 5 IP each, and the 29 starters average only about 12 IP. I tinkered for a while with arbitrary innings minimums (say, 100 IP as a starter and 30 IP as a reliever), ignoring players who don’t reach those totals. That gave more stable-looking numbers, but the starter-reliever difference depended heavily on where I drew the line, and that was a purely arbitrary choice.
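The per-pitcher averages quoted above come straight from the worst bucket’s row in the table. A quick check in Python (the variable names are just for illustration):

```python
# (number of pitchers, total IP) for bucket 9 in the 2013 FIP table above.
worst_bucket = {"starters": (29, 361.00), "relievers": (52, 233.33)}

# Average innings per pitcher in the worst bucket, by role.
avg_ip = {role: ip / n for role, (n, ip) in worst_bucket.items()}

for role, value in avg_ip.items():
    print(f"{role}: {value:.2f} IP per pitcher")
# starters: 12.45 IP per pitcher
# relievers: 4.49 IP per pitcher
```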
When I model fantasy sports pricing, I use the league’s roster sizes to determine replacement level, so I thought I’d try that here. Each of the 30 teams typically uses 5 starters and carries a bullpen of at least 6 relievers, so I changed my filter to look at only the top 150 starters (30 × 5) and the top 180 relievers (30 × 6) by innings pitched. That led to this table:
Bucket | # Start | FIP Start | # Relief | FIP Relief | Diff | IP Start | IP Relief |
---|---|---|---|---|---|---|---|
0 | 15 | 2.714 | 18 | 1.985 | 0.729 | 2716.67 | 1092.00 |
1 | 15 | 3.260 | 18 | 2.632 | 0.628 | 2846.00 | 1045.00 |
2 | 15 | 3.452 | 18 | 2.884 | 0.568 | 2830.00 | 1124.67 |
3 | 15 | 3.596 | 18 | 3.087 | 0.509 | 2425.00 | 1121.67 |
4 | 15 | 3.833 | 18 | 3.251 | 0.582 | 2515.99 | 1181.00 |
5 | 15 | 4.010 | 18 | 3.437 | 0.573 | 2454.00 | 1057.00 |
6 | 15 | 4.136 | 18 | 3.612 | 0.524 | 2181.01 | 1025.67 |
7 | 15 | 4.354 | 18 | 3.808 | 0.546 | 2185.33 | 1088.67 |
8 | 15 | 4.555 | 18 | 4.133 | 0.422 | 2194.00 | 911.66 |
9 | 15 | 5.010 | 18 | 4.871 | 0.139 | 1954.00 | 888.66 |
That looks much better. The cutoff was 79 IP for a starter, and 36.67 IP for a reliever.
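The roster-size filter can be sketched like this. The function `roster_pool` and the `(name, ip, stat)` tuple layout are assumptions for illustration, not the query I actually ran; ties at the cutoff are broken by input order here, which may or may not match what a real database query would do.

```python
def roster_pool(pitchers, teams=30, slots_per_team=5):
    """Keep the top teams * slots_per_team pitchers by innings pitched.

    pitchers: list of (name, ip, stat) tuples for one role.
    Returns (pool, cutoff_ip), where cutoff_ip is the lowest IP in the pool.
    """
    size = teams * slots_per_team
    pool = sorted(pitchers, key=lambda p: p[1], reverse=True)[:size]
    cutoff_ip = pool[-1][1] if pool else None
    return pool, cutoff_ip


if __name__ == "__main__":
    # Hypothetical starters with made-up innings totals and stats.
    starters = [(f"sp{i}", 220.0 - i, 3.0 + 0.01 * i) for i in range(200)]
    pool, cutoff = roster_pool(starters, slots_per_team=5)   # 150 starters
    print(len(pool), cutoff)
    # For relievers the same call would use slots_per_team=6 (180 relievers).
```

The appeal of this approach is that the innings cutoff falls out of the roster assumption instead of being picked by hand.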
Here’s the same algorithm, just computing/bucketing by RA9 instead of FIP:
Bucket | # Start | RA9 Start | # Relief | RA9 Relief | Diff | IP Start | IP Relief |
---|---|---|---|---|---|---|---|
0 | 15 | 2.718 | 18 | 1.684 | 1.035 | 2579.00 | 1122.34 |
1 | 15 | 3.292 | 18 | 2.374 | 0.918 | 2706.67 | 1099.34 |
2 | 15 | 3.530 | 18 | 2.732 | 0.798 | 2605.34 | 1060.66 |
3 | 15 | 3.737 | 18 | 3.024 | 0.713 | 2620.34 | 1226.34 |
4 | 15 | 3.936 | 18 | 3.322 | 0.614 | 2641.33 | 1159.66 |
5 | 15 | 4.187 | 18 | 3.625 | 0.562 | 2530.00 | 1003.00 |
6 | 15 | 4.419 | 18 | 3.976 | 0.443 | 2220.00 | 1025.33 |
7 | 15 | 4.806 | 18 | 4.315 | 0.491 | 2236.00 | 1038.67 |
8 | 15 | 5.296 | 18 | 4.640 | 0.656 | 2200.67 | 975.67 |
9 | 15 | 6.103 | 18 | 5.509 | 0.594 | 1962.66 | 825.01 |
In the discussion on Tango’s site I had speculated that the gap between elite relievers and elite starters was likely larger than the average gap between the two roles, and these tables support that view: the top bucket shows the widest gap in both FIP and RA9.
The top buckets aren’t relevant to the question of replacement level, but it is good to see that in general teams give more innings to pitchers who are performing better. That the best bucket doesn’t have the highest IP is also expected, as being the very best in performance rate requires some good luck as well as skill, and it’s easier to post very good rate stats in smaller slices of playing time.
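For anyone who wants to reproduce the two rate stats used in the tables above, here is a rough sketch. RA9 is just runs allowed per nine innings. The FIP function uses the commonly published formula; the constant varies by season and is usually a bit above 3, so the `3.10` default is a placeholder assumption, and my database’s FIP may differ in details. Both functions assume innings are already in decimal form (for example 36.67, not the box score notation 36.2).

```python
def ra9(runs, ip):
    """Runs allowed (earned or not) per nine innings."""
    return 9.0 * runs / ip


def fip(hr, bb, hbp, so, ip, constant=3.10):
    """Fielding Independent Pitching, common public formula.

    constant is a league- and season-specific value that puts FIP on the
    ERA scale; 3.10 is only a placeholder, not a value from this post.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


if __name__ == "__main__":
    # Made-up stat line: 180 IP, 80 R, 20 HR, 50 BB, 5 HBP, 170 SO.
    print(round(ra9(80, 180.0), 3))
    print(round(fip(20, 50, 5, 170, 180.0), 3))
```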
Next I’ll track just the bottom bucket, but over the past 4 years:
Year | # Start | FIP Start | # Relief | FIP Relief | Diff | IP Start | IP Relief |
---|---|---|---|---|---|---|---|
2010 | 15 | 5.375 | 18 | 5.392 | -0.016 | 1981.67 | 906.33 |
2011 | 15 | 5.095 | 18 | 5.131 | -0.036 | 2113.33 | 850.67 |
2012 | 15 | 5.411 | 18 | 5.031 | 0.380 | 1858.67 | 893.33 |
2013 | 15 | 5.010 | 18 | 4.871 | 0.139 | 1954.00 | 888.66 |
And the same years using RA9:
Year | # Start | RA9 Start | # Relief | RA9 Relief | Diff | IP Start | IP Relief |
---|---|---|---|---|---|---|---|
2010 | 15 | 6.195 | 18 | 6.279 | -0.084 | 2071.67 | 926.00 |
2011 | 15 | 5.945 | 18 | 5.898 | 0.047 | 2036.00 | 824.00 |
2012 | 15 | 6.320 | 18 | 6.075 | 0.245 | 1744.33 | 960.00 |
2013 | 15 | 6.103 | 18 | 5.509 | 0.594 | 1962.66 | 825.01 |
The minimum innings threshold varied from 79 to 85.33 for starters, and from 34.33 to 37 for relievers, since each year I took just the top 150 starters and top 180 relievers by IP. It’s interesting that in both FIP and RA9 there was basically no difference in bottom-bucket performance between starters and relievers in 2010 and 2011. In 2012 and 2013, however, relievers did a little better than starters, which is what I’d have expected.
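To put the early-versus-late comparison in numbers, here is a quick Python check using the bottom-bucket values copied from the two tables above. The helper names are just for illustration. Diff is starter minus reliever, so a positive value means relievers were better.

```python
# Bottom-bucket (starter, reliever) values copied from the tables above.
fip_bottom = {2010: (5.375, 5.392), 2011: (5.095, 5.131),
              2012: (5.411, 5.031), 2013: (5.010, 4.871)}
ra9_bottom = {2010: (6.195, 6.279), 2011: (5.945, 5.898),
              2012: (6.320, 6.075), 2013: (6.103, 5.509)}


def diffs(table):
    """Starter minus reliever for each year, rounded to three decimals."""
    return {year: round(s - r, 3) for year, (s, r) in table.items()}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


for name, table in (("FIP", fip_bottom), ("RA9", ra9_bottom)):
    d = diffs(table)
    early = mean(d[y] for y in (2010, 2011))
    late = mean(d[y] for y in (2012, 2013))
    print(f"{name}: 2010-11 mean diff {early:+.3f}, 2012-13 mean diff {late:+.3f}")
```

Recomputing from the rounded table values can land 0.001 away from the printed Diff column, presumably because the table's differences were taken before rounding.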
This analysis is sensitive to where you draw the line to exclude players with low playing time. Enlarging one group’s pool (either by raising the straight numerical count or by lowering an innings threshold) will make that group of pitchers look worse relative to the other group.
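A toy example shows the mechanism. The league below is entirely synthetic, and it bakes in the assumption that pitchers with fewer innings have worse rate stats, so it only illustrates the direction of the effect and not its real size. The function name `bottom_decile_mean` is made up for this sketch.

```python
def bottom_decile_mean(pitchers, pool_size):
    """Mean stat of the worst 10% within the top pool_size pitchers by IP.

    pitchers: list of (ip, stat) pairs, where a lower stat is better.
    """
    pool = sorted(pitchers, key=lambda p: p[0], reverse=True)[:pool_size]
    stats = sorted(stat for _, stat in pool)
    k = max(1, len(stats) // 10)
    worst = stats[-k:]
    return sum(worst) / len(worst)


# Synthetic league (assumption): innings fall and the stat gets worse
# as we move down the list. Not real data.
league = [(200 - 0.5 * i, 3.0 + 0.01 * i) for i in range(300)]

for size in (150, 180, 210):
    print(size, round(bottom_decile_mean(league, size), 3))
```

Each time the pool grows, the bottom bucket is filled with pitchers from further down the innings list, so its average gets worse even though nobody’s performance changed.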