One Player, Two Kinds of Stats: Handling Shohei Ohtani

This year the Angels signed Shohei Ohtani, the young Japanese star who played both as a pitcher and outfielder for the Hokkaido Nippon Ham Fighters of the Japanese Pacific League. As in Japan, Ohtani is expected to continue to play extensively both as a pitcher and hitter.

Some players have played significantly at the major league level in both capacities in different seasons (most recently Rick Ankiel, and more famously Babe Ruth), but in the modern game it is unheard of for one person to contribute both as a batter and a pitcher in the same season. Until, probably, 2018, when Ohtani seems poised to do so.

This raises the question of how to handle Ohtani for fantasy purposes.

Yahoo! is handling Ohtani by turning him into two different players, a batter and a pitcher, and they’re allowing different fantasy teams to own the different stats of Ohtani. While I can see how that might be an easier technical fix in some platforms, it’s ugly, and it loses the benefit of flexibility a two-way player gives a real MLB team. Allowing different fantasy teams to own Ohtani’s batting and pitching stats is simply wrong, and making a team use two roster spots to be able to get all of Ohtani’s stats doesn’t match the true flexibility the Angels will have by having a single player capable of contributing both in the batter’s box and on the pitching mound.

CBS and ESPN are a little better: they let you put Ohtani either at P or in a batting slot, but they only count the pitching stats when he is in your lineup as a pitcher, and just the batting stats when he’s in your lineup as a batter. While that’s not terrible for a daily transaction league (you can switch Ohtani in your lineup based on where he’ll play for the Angels to get most of his stats), you’ll still lose out on Ohtani’s offensive stats on days he’s pitching in an NL park and has to bat for himself. If you’re in a league with weekly transactions, though, this solution forces you to decide whether you want Ohtani’s batting or pitching numbers for an entire week.

RotoValue lets you have the best of both worlds: by default, it will count all Ohtani’s stats, no matter where you play him.

Because RotoValue already supports an option to count pitchers’ batting stats (and/or batters’ pitching stats), it was actually quite easy to enhance it to count both types of statistics for a player who qualifies at both an offensive and a pitching position, while counting only the primary statistics for players who qualify at just one type. So by default, RotoValue will count both batting and pitching stats for Ohtani whenever he plays, but will ignore other pitchers’ batting statistics and other batters’ pitching statistics. Strictly speaking, this is not special treatment for Ohtani: it applies to any player who qualifies both as a pitcher and a batter. It’s just that Ohtani is likely to be the only one to do so. For now RotoValue is listing him as SP/DH, although it’s possible we may give him OF eligibility instead of (or in addition to) DH eligibility.

Suppose you really want Ohtani to count only as a pitcher. You can simply set his custom position to be SP only, and his batting statistics will no longer count. Or you could make him eligible at DH or OF only, and then only his batting statistics would count. Such changes would take effect immediately and retroactively: reload a standings or team stats page after changing the setting, and the full season totals will reflect the new settings.
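
As a rough illustration of the rule described above, here is a minimal Python sketch. It is not RotoValue’s actual code: the function name, the option names, and the set of position codes treated as pitching slots are all assumptions made for the example.

```python
# Hypothetical position codes; which codes count as pitching slots is an assumption.
PITCHING_POSITIONS = {"P", "SP", "RP"}


def counted_stat_types(eligible_positions,
                       count_pitcher_batting=False,
                       count_batter_pitching=False):
    """Return which kinds of stats count for a player.

    A player eligible at both a pitching and a batting position gets both
    kinds of stats; everyone else gets only the primary kind unless the
    league-wide option (hypothetical flag names) says otherwise.
    """
    positions = set(eligible_positions)
    pitches = bool(positions & PITCHING_POSITIONS)
    bats = bool(positions - PITCHING_POSITIONS)

    counted = set()
    if bats or (pitches and count_pitcher_batting):
        counted.add("batting")
    if pitches or (bats and count_batter_pitching):
        counted.add("pitching")
    return counted


print(counted_stat_types(["SP", "DH"]))  # two-way player: both kinds count
print(counted_stat_types(["SP"]))        # custom position SP only: pitching only
```

Under this sketch, overriding a player’s position to SP only simply removes the batting position from the set, so his batting stats stop counting without any special case for Ohtani.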

So – how do you change a player’s position?

If you have administrator rights for your league, on the Player Detail page there’s a menu item under Settings: “Set custom position for…”.

[Screenshot: PlayerDetail for Shohei Ohtani]

That menu option gives you a page where you can override the position for a player:

[Screenshot: Custom Position for Shohei Ohtani]

If you simply want all of Ohtani’s stats to count, you won’t have to make any of these changes, but if you want him to be a pitcher only (or, less likely, a batter only), you’d need to override his default position. So RotoValue gives you the choice to handle Ohtani however you think makes the most sense for your league.

Alternatively, if you want to count all pitchers’ batting (and/or batters’ pitching) stats, you can choose those options directly from the bottom of the main league Settings page:

[Screenshot: Update League settings]

Posted in Fantasy Strategy, Major League Baseball | Comments Off on One Player, Two Kinds of Stats: Handling Shohei Ohtani

Fighting the Last War

This week’s Riddler at FiveThirtyEight reruns a puzzle initially run back in February. The game is a “war” between two warlords fighting over 10 castles. Each warlord has 100 soldiers, and the 10 castles are worth from 1 to 10 points each. If you send more soldiers to a given castle than your opponent, you win it and its point value; if you both send exactly the same number of soldiers, the point value for the castle is shared between the two. The winner of the war is the one with more total points, and so ties are possible overall only if you’ve shared at least one castle with an odd point value. The goal of the first Riddler challenge was to have the most success against all other entries in the contest.
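
The scoring rule is simple enough to sketch in a few lines of Python. This is only an illustration of the rules as described above, not the Perl or C code from the post; the function name `battle` is made up for the example.

```python
def battle(a, b):
    """Score one war between two troop allocations (castle 1 first).

    Returns (points_a, points_b). A castle goes to whoever sent more
    soldiers; if both sent the same number, its value is split evenly.
    """
    points_a = points_b = 0.0
    for value, (x, y) in enumerate(zip(a, b), start=1):
        if x > y:
            points_a += value
        elif y > x:
            points_b += value
        else:
            points_a += value / 2
            points_b += value / 2
    return points_a, points_b


# Two identical allocations share every castle: 27.5 points each.
even = [10] * 10
print(battle(even, even))
```

Since the castles are worth 55 points in total, an overall tie means 27.5 points each, which is only reachable through half points from a shared odd-valued castle.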

They’ve made the entries from the first contest public, and so that gives one a chance to see not only the winners, but all the entrants, and to play around with the data. I quickly hacked up a Perl script to take their data and compute the results of the first contest. That ran accurately but pretty slowly: doing head-to-head battles among all of the nearly 1400 entrants took about 12 seconds on my laptop. I then rewrote the code in C, and saw the same thing run in about a tenth of a second, two orders of magnitude faster.

This made me wonder whether brute force might be useful to attack the problem, so first I wrote a C program to count all the possible allocations of the 100 soldiers across 10 castles. This took, uh, a while to run (about 8 hours), but I got my answer: a little over 4.25 trillion combinations.
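
That count can also be checked without enumerating anything. Splitting 100 indistinguishable soldiers among 10 castles is a “stars and bars” problem, so the number of allocations is the binomial coefficient C(109, 9). The Python sketch below (function names are made up for the example) computes it directly and checks the formula against a brute-force count on a small case.

```python
from math import comb


def count_allocations(soldiers, castles):
    """Closed-form count: choose where the castles - 1 dividers go."""
    return comb(soldiers + castles - 1, castles - 1)


def brute_force_count(soldiers, castles):
    """Enumerate allocations recursively; only practical for small inputs."""
    if castles == 1:
        return 1  # whatever is left goes to the last castle
    return sum(brute_force_count(soldiers - k, castles - 1)
               for k in range(soldiers + 1))


assert count_allocations(5, 3) == brute_force_count(5, 3)
print(count_allocations(100, 10))  # 4263421511271
```

That is about 4.26 trillion, which agrees with the “little over 4.25 trillion” found by enumeration.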

That didn’t sound too good, but I quickly modified the code so I could start trying combinations in order against the prior entries, printing out any one that did as well as or better than the best combination I had tried so far. Last Sunday night I kicked that program off on my laptop. A week later, the program is still running, and has only gone through about 6 billion combinations, but it did find a few that did better than the best-performing entry last time, so I’m entering the best of those today.

I wanted to see if I could speed things up, and one way to do that was to test, not against nearly 1400 entries, but a subset. I figured that people might look at which ones did well last time, and try to do something similar, so my subset was the 179 entries that, by my calculations, won 70% or more of their head-to-head matchups in the prior contest. I found many combinations that beat all 179 of those, so I then tried to compare lots of those against the larger dataset. So far, that has not given me a better result against the whole set from the last contest, but I’m entering one of those also, on the theory that this week’s entries will tend to look more like the very good ones from the last time than the whole set.

So my first entry is:

0 0 2 2 11 21 3 31 26 4

I’ll update this post later with links to the code used to generate these.

Update Wednesday 31 May 2017:

I’ve created a GitHub repository for the code I was using here:

https://github.com/GeoffBuchan/Riddler/tree/master/Castle

Also, I realized that when I first built the C programs, I did not add any optimization switches. The count.c program which took several hours to run before runs in under 10 minutes on my laptop now that I’ve added -O2 optimization with gcc. So if I’d done this in the first place, I’d have been able to search a lot further!

Posted in Riddler, Software | Comments Off on Fighting the Last War

#FixTheWin

A few years back Brian Kenny introduced the hashtag #KillTheWin, which still lives on. Baseball fans point out egregious cases of a pitcher getting a win in a game despite pitching poorly. While the win might have been a useful metric for pitchers in the dead-ball era, when starters completed more than half their starts in aggregate, as baseball has evolved and bullpens have risen in prominence, the definition certainly has its flaws.

But it also has its history. Baseball fans still marvel at Cy Young’s total of 511, a seemingly impossible goal. While it’s become less common in the 21st century, a 20-game winner was once a standard of excellence among pitchers. Killing the win loses touch with that history. So perhaps we should consider trying to improve it instead. In 2014, Tom Tango proposed a complete redefinition, a simple points system that would usually determine the pitcher most deserving of a win (or loss) in a given game. I wrote about the idea, and created a page to compute wins and losses under that proposal. Recently Tango e-mailed me with a different idea: suppose we keep the basic structure of the current rule, but instead of giving the win to the pitcher who recorded the last out before a team takes its lead for the final time, we give it to the first pitcher whose line for the game closes in a position to win the game, so long as that pitcher’s team still wins.

Posted in Sabermetrics | Comments Off on #FixTheWin

Comparing Projected HR leaders to actual

Tom Tango asked an interesting question on Twitter yesterday:
[Image: TangoForecastHR]

The odds of the projected HR leader actually leading the league are worth checking against history. I’ve been doing projections since 2011, so I thought I’d sweep my database for the RotoValue projections and see what that history was. That gives me just six years (2011 through 2016), but it turns out my projection model did correctly name the MLB home run leader once in those six years, or about 17% of the time. Chris Davis hit 47 HR in 2015, leading MLB, while my model projected him to hit 35 HR. Note that when you’re leading the league, you’re not only very likely beating your own projection, you’re also probably beating all the projections, because the projected totals can be considered a weighted average of all possible outcomes for that player, and the possibility of a bad year or injury will pull that average down from what a peak player will produce when healthy. Also it’s not unusual for the league leader to be a player having a breakout year, well above what his past performance suggested.

Last year, Mark Trumbo’s career-best 47 HR topped MLB, despite my model projecting him for just 21.3 HR, the 46th best total. My projected 2016 leader was Chris Davis again, now projected to hit 37.9, and he actually slightly edged that out, with 38 HR. But I was not projecting the overall home run surge, and Davis’s 38 HR ranked only 12th best.

The actual MLB HR leader has regularly surprised my projections model. Only one other year, 2012, when Miguel Cabrera’s 44 HR led, was the actual leader among my preseason projected leaders (my model projected 34.5 HR for Cabrera, the 5th best projection). In the other years, the leader was projected 36th (Nelson Cruz moving to Baltimore in 2014), 90th (Chris Davis in 2013) and 73rd (Jose Bautista in 2011) by my model.

Over the 6 years, of the 60 players projected to finish in the top 10, 22 of them did so. Also, 24 of the 60 equaled or bettered their projected total, while 36 failed to reach the projected value.

Below the jump I’ve put tables for each year showing players projected to be in the top 10 in HR in my model, along with any players who actually finished in the top 10, along with their projected values.

Posted in Major League Baseball, Projections, Sabermetrics | Comments Off on Comparing Projected HR leaders to actual

Count von Count Riddler

I periodically attempt the FiveThirtyEight Riddler, edited by Oliver Roeder. This week he’s actually presenting two: a shorter “Riddler Express” and a more time-consuming “Riddler Classic”.

The Sesame Street character Count von Count likes to, well, count, and he now has his own Twitter feed! For those who don’t recall the character from the show, he’s a purple Muppet dressed in black, a sort of kindly Dracula, who would count up in an Eastern European accent.

Well, the Twitter feed is simply the Count counting, with each number written out in words. As I type this, his latest tweet is “Eight Hundred Seventy Nine!”

So the Riddler Express is to find out how high Count von Count can count on Twitter in this way before hitting its 140-character limit. Because he is enthusiastic, all his tweets must end with an exclamation mark.
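
To make the length constraint concrete, here is a small Python sketch that writes a number the way the sample tweet does and finds the first number whose tweet is too long. The exact format (capitalized words, single spaces, no hyphens, no “and”) is an assumption inferred from the one tweet quoted above, and the function names are made up for the example.

```python
ONES = ["", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight",
        "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen",
        "Sixteen", "Seventeen", "Eighteen", "Nineteen"]
TENS = ["", "", "Twenty", "Thirty", "Forty", "Fifty", "Sixty", "Seventy",
        "Eighty", "Ninety"]
SCALES = ["", "Thousand", "Million", "Billion", "Trillion"]


def group_words(n):
    """Words for a three-digit group, 1 <= n <= 999."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "Hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    elif rest:
        words.append(ONES[rest])
    return words


def tweet(n):
    """Text of the Count's tweet for n, including the exclamation mark."""
    if not 0 < n < 1000 ** len(SCALES):
        raise ValueError("number out of range for this sketch")
    words = []
    scale = 0
    while n:
        n, group = divmod(n, 1000)
        if group:
            chunk = group_words(group)
            if SCALES[scale]:
                chunk.append(SCALES[scale])
            words = chunk + words  # higher-order groups go in front
        scale += 1
    return " ".join(words) + "!"


def first_too_long(limit):
    """Smallest n whose tweet exceeds limit characters (brute force)."""
    n = 1
    while len(tweet(n)) <= limit:
        n += 1
    return n


print(tweet(879), len(tweet(879)))
```

Counting up one number at a time like this is only practical for small limits. For the real 140-character limit the answer is far too large to reach by brute force, so the search has to reason about which three-digit groups have the longest names.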

Posted in Riddler | Comments Off on Count von Count Riddler

Pirate Riddler

This week’s FiveThirtyEight Riddler is a logic puzzle. Assume 10 “Perfectly Rational Pirate Logicians”. The pirates have found 10 gold pieces, and the puzzle is to figure out how they will allocate the loot among themselves. There are several constraints.

First, the pirates themselves are ranked in a strict hierarchy, with the Captain at the top, and then the others ordered beneath him, so if for any reason the Captain is no longer able to fulfill his duties, the second in command becomes the new Captain, and everyone else moves up one step on the hierarchy.

Second, the pirates practice a form of democracy. While the Captain, due to rank, gets to propose an allocation, the whole crew, with one vote per pirate, votes on the proposal. If that proposal gets half or more of the vote, it carries, but if more than half of the pirates vote against it, they will mutiny, killing the old Captain, and leaving it to the new Captain to propose an allocation.

So our perfectly rational pirate logicians have three constraints in determining how they will vote on a proposed allocation:

  1. They value life above all, so they will not vote in a way to put their own lives at risk of mutiny if they can at all avoid doing so.
  2. They are greedy, so as long as their life is not at stake, they will vote for something which maximizes their own personal share of the loot.
  3. They are bloodthirsty, so if they have two choices in which they’d remain alive but get the same booty, they will prefer to mutiny and kill the captain.

Given the above constraints, how will the 10 pirates allocate their 10 newly found gold pieces?
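
One common way to attack a puzzle like this is backward induction: work out what happens with one pirate, then two, and so on, with each Captain buying the cheapest votes he needs. The Python sketch below illustrates that technique under the rules as stated. It is not necessarily the reasoning used in the full post, the function name is made up, and it assumes the Captain can always afford the votes he needs, so the “value life” rule never comes into play.

```python
def allocate(pirates, gold):
    """Gold each pirate receives, Captain first, found by backward induction."""
    shares = [gold]  # a lone pirate keeps everything
    for n in range(2, pirates + 1):
        # `shares` is what pirates 2..n would get if this Captain were killed.
        # Half or more of n votes carries, and the Captain votes for himself.
        votes_needed = (n + 1) // 2 - 1
        # A bloodthirsty pirate only votes yes for strictly more than his
        # fallback, so each vote costs his fallback share plus one coin.
        costs = sorted((share + 1, i) for i, share in enumerate(shares))
        bought = costs[:votes_needed]
        spent = sum(cost for cost, _ in bought)
        if spent > gold:
            raise ValueError("Captain cannot buy enough votes; not handled here")
        new_shares = [0] * len(shares)
        for cost, i in bought:
            new_shares[i] = cost
        shares = [gold - spent] + new_shares
    return shares


print(allocate(10, 10))
```

The key design choice is that a vote is only bought when the offer strictly beats what that pirate would get after a mutiny, which is how the bloodthirsty tie-break enters the calculation.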

Posted in Riddler | Comments Off on Pirate Riddler

FiveThirtyEight Baseball Division Champs Puzzle

Update: I’ve added a link to the Perl program I used to do these simulations.

Oliver Roeder presents a weekly puzzler on FiveThirtyEight, and this week it was a baseball-themed puzzle. Assume a sport (say, “baseball”) in which each team plays 162 games in a season. Also assume a “division” (e.g. the “AL East”) containing 5 teams, each of exactly equal skill. In other words, each team has exactly a 50% chance of winning any given game. The puzzle is to compute the expected value of wins for the division-winner.

Interestingly, the problem is open to interpretation, and the result I get depends on what assumptions I make. My initial assumption was to treat each game for each team as a simple coin-flip. I ran 100,000 simulated “seasons”, getting an average of 88.4 wins for the division leader. But games have two teams, and who the opponent is could matter to this problem. In an extreme situation, the “coin flip” model could result in winning the division with 0 wins, in the highly improbable event that each team lost every game.
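
The coin-flip model is easy to reproduce. Here is a minimal Python sketch of it (not the Perl program mentioned above; the function name, seed, and number of seasons are arbitrary choices for the example). Each team’s win total is drawn independently, ignoring who plays whom.

```python
import random


def average_division_winner(seasons, teams=5, games=162, seed=0):
    """Average win total of the best team when every game is a fair coin flip."""
    rng = random.Random(seed)
    total = 0
    for _ in range(seasons):
        # The number of 1 bits in `games` random bits equals the number of
        # wins in that many independent fair coin flips.
        total += max(bin(rng.getrandbits(games)).count("1")
                     for _ in range(teams))
    return total / seasons


print(average_division_winner(100_000))
```

With enough simulated seasons the average should settle near the 88.4 wins reported above, though the exact figure will vary with the seed.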

Since I happened to have the 2016 MLB schedule available, I used it for each game. This adds the constraint that in games involving two teams in the same division, one team winning implies its opponent must lose. Doing this, I got an average of 88.8 wins for the division winner.

The third variant I tested produced the toughest constraint: I assumed the five teams only played games among themselves (at least 40 against each opponent). Thus every win for one team always means a loss for its opponent. This gave me an average of 89.3 wins for the division winner.
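
The closed-division variant can be sketched the same way in Python. Five teams playing 162 games each only among themselves cannot face every opponent equally often (that would be 40.5 games per pairing), so this sketch assumes each team plays two opponents 41 times and the other two 40 times. That split, along with the function names, is an assumption for the example rather than the schedule used in the post.

```python
import random

TEAMS = 5


def closed_schedule():
    """Games per pairing: 41 against 'neighbours' on a ring, 40 otherwise."""
    schedule = {}
    for a in range(TEAMS):
        for b in range(a + 1, TEAMS):
            adjacent = (b - a) in (1, TEAMS - 1)
            schedule[(a, b)] = 41 if adjacent else 40
    return schedule


def average_closed_division_winner(seasons, seed=0):
    """Average wins of the division winner when every win is a rival's loss."""
    rng = random.Random(seed)
    schedule = closed_schedule()
    total = 0
    for _ in range(seasons):
        wins = [0] * TEAMS
        for (a, b), games in schedule.items():
            a_wins = bin(rng.getrandbits(games)).count("1")  # fair coin flips
            wins[a] += a_wins
            wins[b] += games - a_wins
        total += max(wins)
    return total / seasons


print(average_closed_division_winner(100_000))
```

Because every win now subtracts from a rival, the teams’ totals are negatively correlated, which spreads the standings out and pushes the winner’s expected total above the independent coin-flip figure.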

Posted in Major League Baseball, Riddler, Sabermetrics | Comments Off on FiveThirtyEight Baseball Division Champs Puzzle

Comparing 2014 Projections – ERA and WHIP

Yesterday I ran comparisons of several projections systems for an all-inclusive batting statistic, wOBA. Today I’m running the same tests, computing root mean square error (RMSE) and mean absolute error (MAE), for two commonly used fantasy statistics, ERA and WHIP. These tests are bias-adjusted, so what matters is a player’s ERA or WHIP relative to the overall average of that system, compared with the player’s actual statistic relative to the actual overall average. The lower the RMSE or MAE, the better a projection system predicted the actual data.
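
For readers who want to see the bias adjustment spelled out, here is a minimal Python sketch. It assumes a simple unweighted average over the compared players, which may differ from how the actual comparison was run, and the function name is made up for the example.

```python
from math import sqrt


def bias_adjusted_errors(projected, actual):
    """Return (RMSE, MAE) after removing each side's overall average.

    `projected` and `actual` are equal-length lists of one statistic
    (e.g. ERA) for the same players in the same order.
    """
    n = len(projected)
    projected_mean = sum(projected) / n
    actual_mean = sum(actual) / n
    # Compare each player's distance from his system's average with his
    # actual distance from the actual average.
    diffs = [(p - projected_mean) - (a - actual_mean)
             for p, a in zip(projected, actual)]
    rmse = sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    return rmse, mae


# A system that is uniformly 0.50 too optimistic has no bias-adjusted error.
print(bias_adjusted_errors([3.0, 4.0, 5.0], [3.5, 4.5, 5.5]))
```

This shows why the adjustment matters: a system that ranks and spaces players correctly is not penalized for missing the league-wide run environment.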

I have data for these projection models:

  • AggPro – A projection aggregation method from Ross J. Gore, Cameron T. Snapp, and Timothy Highley.
  • Bayesball – Projections from Jonathan Adams.
  • CAIRO – from S B of the Replacement Level Yankees Weblog.
  • CBS – Projections from CBS Sportsline.
  • Davenport – Clay Davenport’s projections.
  • ESPN – Projections from ESPN.
  • Fans – Fans’ projections from Fangraphs.com.
  • Larson – Will Larson’s projections.
  • Marcel – the basic projection model from Tom Tango, coauthor of The Book. This year I’m using Marcel numbers generated by Jeff Zimmerman, using Jeff Sackmann’s Python code.
  • MORPS – A projection model by Tim Oberschlake.
  • Rosenheck – Projections by Dan Rosenheck.
  • Oliver – Brian Cartwright’s projection model.
  • Steamer – Projections by Jared Cross, Dash Davidson, and Peter Rosenbloom.
  • Steamer/Razzball – Steamer rate projections, but playing time projections from Rudy Gamble of Razzball.com.
  • RotoValue – my current model, based largely on Marcel, but with adjustments for pitching decision stats and assuming no pitcher skill in BABIP.
  • RV Pre-Australia – The RotoValue projections taken just before the first Australia games last year. Before the rest of the regular season I continued to tweak projections slightly.
  • ZiPS – projections from Dan Szymborski of Baseball Think Factory and ESPN.

First up is ERA, comparing the 75 pitchers projected by all systems:

Posted in Major League Baseball, Projections, Sabermetrics | Comments Off on Comparing 2014 Projections – ERA and WHIP

Comparing 2014 Projections – wOBA

For the past three years I’ve reviewed baseball projection systems against actual data, for those systems whose projections I could get. Will Larson maintains a valuable site of projections from many different sources, and most of the sources I’m comparing are from that.

As in the past, I’m computing root mean square error (RMSE) and mean absolute error (MAE) for each source compared to actual data. For these tests, I am doing a bias adjustment, so the errors are relative to the average of a source. I care more about how a system projects players relative to its own projected averages than about how well it projects the overall league average.

I have data from these systems:

  • AggPro – A projection aggregation method from Ross J. Gore, Cameron T. Snapp, and Timothy Highley.
  • Bayesball – Projections from Jonathan Adams.
  • CAIRO – from S B of the Replacement Level Yankees Weblog.
  • CBS – Projections from CBS Sportsline.
  • Davenport – Clay Davenport’s projections.
  • ESPN – Projections from ESPN.
  • Fans – Fans’ projections from Fangraphs.com.
  • Larson – Will Larson’s projections.
  • Marcel – the basic projection model from Tom Tango, coauthor of The Book. This year I’m using Marcel numbers generated by Jeff Zimmerman, using Jeff Sackmann’s Python code.
  • MORPS – A projection model by Tim Oberschlake.
  • Rosenheck – Projections by Dan Rosenheck.
  • Oliver – Brian Cartwright’s projection model.
  • Steamer – Projections by Jared Cross, Dash Davidson, and Peter Rosenbloom.
  • Steamer/Razzball – Steamer rate projections, but playing time projections from Rudy Gamble of Razzball.com.
  • RotoValue – my current model, based largely on Marcel, but with adjustments for pitching decision stats and assuming no pitcher skill in BABIP.
  • RV Pre-Australia – The RotoValue projections taken just before the first Australia games last year. Before the rest of the regular season I continued to tweak projections slightly.
  • ZiPS – projections from Dan Szymborski of Baseball Think Factory and ESPN.

In addition, I’ve computed a source “All Consensus”, which is a simple average of each of the above (ignoring a source if it doesn’t project some particular category).
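
In code, that consensus amounts to averaging whatever values are present. A minimal Python sketch, with a made-up function name and made-up source labels, and using `None` to mark a source that does not project a category:

```python
def consensus(values_by_source):
    """Simple average across sources, skipping sources with no projection."""
    values = [v for v in values_by_source.values() if v is not None]
    if not values:
        return None  # no source projects this category
    return sum(values) / len(values)


# Hypothetical ERA projections for one pitcher; source C has no ERA projection.
print(consensus({"A": 3.0, "B": 4.0, "C": None}))  # 3.5
```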

Not all the models had enough data to compute wOBA, so the tables (below the jump) only include those sources which do. The other sources do affect the All Consensus values for those stats where they do have data.

Posted in Major League Baseball, Projections, Sabermetrics | 2 Comments

RV Current for NBA

Similar to what I’ve done with baseball, I’m now running new projections daily for NBA players under the name RV Current. These projections add current year data into the model, increasing the weight given to the current season as more games are played.

This early in the season, the numbers aren’t much different from my preseason projections. But RV Current will continue to adjust to changing factors and on-court play, whereas the preseason projections just stay the same.

I should add one other feature of player search pages: when showing projections for some mid-season future date range, they now automatically prorate projections based on known injuries. The site will try to determine a target return date from injury reports, and if a player isn’t expected back soon, his number of games will be reduced by the number he’s expected to miss. The injury reports page now also shows a Target Return? date, which, when present, will result in that player’s stats being scaled down when shown in a Search page. For a PlayerDetail page, the projections will simply show an expected full year projection for a player (which will be for far fewer than 82 games if a player has been especially injury prone in the past).
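
The proration itself is just a scaling step. Here is a minimal Python sketch, assuming counting stats are scaled in proportion to the games a player is still expected to play in the date range. The function name and stat labels are made up, and the site’s actual calculation may differ.

```python
def prorate(projection, games_in_range, games_missed):
    """Scale counting stats down for games a player is expected to miss."""
    if games_in_range <= 0:
        return {stat: 0.0 for stat in projection}
    games_played = max(games_in_range - games_missed, 0)
    factor = games_played / games_in_range
    return {stat: value * factor for stat, value in projection.items()}


# Hypothetical: 20 games in the range, player expected to miss 5 of them.
print(prorate({"PTS": 400.0, "REB": 100.0}, 20, 5))
```

Rate stats such as shooting percentages would be left alone under this approach; only totals that accumulate with games played are scaled.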

Projections are always fuzzy, but by incorporating newer data into daily projections, and taking known injuries into account when searching for player data, I’m trying to make them a little bit clearer.

Posted in NBA Basketball, Projections | Comments Off on RV Current for NBA