New Leadership Elected

The Stanford Sports Analytics Club has held elections for its officer positions for the next year. The club will again be led by co-Presidents. Vihan Lakshman will continue to serve as co-President, and will be joined by the newly elected Scott Powers. Sandy Huang will continue in his previous position as Financial Officer. Serving as Blog Editor-in-Chief and Tech Officer will be Eli Shayer.

Thank you to the previous leadership of the club for bringing about a tremendously successful first year of existence for SSAC. A special thank you goes to outgoing co-President John Sears, who co-founded SSAC last year and leaves the club in a great position to continue into the future.

The Frictional Cost of a Call to the Bullpen

Photo from

Post by Eli Shayer and Scott Powers

It is well known that a starter’s performance tails off as he pitches deeper into a game. This drop-off in results is attributed to facing the same batters multiple times, pitcher fatigue, and inconsistencies in mechanics. In this work, we examine reliever performance to see if there is an analogous effect.

Our study uses wOBA, a statistic developed by Tom Tango that measures the contribution of plate appearance results toward run creation, in units of runs. When assessing a pitcher’s quality, a low wOBA indicates strong performance by the pitcher, while a high wOBA indicates the opposite. Expected wOBA is derived from the season wOBA of the batter. The figure below shows the difference between observed and expected wOBA for relievers, as a function of the number of batters faced, based on all batters faced by MLB relievers from 2000 to 2013.

For example, the value 1 along the x-axis corresponds to the first batter faced by relievers, the value 2 corresponds to the second batter faced, and so on. Because pitcher and batter handedness have a significant impact on the result, we plot a separate curve for each possible handedness pairing. The “All Handedness” curve is the unweighted average of the four other curves.
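As a rough illustration of how curves like these could be built, the sketch below averages observed minus expected wOBA for each handedness pairing and batter-faced number, then takes the unweighted mean across pairings. The record layout (`pairing`, `bf_number`, `woba_value`, `expected_woba`) is hypothetical and is not taken from our actual dataset or code.

```python
from collections import defaultdict

def woba_diff_curves(plate_appearances):
    """Mean (observed - expected) wOBA per handedness pairing and BF number.

    Each plate appearance is a dict with hypothetical keys:
    'pairing', 'bf_number', 'woba_value', 'expected_woba'.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (pairing, bf) -> [total diff, count]
    for pa in plate_appearances:
        cell = sums[(pa["pairing"], pa["bf_number"])]
        cell[0] += pa["woba_value"] - pa["expected_woba"]
        cell[1] += 1

    curves = defaultdict(dict)
    for (pairing, bf), (total, count) in sums.items():
        curves[pairing][bf] = total / count

    pairings = list(curves)
    if not pairings:
        return {}

    # Unweighted average: every pairing counts equally, regardless of sample size.
    # Only BF numbers observed for every pairing are included.
    shared_bf = set.intersection(*(set(curves[p]) for p in pairings))
    result = dict(curves)
    result["All Handedness"] = {
        bf: sum(curves[p][bf] for p in pairings) / len(pairings)
        for bf in sorted(shared_bf)
    }
    return result
```

Because the average is unweighted, a pairing with few plate appearances moves the combined curve as much as a common one.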


After the fourth or fifth batter faced (BF), results fluctuate greatly due to insufficient sample size, but all curves show the same pattern at the beginning: On average, the wOBA of the first BF is 10 wOBA points higher, relative to expectation, than the wOBA of the second BF. This difference of 10 wOBA points scales to a difference of about 0.37 runs per 9 innings (because the average number of batters faced per 9 innings is about 37).
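The conversion above is simple arithmetic, spelled out here as a small sketch. It assumes, as the text does, that one wOBA point is 0.001 and that a pitcher faces about 37 batters per 9 innings; the function name is only illustrative.

```python
WOBA_POINT = 0.001  # one "point" of wOBA

def runs_per_nine(woba_points, batters_per_nine=37):
    """Scale a per-plate-appearance wOBA gap to runs per 9 innings."""
    return woba_points * WOBA_POINT * batters_per_nine

print(round(runs_per_nine(10), 2))  # 0.37
```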

Our proposed explanation of this frictional substitution cost is that pitchers require some feeling out of their pitches and throwing at full effort before being completely game-ready. Warm-up pitches in the bullpen appear not to sufficiently prepare relievers for appearing in the game, and they pay the price when facing their first batter.

What kind of relievers struggle most against the first batter faced?

While we account for batter skill by comparing results against expected results for the batter (and in doing so adjust for year), the above results do not account for pitcher ability. Pitchers who face more batters than average are over-represented, relatively, against the fifth BF, while pitchers with fewer BFs than average are relatively over-represented against the first batter faced.

To account for this source of bias, we define a reliever’s type based on the number of batters he faces in an average outing. The three categories were < 3.5 BF, 3.5 – 4 BF, and > 4 BF. These categories were derived from the distribution of average number of batters faced, which was centered at 3.5 – 4 BF, with long tails on either end. Using the same model as above, we made a similar graph for each category of reliever, included in the figure below.
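A minimal sketch of the grouping step follows. The handling of the boundaries (exactly 3.5 or exactly 4 batters per outing fall in the middle group) is an assumption made for illustration, since the categories above do not say which side the endpoints belong to.

```python
def reliever_type(batters_faced_per_outing):
    """Classify a reliever by his average batters faced (BF) per outing.

    batters_faced_per_outing: list of BF counts, one per appearance.
    Boundary handling (3.5 and 4.0 inclusive in the middle group) is assumed.
    """
    average = sum(batters_faced_per_outing) / len(batters_faced_per_outing)
    if average < 3.5:
        return "< 3.5 BF"
    if average <= 4:
        return "3.5 - 4 BF"
    return "> 4 BF"
```

For example, a reliever with outings of 4, 4, 3, and 4 batters averages 3.75 BF and lands in the middle group.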


Dividing the data to this granularity, we observe that the sample sizes have been reduced sufficiently to mask the signal with the noise. In none of the three graphs above is there a clear trend. However, one important observation is that the three groups of relievers do not have significantly different performance overall among the first five batters faced. So we have assuaged concerns that the observed first batter effect may be due to sampling bias.

How do relievers struggle against the first batter faced?

To try to understand exactly how reliever performance changes as they face more batters, we broke down the distribution of results for each number of batters faced. In the table that follows, we have found the percentage of plate appearances that end in each result.
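A breakdown of this kind can be sketched as a tally of results for each batter-faced number, converted to percentages. The input format, a list of `(bf_number, result)` pairs, is hypothetical.

```python
from collections import Counter, defaultdict

def outcome_percentages(plate_appearances):
    """Percentage of plate appearances ending in each result, per BF number.

    plate_appearances: iterable of (bf_number, result) pairs (assumed layout).
    """
    counts = defaultdict(Counter)
    for bf_number, result in plate_appearances:
        counts[bf_number][result] += 1

    table = {}
    for bf_number, results in counts.items():
        total = sum(results.values())
        table[bf_number] = {res: 100.0 * n / total for res, n in results.items()}
    return table
```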


The data in the table demonstrate that the mechanism for relievers performing worse against their first batter faced is a high level of power. The first batter of a reliever’s appearance hits fewer singles than typical, but makes up for it by hitting an above average proportion of doubles, triples, and home runs. Additionally, the peril of leaving a reliever in too long is clear when comparing the first few batters faced to the last few batters faced in the table. In fact, the first batter effect is overtaken by a reliever tiredness effect by the 7th batter, at which point reliever performance steadily worsens and becomes worse than performance against the first batter.

What about the first batter faced in subsequent innings?

The final aspect of our analysis was looking at whether there is a first batter effect for each inning similar to the one we found for each appearance. Knowing that the first batter effect exists for the first inning, we separated out that effect from a potential first-batter-of-the-inning effect. Thus this analysis looked exclusively at plate appearances pitched by relievers coming out of the dugout after pitching the final out of the previous inning. Pooling together all innings besides the first, by number of batters faced, results in the figure below.


The graph doesn’t show any notable patterns in the first several batters faced. There doesn’t appear to be an analogous first batter effect, and if anything the data show the opposite result. There is an oddly consistent result in the fifth and sixth batters faced, which is a source of intrigue. Otherwise, the data don’t show an effect on the first batter of an inning, other than the first batter of the appearance as a whole.


We have shown that relievers struggle against the first batter they face, relative to expectation. Data were insufficient to identify which types of relievers suffered from this effect most, but we were able to identify that the reason for the increased wOBA of the first batter faced is an increase in power numbers. That is, the proportion of doubles, triples, and home runs against the first BF is higher than would otherwise be expected when relievers enter a ballgame.

Intuitively, these results make sense. A reliever who has just entered the game could not be described as being “in rhythm.” These results suggest that there is an increased risk of such a reliever throwing a mistake pitch, resulting in extra bases. Perhaps, on average, the time spent warming up in the bullpen is insufficient for a reliever to be “game ready.”

The frictional cost we observed is the equivalent of a difference of about 0.37 runs in ERA. So while much has been made of the value of using relievers, this effect is something that managers need to take into account when they are managing their bullpens.

Something that we did not explore is whether relievers struggle more against the first batter faced when they have more or less forewarning that they will enter the game. This preparedness may be difficult to measure, but a possible surrogate would be an indicator of whether the reliever entered mid-inning. We leave this to future work.

Eli Shayer is an undeclared freshman from Anchorage, Alaska. He misses having snow available for cross country skiing.

Scott Powers is a PhD student in statistics and an analytics consultant to the Oakland Athletics. He plays catcher for the club baseball team and setter for the club volleyball team.

Contact Eli at eshayer ‘at’ and Scott at sspowers ‘at’

Examining MLB Postseason Cluster Luck: or, Why the Playoffs Might Be a Crapshoot

Photo from

Post by Vihan Lakshman

What role does luck play in baseball success? As one of the pioneering sports in quantitative analysis, our national pastime is now understood, in many respects, as a finely tuned game of numbers. But does that tell the whole story?

Many prominent baseball figures, including Billy Beane, have described the MLB playoffs as a “crapshoot,” a roll of the dice that throws regular season success out the window. As Beane puts it, the teams who make the playoffs undoubtedly deserve to be there following a marathon 162-game regular season, but pure luck might be the ultimate factor behind who finally ends up hoisting the World Series trophy.

To explore this idea of postseason luck in more detail, we can examine the “cluster luck” of teams in the regular season and the postseason. First coined by Joe Peta in his book Trading Bases, cluster luck provides a numerical measure of a team’s fortune in stringing together hits.

Jonah Keri of Grantland explains the phenomenon of cluster luck with an example: “Say a team tallies nine singles in one game. If all of those singles occur in the same inning, the team would likely score seven runs; if each single occurs in a different inning, however, it’d likely mean a shutout.”

As a further example of very unfortunate cluster luck, consider this box score from Baseball-Reference from a 2005 meeting between Minnesota and Kansas City in which the Twins tied a 1969 MLB record for the most hits in a game without a run.


Thus, if we use cluster luck as a tool to measure the respective fortunes of MLB teams in the regular season and the postseason, we might be able to shed some light on whether the playoffs are indeed a crapshoot, or if there is, in fact, a correlation between regular season and postseason cluster luck, which would suggest that cluster luck may not be luck at all.

While the idea behind cluster luck may make intuitive sense, there is no clear-cut, standard method of calculating how well a team bunches hits together. In this analysis, I used the base-runs formula, a model for predicting scoring that is considered the most accurate sabermetric statistic for run estimation. For all playoff teams between the years 2007 and 2014, I calculated each club’s regular season and postseason luck by computing its predicted run total from the base-runs formula and subtracting that from the actual number of runs scored. A negative number indicates that a team scored fewer runs than expected and is hence “unlucky,” while a positive number denotes “good luck” and specifies by how many runs a team exceeded our base-runs prediction.
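To make the calculation concrete, here is a minimal sketch of the luck measure described above. Several versions of the base-runs formula exist, and the one below is a commonly cited simple form that may differ from the exact version used in this analysis, so the coefficients should be read as an assumption. The team totals in the usage line are made up for illustration.

```python
def base_runs(ab, h, tb, hr, bb):
    """Estimate runs from team totals with one simple base-runs variant.

    ab: at-bats, h: hits, tb: total bases, hr: home runs, bb: walks.
    The coefficients are an assumed, commonly cited form of the formula.
    """
    a = h + bb - hr                                       # baserunners
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02   # advancement
    c = ab - h                                            # outs
    d = hr                                                # guaranteed runs
    return a * b / (b + c) + d

def cluster_luck(actual_runs, ab, h, tb, hr, bb):
    """Actual runs minus predicted runs; negative means 'unlucky'."""
    return actual_runs - base_runs(ab, h, tb, hr, bb)

# Made-up team totals, for illustration only.
luck = cluster_luck(720, ab=5500, h=1400, tb=2200, hr=150, bb=500)
```

In this made-up case the team scored about 20 more runs than predicted, which would count as good cluster luck.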

In examining the World Series winners from 2007-2014, we see that the vast majority of teams enjoyed positive cluster luck in the postseason.


Perhaps what’s more surprising about this list is the overwhelming amount of negative cluster luck during the regular season, most notably on the part of the 2009 Yankees, who finished at the bottom of MLB in regular season luck. This phenomenon can likely be explained by considering that teams who manage to win games in spite of bad luck might be the most talented. In addition, this table of World Series winners provides our first bit of evidence that there may not be a correlation between regular season and postseason cluster luck, which is consistent with the theory of the playoffs as a crapshoot.

To test this idea in further detail, I conducted a simple linear regression examining postseason cluster luck versus regular season luck.


Under the null hypothesis that the true slope of our linear regression is 0, we use a two-sided t-test to obtain a p-value of 0.6201, which is greater than our significance level of 0.1. Therefore, we cannot reject our null hypothesis and cannot conclude anything further about the relationship between postseason and regular season luck.

In our regression, we obtained an R² value of 0.003987, suggesting that regular season cluster luck explains virtually none of the variance in postseason luck.
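For readers who want to reproduce this kind of test, the sketch below fits a simple linear regression and computes the two-sided p-value for the null hypothesis that the slope is 0, using only the standard library. It is not the code used for this post. The p-value comes from numerically integrating the t density with Simpson's rule, which is an implementation choice made here to avoid external dependencies.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y on x; returns slope, intercept, R^2, and slope t-statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = syy - slope * sxy                        # residual sum of squares
    r_squared = 1 - sse / syy
    slope_se = math.sqrt(sse / (n - 2) / sxx)      # standard error of the slope
    return slope, intercept, r_squared, slope / slope_se

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|) for a t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t_stat)
    h = upper / steps                              # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * (total * h / 3)
```

With regular season luck as `x` and postseason luck as `y`, the slope test uses `df = len(x) - 2`.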

Ultimately, we found no evidence of a relationship between a team’s luck in the regular season and in the playoffs, which is what one would expect if it were truly luck. Although we cannot conclude that no relationship exists, there might in fact be something to the intuitive notion that the playoffs are a crapshoot. Whether this news is comforting to perennial playoff disappointments like the A’s, I can’t say, but the idea that luck can play such a huge role in determining legacies in sports is fascinating and definitely deserving of further exploration.

Vihan Lakshman is a junior from Savannah, GA studying mathematics. He also writes about football for The Stanford Daily and broadcasts sports for KZSU student radio. In his free time, he loves playing intramural sports and hopelessly rooting for the Atlanta Falcons to return to the Super Bowl.

Contact Vihan at vihan ‘at’

The Importance of Having a High NBA Draft Pick

Photo from

Post by Konstantinos Balafas

On October 21st, the NBA board of governors voted against reforming the NBA’s draft lottery. A very good review of the proposed changes and potential ramifications can be found here, but the overarching theme of the league’s proposal was limiting “tanking”. The board of governors ended up rejecting the proposal and, while the argument that was made was that the changes would hurt small-market teams, it indicates that there are NBA GMs and owners who are (or may be in the future) willing to embrace a losing ideology for the reward of a high draft pick. That brings us to the “million-dollar” question: Is tanking really worth it?


In an attempt to answer that question with numbers, names and simple analysis, we gathered data for the “most successful” players since 2000 (from Wikipedia) and of the teams’ Win/Loss percentages since 1985 (from – the year the lottery system came into effect. For the purposes of this article, the “most successful” players are those elected to All-NBA and All-Star teams, as well as the starters for teams that played in NBA Finals.

There are certain caveats to this analysis. As far as the players are concerned, traded picks, on draft night or otherwise, are not considered. So, for this analysis, Kobe Bryant is a Charlotte Hornets pick despite never playing a minute for them and Jeff Green, as the #5 pick in 2007, is not considered for helping Boston have the best single-year turnaround in league history. As far as the team performances are concerned, only the top pick of each team is considered in order to simplify the analysis. That means that any effect that Tristan Thompson (#4 pick, 2011) may have had for the Cleveland Cavaliers has been attributed to Kyrie Irving (#1 pick, 2011).


As a first-pass analysis, we plotted the histograms of the draft picks for the aforementioned player categories, which are shown below. The histograms show a concentration of draft picks in the 1-10 range, which reinforces the intuitive belief that “good players are generally drafted high”.


It is worth noting that no player drafted lower than 10 has made the first All-NBA team since 2000. So far, the pick distributions shown indicate that it is indeed important for a team to have high draft picks and therefore tanking may indeed be a viable strategy for lottery teams. However, a (very) good player does not a good team make, or Kevin Love would still be plying his trade in Minnesota.

For that reason, let us explore the picks of the players that have started at least one game in the NBA Finals over the past 14 years. Figure 2 shows these picks for the NBA Champions (left) and the NBA Runners-up (right).


Again, the vast majority of the players are drafted in the lottery (picks 1-14). Interestingly enough, with the exception of the 2007-2011 interval and the ’04 Pistons there has been no NBA Champion without a #1 pick. Even in the listed exceptions, these teams had multiple Top-10 picks. Still more indication that teams need lottery picks to contend for a title!


There is, however, an important parameter that has not yet been investigated. As the Miami Heat proved, the draft is not the only way to high draft picks and, subsequently, title rings. For that reason, Figure 3 shows the same histograms as Figure 1, only in this case different colors correspond to players that achieved the honors with the team that drafted them or a different one.


It generally seems that there is no clear trend in how the draft pick distributions split between players honored with their drafting team and those honored with a different team. That said, top picks tend to stay (or be more successful) with the team that drafted them, while starting fives in the NBA Finals tend to be assembled in ways other than the draft.


So far, then, even if there is no clear answer on whether a team is justified in tanking, quite a bit of the data seem to point that way. On the other hand we’ve looked at All-NBA teams, All-Star teams and NBA finalists. That can be a tall order for a young kid that has just been drafted (unless your name is Tim Duncan, but more on that later). It is reasonable then to investigate the more short-term effect of draft picks.


Generally, if a high draft pick were to be strongly correlated with success, we’d expect teams with a high draft pick to exhibit a significant improvement over the next year and the points in the top part (teams with a high draft pick) of Figure 4 would be clustered towards the right of the figure (large difference in W/L percentage), which is clearly not the case.

Maybe then, one year is too short of a time for a rookie to prove his worth? To control for that, we looked at the progression of win/loss percentage over four years after a high draft pick. The four-year window was selected since that is also the length of a rookie contract. Figure 5 shows the league average of the difference in win/loss percentage against the number of years since the team had a particular lottery pick in the draft.
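The four-year comparison can be sketched as follows. The baseline is assumed to be the season that earned the lottery pick, seasons are keyed by a single year label, and the data layout is hypothetical. None of this is taken from the code behind the figures.

```python
def win_pct_change(win_pct_by_season, pick_season, years_after):
    """Change in win percentage relative to the season that earned the pick.

    Returns None when either season is missing from the record.
    """
    base = win_pct_by_season.get(pick_season)
    later = win_pct_by_season.get(pick_season + years_after)
    if base is None or later is None:
        return None
    return later - base

def average_change(teams, max_years=4):
    """League-average change for 1..max_years seasons after a lottery pick.

    teams: list of (win_pct_by_season, pick_season) pairs (assumed layout).
    """
    averages = {}
    for years_after in range(1, max_years + 1):
        diffs = []
        for win_pct_by_season, pick_season in teams:
            diff = win_pct_change(win_pct_by_season, pick_season, years_after)
            if diff is not None:
                diffs.append(diff)
        if diffs:
            averages[years_after] = sum(diffs) / len(diffs)
    return averages
```

A team that went from .300 to .400 to .500 after its pick would contribute +.100 at year one and +.200 at year two.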


Based on the previous figure, it can be argued that a team will consistently improve over the four years after a lottery pick. Of course, there are many other factors that play a part, such as other roster moves, coaching changes, new draft picks etc., as well as the fact that this is the league average. Still, it is hard to make a strong case against tanking.

Does that, then, mean that a couple of draft picks can turn a franchise around? Figure 6 shows a grid of teams and seasons. A blue square indicates that a particular team had a lottery pick in a particular year and a larger square corresponds to a higher pick.


It can be seen that lottery picks come in waves. It takes more than a few years for a team to accumulate enough talent (or assets) to go from lottery team to playoff contender. Once the team goes through that breakthrough, though, there’s a good chance it will stay that way for at least a few years.


So, we saw that once a team has stockpiled enough high draft picks, it can break through the cycle of mediocrity and the Durant-Westbrook-led Thunder are living proof of that. Can that, though, lead a team to glory? The following figure shows the number of years since the last lottery pick for the NBA Champions since 1985 and, by the looks of it, it usually takes 4-6 years since the last lottery pick to win a championship. So, not an immediate turnaround, but well within the realm of possibility that the team won the Larry O’Brien trophy thanks to its lottery picks.


That is especially true for the case of one Timothy Theodore Duncan, who, as the last lottery pick of the San Antonio Spurs, has led them into a state of perpetual championship contention, 5 rings and 0 lottery picks in the past 16 years. While the contribution of Duncan is undeniable, there’s also a lot to be said about the system that he was drafted into, from the presence of a Hall of Famer like David Robinson and a Hall of Fame caliber coach in Gregg Popovich to the scouting team that brought in All-Stars like Tony Parker and Manu Ginobili with the 28th and 57th picks respectively.

It is also worth noting that in the two cases of quickest lottery-to-championship turnaround (one year between lottery and championship), the 2004 Pistons and the 2008 Celtics, neither draft pick contributed significantly to the team. Darko Milicic, the #2 pick in 2003, averaged 4.8 minutes in 32 games for the Pistons (1.8 minutes per game in 8 games in the playoffs), while Jeff Green, the #5 pick in 2007, was traded to the Seattle SuperSonics. It could, however, be argued that Jeff Green did actually contribute to the Celtics’ championship season as he was part of the package that took Ray Allen to Boston.


The first, and easiest, conclusion to be made here is that high draft picks tend to be good players. Secondly, it can be seen that players of that caliber are absolutely necessary for a team to challenge for a championship. Not only that, but, on average, a lottery pick will result in an improvement in win/loss percentage. Maybe not necessarily right away but at least within the lifespan of the rookie deal of said lottery pick. On the other hand, it is also demonstrated that it takes multiple high draft picks for a team to become a playoff contender, and that’s what it all boils down to. If a team is willing to suffer several years of mediocrity (to put it mildly) and accumulate a significant amount of talent through the draft, chances are that they will become a playoff (or even championship) contender. Like everything else, tanking takes commitment, but also has its rewards.

Konstantinos Balafas is finishing up his PhD on detecting damage from earthquakes. He grew up watching soccer and basketball and loves Steve Nash, Paolo Maldini and Bill Self.

Contact Konstantinos at balafas ‘at’

Why We Love Sports Analytics and Richard Sherman

Photo by John Todd via The Stanford Daily

“I’m the best corner in the game. When you try me with a sorry receiver like Crabtree, that’s the result you’re going to get.” – Richard Sherman

The Stanford Sports Analytics Club loves Richard Sherman. He’s famous for his on-the-field play and his legendary trash talk, but he’s not always recognized for his work off the field as a student of the game. In an interview with NBC Sports, Sherman said, “My tape study and my meticulous attention to detail are what make me a good ball player.”

Like Sherman, the Stanford Sports Analytics Club strives to uncover competitive advantages through a detailed, analytical approach to sports. We believe this approach offers players and teams a more objective way of assessing their strengths and correcting their weaknesses. Becoming a better player or team requires accurately understanding strengths and weaknesses.

Starting this year the Stanford Sports Analytics Club will be maintaining a robust blog presence featuring different projects being worked on within the club. On campus we will be offering weekly workshops to help students develop their quantitative analysis skills. We will also be hosting expert guest speakers to provide a deeper understanding of how sports analytics is actually practiced. Last year we hosted Philadelphia 76ers GM Sam Hinkie.

Our purpose in starting this club is to ultimately build a strong sports analytics community here at Stanford. It seems only natural that an environment with a top athletics program as well as an excellent engineering school be a fertile ground for sports research. In the process of achieving this goal, our club’s focus will be on the following two main objectives. First, we will connect students with similar interests by facilitating collaboration on projects. Second, we will provide the resources students need to get their projects off the ground. This blog will be a platform for promoting their work.

Overall though, we just really want to learn more about the sports we love. Is Sherman really the best corner in the game? Is Crabtree really a sorry receiver? Let’s be clear. We don’t have all the answers to such pressing questions, but we’re eager to try to figure them out.