Economists Think MLB Pitchers Are Weird (Probably)

Silas Morsink (smorsink@stanford.edu)

A big thanks to Baseball Savant and Bill Petti for data provision and acquisition help.

You don’t need a background in economics to be familiar with the relationship between risk and reward. In life, riskier propositions are usually less attractive than their safer counterparts. Suppose you’ve come down with a cold: you’ll probably opt for your trusted DayQuil instead of taking a flyer on an untested remedy. But we don’t always opt for the safer option. Suppose I hand you a coin. If you choose to flip it, you’ll get $5 if it lands on heads, and $1000 if it lands on tails. If you choose not to flip at all, I’ll give you $10. Sure, the guaranteed $10 payout is “safer.” But I’m pretty sure you’ll flip the coin.

Faced with a risky option that has little reward (using the untested cold medicine), we’ll prefer the safer option with decent reward. But we tend to like a risky option when the reward is high enough (flipping the coin).

What about when the reward on a safe option and a risky option is the same? Suppose we play the coin flip game again, but this time with different values. If you flip heads, you pay me $1000. If you flip tails, I pay you $1020. If you choose not to flip (the safe option), I’ll pay you $10. Either way, the expected value of the deal is $10: flipping is worth 0.5 × (-$1000) + 0.5 × $1020 = $10. But when expected rewards are the same, we risk-averse humans tend to choose the safer option.

Economists formalize this relationship between risk and reward by analyzing the propositions that people tend to take on. There are lots of levels of complexity here, but we only need to focus on the most basic and most important conclusion: higher risk = higher reward. People tend to only take on higher risk propositions if their reward is higher. People can be convinced to take on lower reward propositions if the risk is lower.

If this doesn’t immediately make sense, consider the alternative. What if higher risk = lower reward? Hey: let’s play the coin flip game one more time. If you flip heads, you owe me $100. If you flip tails, I’ll give you $1. If you don’t flip at all, I’ll give you $5. You’d be crazy to flip the coin: it’s both riskier and lower reward.
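To make the arithmetic concrete, here is a small Python sketch that computes the reward (mean) and risk (standard deviation) of flipping in each of the three coin games above. It assumes a fair coin; the helper name mean_and_sd is just something I made up for the illustration.

```python
from math import sqrt

def mean_and_sd(outcomes):
    """Mean and standard deviation of a list of (probability, payoff) pairs."""
    mean = sum(p * x for p, x in outcomes)
    variance = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return mean, sqrt(variance)

# Payoffs to you, in dollars, if you choose to flip (fair coin assumed).
games = {
    "game 1": [(0.5, 5), (0.5, 1000)],
    "game 2": [(0.5, -1000), (0.5, 1020)],
    "game 3": [(0.5, -100), (0.5, 1)],
}
# Guaranteed payout if you decline to flip.
safe = {"game 1": 10, "game 2": 10, "game 3": 5}

for name, outcomes in games.items():
    mean, sd = mean_and_sd(outcomes)
    print(f"{name}: flip mean = {mean:.2f}, flip sd = {sd:.2f}, safe payout = {safe[name]}")
```

In the second game the flip has the same mean as the safe payout but far more spread, and in the third game the flip is worse on both counts.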

Why This Is Appearing on a Sports Analytics Blog

At this point, you would be justified in wondering what the hell this has to do with baseball. Well, we’re going to play the coin flip game with a pitcher. Except instead of choosing whether or not to flip a coin, he has to choose where to throw the ball.

As in the coin flip game, some pitch options are riskier than others. A pitch low and away? Low risk: the batter will almost certainly take the pitch for a ball; if they do swing, they are unlikely to make good contact. A pitch high in the middle of the strike zone? High risk: the batter might well swing and miss, but they also might barrel the ball over the fence. All the coin-flipping and medicine-choosing was meant to drive home the central lesson of risk and reward: low risk is typically associated with low reward, and high risk is typically associated with high reward.

By this logic, the low risk pitch should have low reward, and the high risk pitch should have higher reward. Again, consider a world where this isn’t the case: where low risk pitches have high rewards, and high risk pitches have low rewards. Pitchers should virtually never throw the high risk, low reward pitches! It’s just like the last coin flip game: to throw the high risk, low reward pitch is to take the cold medicine that is both risky and ineffective.

What’s going on in the real world with MLB pitchers? Something weird. Here’s the spoiler: there’s often a negative relationship between risk and reward. There are some high risk, low reward pitch locations, and then there are some low risk, high reward pitch locations. And pitchers throw high risk, low reward pitches! They’re choosing the untested cold medicine: opting for low reward, high risk propositions. What’s going on here?

The Data

The idea of this project is to isolate the effect of pitch location. For various pitch locations, we want to obtain the distribution of outcomes associated with pitches to that location. If the economic theory holds (higher risk = higher reward), then the pitch locations with a higher expected outcome (reward) should also have a higher variance in outcomes (risk).

But first, we must address some complicating factors. A complicating factor is something that (1) affects the outcomes of pitches to a certain location and (2) differs across locations. Complicating factors may lead us to incorrect conclusions about pitch locations. For example, suppose that there are pitches that (1) are (for some reason other than location) associated with better outcomes, and (2) are more likely to be thrown to certain locations. Then the locations to which these pitches are thrown would appear to have better outcomes, even though the credit belongs to that other reason rather than to the location itself.

The two complicating factors that I identified were pitch type and count. Pitch types (1) are associated with different outcomes: even if two pitches of different types are thrown to the same location, they’ll likely have different outcomes (owing to their movement, spin, etc.). Plus, pitch types (2) differ across locations: for example, pitches high in the zone are disproportionately fastballs.

Count also fulfills the two criteria of a complicating factor. Count (1) affects the outcome of pitches to certain locations: for example, with two strikes, pitches to a given location tend to generate more swinging strikes and more ball-in-play outs. Also, count (2) differs across locations: unsurprisingly, pitchers are much less likely to throw a pitch out of the zone when there are three balls than when there are two strikes.

I directly controlled for these complicating factors by splitting the data by pitch type and count. What data? Statcast game logs from Baseball Savant from 2016 through July 2019, acquired through Bill Petti’s baseballr package. A big thanks to both of these awesome sources for making projects like this one accessible.
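For readers who want to see what splitting by pitch type and count looks like in practice, here is a minimal Python sketch. It is not the code behind this post (the data was pulled with an R package), and the rows are invented; the keys pitch_type, balls, and strikes are modeled on Statcast column names, and the function name is hypothetical.

```python
from collections import defaultdict

# Invented example rows. The keys mirror Statcast-style column names,
# but the data is made up purely for illustration.
pitches = [
    {"pitch_type": "FF", "balls": 0, "strikes": 0},
    {"pitch_type": "FF", "balls": 0, "strikes": 0},
    {"pitch_type": "FF", "balls": 3, "strikes": 2},
    {"pitch_type": "SL", "balls": 0, "strikes": 0},
]

def split_by_type_and_count(rows):
    """Group pitch rows into one list per (pitch_type, balls, strikes)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["pitch_type"], row["balls"], row["strikes"])].append(row)
    return dict(groups)

groups = split_by_type_and_count(pitches)
for key, rows in sorted(groups.items()):
    print(key, len(rows))
```

Each group can then be analyzed on its own, so that differences between locations are not just differences in pitch type or count in disguise.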

Some notes: obviously, pitchers sometimes (often) miss their location. To help address this, I defined “location” pretty generally: by splitting pitch locations into one-foot-by-one-foot buckets. And over a large sample, pitchers hit their locations on average. Also, there are other potential complicating factors. For example, having runners on base might affect pitch location: pitchers may be less willing to throw balls in the dirt with runners on. Future work in this area might consider batter handedness as well.
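The one-foot-by-one-foot buckets mentioned in the notes above can be sketched in a couple of lines of Python. One assumption to flag: I use math.floor here, which matches plain truncation for positive coordinates but keeps every bucket exactly one foot wide on the negative side of center too (truncating toward zero would lump -0.99 through 0.99 into a single two-foot bucket). The argument names plate_x and plate_z are borrowed from Statcast conventions and measured in feet.

```python
import math

def location_bucket(plate_x, plate_z):
    """Return the lower-left corner (in whole feet) of a pitch's one-foot bucket.

    plate_x: horizontal location in feet, 0 at the center of the plate.
    plate_z: height above the ground in feet.
    """
    return (math.floor(plate_x), math.floor(plate_z))

# A pitch 1.5 feet right of center and 1.99 feet high lands in bucket (1, 1).
print(location_bucket(1.5, 1.99))
# A pitch half a foot left of center lands in the bucket starting at -1.
print(location_bucket(-0.5, 2.3))
```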

Weirdness on Four-Seam Fastballs: A Glimpse

After breaking the data down by count and pitch type, I looked specifically at four-seam fastballs, the most frequent pitch type in the data. (Other pitch types may display different behavior, but even so, the fact that fastball locations have a weird risk-reward relationship is notable.) Pitches were grouped by truncating their horizontal and vertical locations: for example, one pitch location included all four-seamers from 1 foot above the ground to 1.99 feet above the ground, and from 1 foot right of center to 1.99 feet right of center. For each location, I obtained a distribution of outcomes by assigning a wOBA value to each pitch. For contacted pitches, I used the estimated wOBA from the exit velocity and launch angle of the batted ball. For non-contacted pitches, I used the count-specific wOBA value of a ball (if it was a ball) or the count-specific wOBA value of a strike (if it was a strike). Though I won’t go into details here, Fangraphs has an excellent primer on wOBA, the MLB.com glossary entry on expected wOBA provides background on what I used for contacted pitches, and a Hardball Times article provides an introduction to the count-specific value of a ball/strike.
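Here is a rough Python sketch of the value-assignment rule just described. Everything specific in it is an assumption for illustration: the field names (result, estimated_woba), the lookup tables, and especially the numbers in them, which are placeholders rather than the count-specific values used for this post. The sketch also treats anything that is neither in play nor a ball as a strike, which is my simplification.

```python
# Placeholder count-specific wOBA values, keyed by (balls, strikes) before
# the pitch. These numbers are invented for illustration only, and only two
# counts are filled in.
BALL_VALUE = {(0, 0): 0.35, (3, 2): 0.69}
STRIKE_VALUE = {(0, 0): 0.29, (3, 2): 0.0}

def pitch_value(pitch):
    """Assign a wOBA-scale value to one pitch.

    pitch is a dict with hypothetical keys:
      balls, strikes  -- the count before the pitch
      result          -- "in_play", "ball", or "strike"
      estimated_woba  -- only needed when result == "in_play"
    """
    count = (pitch["balls"], pitch["strikes"])
    if pitch["result"] == "in_play":
        # Contacted pitch: use the estimate from exit velocity and launch angle.
        return pitch["estimated_woba"]
    if pitch["result"] == "ball":
        return BALL_VALUE[count]
    # Assumption: everything else is valued as a strike.
    return STRIKE_VALUE[count]

print(pitch_value({"balls": 0, "strikes": 0, "result": "in_play", "estimated_woba": 0.9}))
print(pitch_value({"balls": 0, "strikes": 0, "result": "ball"}))
```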

So, given a pitch type (four-seamers) and a count, I acquired the distribution of outcomes resulting from pitches to each location. Here are the important features of each location’s outcome distribution for our purposes: the mean (the reward of throwing a pitch to that location), and the standard deviation (the riskiness of throwing a pitch to that location). The theory says that if a pitch-location outcome distribution has a high mean, it should be a high-standard-deviation distribution too (high risk = high reward).
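As a sketch of that summary step, the Python below takes (location, value) pairs and returns the pitch count, mean, and standard deviation for each location. It uses the sample standard deviation from the statistics module, which is my choice for the illustration, and it skips locations with fewer than two pitches because a standard deviation is undefined there. The function name and input shape are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize_by_location(valued_pitches):
    """valued_pitches: iterable of (bucket, value) pairs.

    Returns {bucket: (n, mean, sd)} for buckets with at least two pitches.
    """
    by_bucket = defaultdict(list)
    for bucket, value in valued_pitches:
        by_bucket[bucket].append(value)
    return {
        bucket: (len(vals), mean(vals), stdev(vals))
        for bucket, vals in by_bucket.items()
        if len(vals) >= 2
    }

# Invented values for two locations.
example = [((0, 2), 0.0), ((0, 2), 1.0), ((0, 2), 0.5), ((1, 1), 0.3), ((1, 1), 0.3)]
for bucket, (n, m, sd) in sorted(summarize_by_location(example).items()):
    print(bucket, n, round(m, 3), round(sd, 3))
```

The mean plays the role of reward and the standard deviation plays the role of risk for each location.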

Let’s look at some results. Consider four-seamers thrown on 0-0 counts. Here is a plot of pitch locations: the color represents the expected outcome (reward) of a pitch to that location (the lighter the blue, the better the mean outcome).

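[Figure: four-seamer pitch locations on 0-0 counts, colored by mean outcome (reward); lighter blue = better mean outcome]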

This is relatively intuitive: pitches up and middle have the worst outcomes, pitches away from the middle of the zone have better outcomes. Now, here’s the same plot of pitch locations. But this time, the coloring represents the standard deviation (risk) of throwing a pitch to each location (lighter blue = higher risk). If theory holds (higher risk = higher reward), we expect to see a similar picture: the locations of high reward should also be the locations of high risk.

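[Figure: the same pitch locations, colored by the standard deviation of outcomes (risk); lighter blue = higher risk]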

Wait a second. This picture was supposed to be the same as the picture above, but instead it’s the inverse. The locations of high reward (light blue in the first picture) also tend to be the locations of low risk (dark blue in the second picture). The opposite is true too: in these pictures, low reward = high risk. Economic theory (anthropomorphized) is not happy.

Weirdness on Four-Seam Fastballs: More Evidence

Instead of eyeballing the intensity of various hues of blue, we can analyze the risk-reward relationship more rigorously. The following shows a linear regression, displaying the relationship between risk (on the horizontal axis) and reward (on the vertical axis) for 0-0 four-seamers. The trend is the weird negative trend noted above: as risk increases, reward decreases.

[Figure: linear regression of reward (vertical axis) on risk (horizontal axis) for 0-0 four-seamers]
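For anyone who wants to reproduce a fit like this on their own numbers, here is a minimal ordinary least squares sketch in plain Python. It is an unweighted fit with one point per location, which is an assumption on my part, and it returns only the slope and intercept (no p-value). The risk and reward numbers at the bottom are invented.

```python
def ols_slope_intercept(x, y):
    """Least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented per-location values: risk (standard deviation) and reward (mean).
risk = [0.1, 0.2, 0.3]
reward = [0.5, 0.4, 0.3]
slope, intercept = ols_slope_intercept(risk, reward)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A negative slope means that locations with more spread in their outcomes also have worse average outcomes, which is the pattern at issue here.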

This clearly illustrates the puzzling negative relationship between risk and reward. If there exist low-risk, high-reward pitch locations, why don’t pitchers throw to those locations all the time? It’s not just that they don’t throw to those locations all the time; they rarely do. Here is the same plot as the one above, with the size of each dot representing the number of pitches to that location. You’ll note that high-reward, low-risk pitches get thrown relatively infrequently, with most pitches being lower reward or higher risk.

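[Figure: the same risk-reward plot, with dot size proportional to the number of pitches thrown to each location]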

In fact, the infrequency of high-reward, low-risk pitches may do some work in explaining their high-reward-ness. Because they’re thrown infrequently, they may catch the batter off guard. But, even though the element of surprise (and thus the high reward of such pitches) might wear off slightly if these pitches were thrown more, these pitches currently offer an exploitable advantage.

Since I’ve only shown results for 0-0 counts so far, here is a table displaying the slope of the linear model that relates risk and reward for four-seamers on each count. Also included is the p-value of the linear model. The rows are organized to show an interesting pattern: for a given number of strikes, as the number of balls increases, the relationship between risk and reward generally becomes more negative (the lone exception is the tiny move from -0.054 at 0-0 to -0.052 at 1-0).

Count    Increase in Reward per Increase in Risk    P-Value
0-0      -0.054                                     0.008
1-0      -0.052                                     0.146
2-0      -0.354                                     0.001
3-0      -0.871                                     0.006
0-1       0.111                                     0.001
1-1       0.073                                     0.097
2-1      -0.055                                     0.325
3-1      -0.246                                     0.050
0-2      -0.103                                     0.030
1-2      -0.184                                     0.006
2-2      -0.360                                     0.002
3-2      -1.083                                     0.000

Not all of these relationships are negative, and not all of these relationships are significant, but something strange is definitely going on here. Often, pitchers are forgoing high-reward, low-risk pitches to throw riskier pitches with worse expected outcomes.

Maybes

Maybe I’ve defined locations too narrowly, and pitchers avoid high-reward, low-risk pitch locations due to their proximity to lower-reward regions. For example, pitchers may be reluctant to aim out of the strike zone (where higher-reward, lower-risk pitch locations are often found) to avoid missing badly and throwing past their catcher.

Here’s another caveat: suppose pitchers adopt the implicit advice here, and start throwing more high-reward, low-risk pitches. This would not necessarily have the desired effect. As stated above, the effectiveness of these pitches may be (in part) thanks to their infrequency. Furthermore, the high-reward, low-risk pitches are more frequently out of the strike zone. Throwing more of these pitches would mean more balls, meaning a transition to a higher ball count is more likely. That would affect the wOBA values associated with balls and strikes, altering the outcome distribution of these pitches to make them less attractive.

All that said, these results are pretty striking. There seems to be a significant, exploitable advantage in throwing more pitches to high-reward, low-risk locations.
