Since my last article, which covered the slightly superior predictive power of OPS over wOBA on runs, I have been looking into the reasons for this. Ostensibly, wOBA should weight each batting outcome in which a player reaches base safely (a single, walk, home run, etc.) by the likelihood of that outcome resulting in a run scored. In other words, if every other single results in either an RBI on that hit or the batter coming around to score, then the weight of a single would be .5 (50% chance of a run scoring * 1 run scoring). In more sabermetric terms this is known as the change in Run Expectancy, which compares the likelihood of a run scoring before the event occurs with the likelihood after it occurs. Despite this more refined method of weighting a player’s overall offensive value, it fails to predict runs with greater statistical significance than OPS. For this reason I set out to build a better wOBA-like statistic.
The weights for wOBA are either taken as constants, roughly based on the average Run Expectancy change for each of these hitting events over the entirety of baseball history, or calculated yearly using what are known as linear weights, the statistically determined weights of each event. I will use a similar process later on. wOBA also shifts the weights to the OBP scale, in essence to make wOBA more user friendly: if you know what good and bad OBPs are, then you know what good and bad wOBAs are.
My major complaint about wOBA’s weights is that they are scaled by zeroing the Run Value of an out. If that statement doesn’t make much sense now, let me generate my own linear weights and I’ll explain further after that.
For this reason, I set out to make my own method for determining the respective weights of each offensive outcome, in the most consistent way I could. To this end I have created a statistic called Player Run Value, or PRV for short. PRV assesses a player’s offensive contribution to their team, both in terms of runs scored and runs batted in, assuming everything else about a team’s offense is average. To do this I need to find the run value of each of these outcomes in a statistically relevant way. In order to find these run values, I first have to come up with a player’s total offensive contribution. This will be termed the Player Run Total (PRT), where
PRT = Runs Scored + Runs Batted In – Home Runs
The idea behind PRT is that it captures each run that a player took part in. Home Runs are subtracted out because each one counts as both a run scored and an RBI, and would otherwise be double counted when PRT is compared against the player’s hits. PRT allows us to analyze the value of a player getting on base and of hitting to score runners on base.
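As a minimal sketch of the PRT definition above (the function name and the season line are made up for illustration, not taken from any real player):

```python
def player_run_total(runs_scored: int, rbi: int, home_runs: int) -> int:
    """PRT = Runs Scored + Runs Batted In - Home Runs."""
    # A home run credits the batter with a run scored AND an RBI for the
    # same run, so one is subtracted per home run to count that run once.
    return runs_scored + rbi - home_runs


# Hypothetical season line, for illustration only.
print(player_run_total(runs_scored=90, rbi=100, home_runs=30))  # 160
```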
The next step is to simply run a linear regression of PRT on the complete set of positive batting outcomes: a walk, hit by pitch, single, double, triple, and home run. This will give us the statistically derived impact, aka the linear weight, that each of these events has on runs. The formula is as follows:
E(PRT) = a + b*uBB + c*HBP + d*1B + e*2B + f*3B + g*HR
The formula itself should look familiar, as it is similar to the numerator of the wOBA formula. The letters a through g are the coefficients that will be found once the linear regression is done. For this analysis I will be using the 2016 batting data for all non-pitcher major league hitters. I will also be using the FanGraphs version of wOBA, which uses the outs-zeroed weights.
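For readers who want to try this themselves, here is a rough sketch of the regression step using ordinary least squares in NumPy. It is not the exact software or data behind this article: the function name is my own, and the data below is synthetic, generated from made-up weights purely to show that the fit recovers them.

```python
import numpy as np

EVENTS = ["uBB", "HBP", "1B", "2B", "3B", "HR"]


def fit_linear_weights(event_counts, prt):
    """Least-squares fit of E(PRT) = a + b*uBB + c*HBP + d*1B + e*2B + f*3B + g*HR.

    event_counts: (n_players, 6) array with columns ordered as EVENTS.
    prt: (n_players,) array of Player Run Totals.
    Returns (constant a, {event: weight}).
    """
    X = np.asarray(event_counts, dtype=float)
    y = np.asarray(prt, dtype=float)
    # Prepend a column of ones so the first coefficient is the constant a.
    design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[0], dict(zip(EVENTS, coefs[1:]))


# Synthetic, noise-free data built from made-up weights (NOT real 2016 data).
rng = np.random.default_rng(0)
counts = rng.integers(0, 120, size=(200, 6))
made_up_weights = np.array([0.3, 0.1, 0.5, 0.9, 1.4, 1.6])
prt = 2.0 + counts @ made_up_weights

constant, weights = fit_linear_weights(counts, prt)
print(round(constant, 3), {e: round(w, 3) for e, w in weights.items()})
```

With real data, each row would be one hitter's season totals and the fitted values would be the linear weights.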
First things first, calculate the weights:
The numbers highlighted are the important numbers, and the respective weights are as follows:
Which gives us the formula for PRV:
PRV = .323*uBB+.125*HBP+.527*1B+.933*2B+1.423*3B+1.598*HR
For anyone with a deeper understanding of statistics, this is the expected value of PRT, or the trend-line for PRT, shifted to the origin.
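The PRV formula can be sketched as a small function. The weights are the 2016 values from the formula above; the function name and the season line are hypothetical.

```python
# 2016 PRV weights from the formula above (constant term dropped).
PRV_WEIGHTS = {
    "uBB": 0.323,
    "HBP": 0.125,
    "1B": 0.527,
    "2B": 0.933,
    "3B": 1.423,
    "HR": 1.598,
}


def player_run_value(counts):
    """Sum each event count times its PRV weight; missing events count as 0."""
    return sum(weight * counts.get(event, 0) for event, weight in PRV_WEIGHTS.items())


# Hypothetical season line, for illustration only.
line = {"uBB": 50, "HBP": 5, "1B": 100, "2B": 30, "3B": 3, "HR": 20}
print(round(player_run_value(line), 3))  # 133.694
```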
To clarify my earlier statement about zeroing the Run Value of an out: if we had instead run the regression using Outs made at the plate along with all the other batting outcomes, we would get something that looks like this:
In order to “zero” the value of an out in this equation, I would have to add .009661 (or about .01) to each of the other terms, giving me “outs-zeroed weights” of
My issue is philosophical: I disagree with counting an out as a negative to runs (in this case about -.01 runs), because doing so assumes a run would necessarily have been scored. If a run would necessarily score, then an out takes that run off the board, but I don’t think this is a valid assumption. We cannot assume a run is going to be scored, because a hit is not a binary outcome of a run scoring or not scoring: a hit can advance runners without scoring them, a base runner can be thrown out trying to advance too far, etc. Moreover, it is only when a team has made its final out that it is prevented from scoring runs; a team that has made 26 outs is no more prevented from scoring than a team that has made no outs. In other words, if two teams scored the same number of runs, one with 26 outs and the other with 0 outs, the final score after 27 outs was the same for both, and the number of outs made before or after the scoring had no impact on it. So we cannot assume the run would score without an out and then deduct an out’s value. We can only measure the outcomes that actually happen, and so I’ve left outs out of the calculation. Mathematically speaking, this means that the value of an out makes up part of the constant value (denoted “_const”) and is assumed to be zero when the constant value is dropped in the PRV formula, rather than being calculated and used to shift the other values accordingly.
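The outs-zeroing shift described above can be sketched as follows. The out coefficient of -.009661 is the one quoted in this article; the event weights passed in are hypothetical placeholders, not the regression output, and the function name is my own.

```python
def zero_outs(weights, out_coef):
    """Shift every event weight so that an out is worth exactly zero.

    Subtracting the (negative) out coefficient from each weight is the
    same as adding its magnitude to each of the other terms.
    """
    return {event: weight - out_coef for event, weight in weights.items()}


OUT_COEF = -0.009661  # value of an out quoted in the article

# Hypothetical pre-shift weights, for illustration only.
example_weights = {"1B": 0.50, "HR": 1.60}

shifted = zero_outs(example_weights, OUT_COEF)
print({event: round(w, 6) for event, w in shifted.items()})
# {'1B': 0.509661, 'HR': 1.609661}
```

PRV skips this step entirely: the out never enters the regression, so there is nothing to shift.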
The next step is to put PRV up against wOBA in terms of teams’ real offensive production. To do this we will run a regression of these two 2016 formulas,
PRV = .323*uBB+.125*HBP+.527*1B+.933*2B+1.423*3B+1.598*HR
on 2016 team batting data. The regressions go as follows:
This tells us that PRV is just slightly better at predicting offensive output than wOBA in the 2016 season, but it doesn’t answer the more important question of which has more accurate weights. Because wOBA is in terms of plate appearances and PRV is not, it’s impossible to compare the accuracy of the weights of each statistic using just these two regressions, since we don’t know what random effects plate appearances can have on the data. The fix is easy enough, though: simply multiplying each team’s wOBA by its number of PA lets us compare the weights directly, because both variables are then in the same terms. This new statistic, which we’ll call Weighted On Base Value (wOBV), has the following formula:
wOBV = wOBA * PA
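As a quick sketch, wOBV is nothing more than team wOBA multiplied by team plate appearances. The function name and the team numbers below are made up for illustration.

```python
def weighted_on_base_value(woba: float, plate_appearances: int) -> float:
    """wOBV = wOBA * PA, which puts wOBA on the same counting scale as PRV."""
    return woba * plate_appearances


# Hypothetical team: a .320 wOBA over 6,200 plate appearances.
print(round(weighted_on_base_value(0.320, 6200), 1))  # 1984.0
```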
So how do the two weights stack up?
Here we see that the PRV method is a slightly more accurate measure of weights. This is a reproducible outcome for 2010-2016 (as far back as I went), with PRV correlating an average of about 1% better than the wOBA weights over that time frame. While this is an admittedly small sample size, mainly because it is a lot of work to generate these values, I think it is quite likely that this trend holds through the entirety of baseball, because PRV’s weights don’t have the same flaw of zeroing an out’s value that wOBA’s weights do. In other words, PRV weights are a slightly more accurate way of representing baseball outcomes. A final note: both methods are quite accurate, and there’s really no problem with using wOBA as an average fan, just like there was no problem with using OPS. It’s just that PRV is slightly more accurate at modeling offense in baseball, in the same way that OPS was relative to wOBA.
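One way to make this kind of head-to-head comparison, sketched under my own assumptions: regress team runs on each statistic separately and compare the R-squared values. With a single predictor, R-squared equals the squared correlation, so NumPy's corrcoef is enough. The data below is synthetic (a "tight" and a "noisy" stand-in for two competing statistics), not real team data, and it does not reproduce the article's results.

```python
import numpy as np


def r_squared(predictor, runs):
    """R-squared of a one-predictor linear regression of runs on predictor.

    For simple linear regression this equals the squared Pearson correlation.
    """
    r = np.corrcoef(predictor, runs)[0, 1]
    return r ** 2


# Synthetic stand-ins for 30 teams (NOT real data).
rng = np.random.default_rng(1)
runs = rng.normal(720, 60, size=30)
tight_stat = runs + rng.normal(0, 10, size=30)  # tracks runs closely
noisy_stat = runs + rng.normal(0, 40, size=30)  # tracks runs loosely

print(round(r_squared(tight_stat, runs), 3), round(r_squared(noisy_stat, runs), 3))
```

Whichever statistic gives the higher R-squared explains more of the variation in team runs for that season.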
Just for fun let’s check the correlation between OPS and runs
You gotta be kidding…
EDIT: Part II of this article, which demonstrates how to use PRV in the game, is now up here
Let’s Go Bucs.