Fantasy Baseball Calculator

Merging Categories

When looking at multiple statistical categories, it can be difficult to eyeball which player offers the better value. Consider the following examples:


Player A: 35 HR, 100 RBI, 100 R, 10 SB, .285 AVG

Player B: 20 HR, 89 RBI, 105 R, 5 SB, .290 AVG


Most people would conclude that Player A in this case is the better choice between these two. Most of the categories are close, and Player A has the better stats for the majority. But what about when we introduce:


Player C: 12 HR, 75 RBI, 130 R, 65 SB, .310 AVG


Now this is a tougher call. To be honest, I do not know which of these imaginary players is better. I can't say for certain if 55 more stolen bases are worth losing 23 home runs, or if gaining 25 points of batting average is worth losing 25 RBI. Do the 30 extra runs make up the difference and make Player C the most valuable? I don't have these answers off the top of my head. I could take an educated guess (and I would guess C), but I can know for sure by figuring it out with statistics.


To do this, we need to figure out exactly how much one unit of one stat is worth compared to one unit of another. A good way to do this is by calculating z-scores.


Taken from Wikipedia: 


"In statistics, a standard score indicates by how many standard deviations an observation or datum is above or below the mean. It is a dimensionless quantity derived by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. This conversion process is called standardizing or normalizing"


Well, doesn't that clear things up? No? OK, maybe this graph will help:

Still confused? Me too. In layman's terms, a z-score measures how far a given data point is from the average of all the data points, counted in standard deviations. OK, I feel like I'm not getting anywhere with this, so let me refer back to the example above.



Player A: 35 HR, 100 RBI, 100 R, 10 SB, .285 AVG

Player B: 20 HR, 89 RBI, 105 R, 5 SB, .290 AVG

Player C: 12 HR, 75 RBI, 130 R, 65 SB, .310 AVG

Let's look at just the home runs for these 3 imaginary players. Clearly, Player A has the most and is therefore the most valuable in that category. But asking who is the most valuable is not quite the right question. Instead, we should be asking how much MORE valuable Player A's home runs are compared to Player B's and Player C's (and ultimately to the rest of the player pool).

Here is the calculation for a z-score (or standard score, as it is also called):

$$z = \frac{X - \mu}{\sigma}$$

You will become very familiar with this.

X is the Player Stat of interest. X will change for each player that you calculate this for - and yes, you will calculate this for EACH AND EVERY player for a proper evaluation. 


The Mu (μ, the one that looks like a lowercase u) represents the average of all the players we will be evaluating, and the Sigma (σ, a lowercase o with a tail) is the standard deviation of the data set. Standard deviation is just a measurement of how widely the values in a data set are spread around its average. Excel has built-in functions for both of these, AVERAGE() and STDEV(), so calculating this is a breeze.
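If you are working outside Excel, the same calculation can be sketched in a few lines of Python. The home run totals in `pool` are made-up numbers for illustration only, not real 2012 data, and `z_scores` is just a name I picked. The sketch uses `statistics.stdev`, the sample standard deviation, on the assumption that this is what Excel's STDEV() computes.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return the z-score of every value relative to the whole list."""
    mu = mean(values)       # same role as Excel's AVERAGE()
    sigma = stdev(values)   # sample standard deviation, assumed to match STDEV()
    return [(x - mu) / sigma for x in values]

# Hypothetical home run totals for a tiny player pool (not real data).
pool = [35, 20, 12, 0, 3, 8, 1, 15]
scores = z_scores(pool)

for hr, z in zip(pool, scores):
    print(f"{hr:>3} HR -> z = {z:+.3f}")

# Sanity check: the z-scores of the whole pool sum to (essentially) zero.
print(abs(sum(scores)) < 1e-9)
```

The last line is a quick check worth keeping around: if the z-scores of your full player pool do not sum to roughly zero, the mean or standard deviation was probably taken over a different set of players.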


Let's use 2012 stats as an example. I gathered the hitting stats for all batters in both leagues (I excluded pitchers) who had at least 1 plate appearance. The average player in that pool hit 7.769 home runs (4910 total), and the standard deviation for this data set is 9.196 according to Excel.


From here, it is very easy to calculate, especially in Excel.

| Player | HR Total | Z-Score (Mean = 7.769; SD = 9.196) |
| --- | --- | --- |
| Player A | 35 | 2.961 |
| Player B | 20 | 1.330 |
| Player C | 12 | 0.460 |

A z-score greater than 4 is usually reserved for the elite outliers in a statistical category. Similarly, a z-score less than -4 is usually reserved for the lowest of the lows in a statistical category. Calculating z-scores for Saves is a good example of this. When you add up the z-scores of ALL the players that make up the data set, they should sum to 0. This means that the average player will have a z-score of 0.

In this example, we see that Player A  has a z-score of 2.961, which is 1.631 greater than Player B and 2.501 more than Player C.
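The home run numbers above can be reproduced with a short Python sketch. The mean and standard deviation are the 2012 figures quoted in the text; the names `HR_MEAN`, `HR_SD`, and `z_score` are only labels chosen for this illustration.

```python
HR_MEAN = 7.769   # 2012 league average quoted in the text
HR_SD = 9.196     # 2012 standard deviation quoted in the text

def z_score(x, mean, sd):
    """Number of standard deviations x sits above (+) or below (-) the mean."""
    return (x - mean) / sd

home_runs = {"Player A": 35, "Player B": 20, "Player C": 12}
hr_z = {name: z_score(hr, HR_MEAN, HR_SD) for name, hr in home_runs.items()}

for name, z in hr_z.items():
    print(f"{name}: {z:.3f}")

# How much MORE valuable Player A's home runs are, in standard deviations.
print(f"A over B: {hr_z['Player A'] - hr_z['Player B']:.3f}")
print(f"A over C: {hr_z['Player A'] - hr_z['Player C']:.3f}")
```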

I should make a note of the Wikipedia entry on z-scores. It said that a z-score is a DIMENSIONLESS quantity. This means that we can compare the z-scores of one stat against the z-scores of another and quantify exactly how much more valuable each deviation from the league average is. The z-scores put all the stats on the same scale and thus allow for easy data analysis. Here are the z-scores for the categories of R, RBI, and SB:

| Player | Runs Scored | Z-Score (Mean = 32.82; SD = 29.39) |
| --- | --- | --- |
| Player A | 100 | 2.286 |
| Player B | 105 | 2.456 |
| Player C | 130 | 3.307 |

| Player | Runs Batted In | Z-Score (Mean = 31.25; SD = 29.64) |
| --- | --- | --- |
| Player A | 100 | 2.320 |
| Player B | 89 | 1.948 |
| Player C | 75 | 1.476 |

| Player | Stolen Bases | Z-Score (Mean = 5.104; SD = 8.332) |
| --- | --- | --- |
| Player A | 10 | 0.588 |
| Player B | 5 | -0.012 |
| Player C | 65 | 7.189 |

Notice the -0.012 that Player B has for stolen bases? His total of 5 is just below the 5.104 league average, so it makes sense that his z-score is slightly negative. When we sum these z-scores, we get a good comparison of how much each player contributes across all the stats.


These Totals are:

| Player | z-score HR | z-score R | z-score RBI | z-score SB | z-score sum |
| --- | --- | --- | --- | --- | --- |
| Player A | 2.961 | 2.286 | 2.320 | 0.588 | 8.155 |
| Player B | 1.330 | 2.456 | 1.948 | -0.012 | 5.722 |
| Player C | 0.460 | 3.307 | 1.476 | 7.189 | 12.432 |

Surprising, isn't it? I was expecting Player C to be the most valuable, mostly because I have calculated so many z-scores that I can estimate the comparisons in my head. However, I would not have expected Player C to be over 50% MORE valuable than Player A, and that is even before we consider the batting average stat.
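Here is a rough Python sketch of the four-category sum, using the league means and standard deviations quoted in the tables above. The dictionary layout and the `z_sum` helper are my own choices for illustration. Because the sketch adds unrounded z-scores, its totals can differ from the table in the last decimal place.

```python
# (mean, standard deviation) for each category, as quoted in the tables above.
LEAGUE = {
    "HR":  (7.769, 9.196),
    "R":   (32.82, 29.39),
    "RBI": (31.25, 29.64),
    "SB":  (5.104, 8.332),
}

PLAYERS = {
    "Player A": {"HR": 35, "R": 100, "RBI": 100, "SB": 10},
    "Player B": {"HR": 20, "R": 105, "RBI": 89, "SB": 5},
    "Player C": {"HR": 12, "R": 130, "RBI": 75, "SB": 65},
}

def z_sum(stats):
    """Add up a player's z-scores over every category in LEAGUE."""
    return sum((stats[cat] - mean) / sd for cat, (mean, sd) in LEAGUE.items())

totals = {name: z_sum(stats) for name, stats in PLAYERS.items()}
for name, total in totals.items():
    print(f"{name}: {total:.2f}")

# Player C's total relative to Player A's (above 1.5 means "over 50% more").
print(f"C / A = {totals['Player C'] / totals['Player A']:.2f}")
```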


Why didn't I include the batting average stat? Allow me to demonstrate:

| Player | AVG | Z-Score (Mean = 0.234; SD = 0.069) |
| --- | --- | --- |
| Player A | .285 | 0.739 |
| Player B | .290 | 0.812 |
| Player C | .310 | 1.101 |

The problem with doing a z-score in this traditional sense on a rate stat like batting average is that it does not account for frequency. Let's add Player D to the mix. He played 1 game and went 4 for 5. This is a good game, but it obviously does not make him worth drafting, right?

| Player | AVG | Z-Score (Mean = 0.234; SD = 0.069) |
| --- | --- | --- |
| Player A | .285 | 0.739 |
| Player B | .290 | 0.812 |
| Player C | .310 | 1.101 |
| Player D | .800 | 8.203 |

Uh-oh. There is no way that a player with 4 hits total should be valued more than 7 times as highly as a player who hit over .300 for an entire season. To fix this problem, we need to change the way we look at this stat.


Rather than thinking about just the batting averages, we need to think about how many more or fewer hits the player has than the average player would have in the same number of at bats. To do this, we need to know the sum of all the AB for the league, as well as the number of hits for the league. This stat I have labeled xBA:

$$\text{xBA} = H - AB \times \frac{H_{\text{league}}}{AB_{\text{league}}}$$

Let's expand on the stats for our 4 imaginary players. The 2012 league totals were 41408 hits and 160187 AB.

| Player | Hits | At Bats | Average | xBA |
| --- | --- | --- | --- | --- |
| Player A | 157 | 551 | .285 | 14.8665 |
| Player B | 158 | 545 | .290 | 17.1187 |
| Player C | 171 | 551 | .310 | 28.5665 |
| Player D | 4 | 5 | .800 | 2.7075 |

This is beginning to make more sense. This data says that Player A has 14.8665 MORE hits than the average player would have given the same number of at bats. Player D also beat the average player, by 2.7075 hits in his lone game, but because it was just 1 game, his score is notably lower than those of players who hit over the course of an entire season. This becomes more useful when comparing the platoon hitters of the league, who will hit for a high average but with over a hundred fewer at bats than some of the league's everyday players.


Is it more valuable to have 103 hits over 356 at bats (.289) or 168 hits over 602 at bats (.267)?

Using this method, we can calculate the answer exactly. In fact, here it is:


103 hits over 356 at bats = 10.974

168 hits over 602 at bats = 12.383


In this case, using the same league totals as the table above, we conclude that the .267 hitter will be more beneficial to your team than the part-time hitter. Now, if you were to use two platoon hitters to get the same number of at bats and compare that to the .267 hitter, you could probably make a strong argument for the platoon, but that is more strategy based and outside the scope of this project. This method is strictly an apples-to-apples comparison and will let you know how the players themselves stack up with their raw stats.
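The xBA comparison can be sketched in Python as follows. The league totals are the 2012 figures quoted in the text; the function name `xba` and its parameters are my own labels. The sketch uses the unrounded league average, so the results can differ slightly in the third decimal place from numbers worked out with a rounded average.

```python
LEAGUE_HITS = 41408    # 2012 league hits quoted in the text
LEAGUE_AB = 160187     # 2012 league at bats quoted in the text

def xba(hits, at_bats, league_hits=LEAGUE_HITS, league_ab=LEAGUE_AB):
    """Hits above (+) or below (-) what a league-average hitter would
    collect in the same number of at bats."""
    league_avg = league_hits / league_ab
    return hits - at_bats * league_avg

part_time = xba(103, 356)   # the .289 hitter
full_time = xba(168, 602)   # the .267 hitter
one_game = xba(4, 5)        # Player D, 4 for 5 in a single game

print(f"part-time: {part_time:.2f}")
print(f"full-time: {full_time:.2f}")
print(f"one game:  {one_game:.2f}")
print("full-time hitter adds more hits:", full_time > part_time)
```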


Take note that when you perform the xBA calculation for the entire league, the sum of those calculations should be exactly 0. This makes sense as xBA is using the league average as the benchmark, so when you use the entire league, the average is met. Then you take the z-score of the xBA stat to get your batting average in a comparable state with the others. To finish the example:

| Player | xBA | Z-Score (Mean = 0; SD = 10.634) |
| --- | --- | --- |
| Player A | 14.8665 | 1.398 |
| Player B | 17.1187 | 1.610 |
| Player C | 28.5665 | 2.686 |
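The claim that xBA sums to exactly 0 over the whole league can be checked on a toy example. The four batting lines below are hypothetical, and the league totals are built from those same four players, which is the condition that makes the sum cancel out. The sample standard deviation is again an assumption about what Excel's STDEV() does.

```python
from statistics import stdev

# Hypothetical (hits, at bats) lines for a tiny four-player league.
toy_league = [(171, 551), (158, 545), (4, 5), (60, 300)]

league_hits = sum(h for h, _ in toy_league)
league_ab = sum(ab for _, ab in toy_league)
league_avg = league_hits / league_ab

# xBA for each player, measured against this league's own average.
xba_values = [h - ab * league_avg for h, ab in toy_league]
print([round(x, 2) for x in xba_values])

# Summed over everyone who makes up the league totals, xBA cancels out.
print(abs(sum(xba_values)) < 1e-9)

# Because the mean of xBA is 0, the z-score is just xBA divided by its SD.
sd = stdev(xba_values)
z_values = [x / sd for x in xba_values]
print([round(z, 3) for z in z_values])
```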

And so this is how our imaginary hitters are valued in the standard 5x5 format for 2012:

| Player | z-score HR | z-score R | z-score RBI | z-score SB | z-score xBA | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Player A | 2.961 | 2.286 | 2.320 | 0.588 | 1.398 | 9.553 |
| Player B | 1.330 | 2.456 | 1.948 | -0.012 | 1.610 | 7.332 |
| Player C | 0.460 | 3.307 | 1.476 | 7.189 | 2.686 | 15.118 |


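This method of turning a rate stat into a counting stat can be used on all the categories. For a rate stat defined as a numerator N divided by a denominator D, the basic formula is:

$$\text{xStat} = N - D \times \frac{N_{\text{league}}}{D_{\text{league}}}$$

So when looking at a stat like slugging, which is SLG = TB/AB, it would be:

$$\text{xSLG} = TB - AB \times \frac{TB_{\text{league}}}{AB_{\text{league}}}$$

Pretty easy, huh? One thing to remember: when you are calculating stats where lower scores are better, such as WHIP and ERA, you need to invert the whole score by multiplying by -1:

$$\text{xStat} = -1 \times \left( N - D \times \frac{N_{\text{league}}}{D_{\text{league}}} \right)$$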

Other similar inverted stats for pitchers are: HRA, HA, L, HR/9, BAA

Other similar inverted stats for batters are: K, CSB, and K%
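A minimal Python sketch of the rate-to-counting conversion, including the sign flip for lower-is-better stats, might look like this. It assumes the general pattern follows the xBA calculation (player numerator minus player denominator times the league rate). The function name, its parameters, and every number in the example are hypothetical and chosen only for illustration.

```python
def counting_value(numerator, denominator, league_numerator,
                   league_denominator, lower_is_better=False):
    """Turn a rate stat (numerator / denominator) into a counting stat:
    how much of the numerator the player produced beyond what a
    league-average player would produce over the same denominator."""
    league_rate = league_numerator / league_denominator
    value = numerator - denominator * league_rate
    # For stats where lower is better, flip the sign so that
    # "better than average" always comes out positive.
    return -value if lower_is_better else value

# Hypothetical numbers for illustration only (not real league totals).
# Slugging: total bases over at bats, higher is better.
x_slg = counting_value(300, 550, league_numerator=65000,
                       league_denominator=160000)

# WHIP: walks plus hits over innings pitched, lower is better.
x_whip = counting_value(220, 200, league_numerator=56000,
                        league_denominator=43000, lower_is_better=True)

print(f"xSLG:  {x_slg:+.2f}")
print(f"xWHIP: {x_whip:+.2f}")
```

In the WHIP example the pitcher allows fewer baserunners than a league-average pitcher would over the same innings, so after the sign flip his score is positive, just like a hitter who is above average in a normal category.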

With all the appropriate z-scores calculated and added for all the categories selected, the Fantasy Baseball Calculator will give you a custom ranking.
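To give a feel for what a custom ranking means, here is a small Python sketch that sums the z-scores for whichever categories you select and sorts the players. This is not how the Fantasy Baseball Calculator itself is implemented; `rank_players` and the data layout are hypothetical, and the z-scores are copied from the tables above.

```python
# z-scores for each player, copied from the tables above.
Z_SCORES = {
    "Player A": {"HR": 2.961, "R": 2.286, "RBI": 2.320, "SB": 0.588, "xBA": 1.398},
    "Player B": {"HR": 1.330, "R": 2.456, "RBI": 1.948, "SB": -0.012, "xBA": 1.610},
    "Player C": {"HR": 0.460, "R": 3.307, "RBI": 1.476, "SB": 7.189, "xBA": 2.686},
}

def rank_players(z_scores, categories):
    """Return (player, total) pairs, best first, using only the chosen categories."""
    totals = {
        name: sum(scores[cat] for cat in categories)
        for name, scores in z_scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Standard 5x5 hitting categories.
standard = rank_players(Z_SCORES, ["HR", "R", "RBI", "SB", "xBA"])
for name, total in standard:
    print(f"{name}: {total:.3f}")

# A hypothetical league that ignores stolen bases gives a different order.
no_sb = rank_players(Z_SCORES, ["HR", "R", "RBI", "xBA"])
print([name for name, _ in no_sb])
```

Dropping stolen bases moves Player A ahead of Player C, which shows how much the ranking depends on the categories your league actually counts.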


Now with all of that said, Click Here to let me explain to you why using this ranking is completely wrong