
Wednesday, September 25, 2013

NFL week 3 update

With another NFL week down, we are starting to see separation between the contenders and the “better luck next year” teams.  Any paid TV mouthpiece worth their salt will tell you it is a quarterback-driven league.  Driven indeed.  In the last post, I dabbled in the simple code of a correlation heat map.  Now, I realize I may have led the flock astray in my haste to create some wow graphics using the data from advancednflstats.com.  But that is the great thing about R: the ability to cook up some code and add salt to taste.  So, let’s kick it up a notch with a look at the week 3 NFL QB data, creating different versions of correlation plots.  Choose your preference, a la carte!

> library(corrplot) #load the versatile package corrplot
> attach(nfl)
> head(nfl)
Note: top 5 quarterbacks thru week 3:
      1.      Peyton Manning
      2.      Jay Cutler
      3.      Ryan Tannehill
      4.      Drew Brees
      5.      Philip Rivers

> qb = cor(nfl[ ,4:15]) # correlation subset of continuous variables

The package corrplot has 7 different visualization methods: "circle", "square", "ellipse", "number", "shade", "color", "pie"
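
The seven methods listed above can be compared side by side before picking one. Here is a minimal sketch, assuming corrplot is installed; it uses the built-in mtcars data as a stand-in for the QB matrix (demo_cor is an illustrative name, not part of the original code):

```r
# Stand-in correlation matrix from the built-in mtcars data (an assumption;
# swap in the qb matrix computed above when working with the NFL data).
demo_cor <- cor(mtcars[, 1:7])

methods <- c("circle", "square", "ellipse", "number", "shade", "color", "pie")

# Only draw if corrplot is available, so the sketch degrades gracefully.
if (requireNamespace("corrplot", quietly = TRUE)) {
  old_par <- par(mfrow = c(2, 4))   # a 2 x 4 grid has room for all seven plots
  for (m in methods) {
    corrplot::corrplot(demo_cor, method = m, title = m, mar = c(0, 0, 2, 0))
  }
  par(old_par)                      # restore the previous layout
}
```

The panels get cramped with many variables, so this is best used as a quick preview rather than a finished graphic.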


> corrplot(qb, method = "circle")

> corrplot(qb, method = "ellipse")

Two nice examples, but this is my favorite below.  I really like the ability to see both the visual portrayal and the statistics on one chart.

> corrplot.mixed(qb)
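
The mixed plot can also be tuned: corrplot.mixed() takes separate lower and upper arguments for the two triangles. A small sketch, again assuming corrplot is installed and using mtcars as a stand-in for the QB data:

```r
# Stand-in correlation matrix (an assumption; replace with qb for the NFL data).
demo_cor <- cor(mtcars[, 1:7])

if (requireNamespace("corrplot", quietly = TRUE)) {
  # Numbers in the lower triangle, ellipses in the upper triangle.
  corrplot::corrplot.mixed(demo_cor, lower = "number", upper = "ellipse")
}
```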

If you are interested in further pursuing corrplot, I recommend this website.


Friday, September 20, 2013

Are you ready for some Football? (No, not soccer)

With two weeks of NFL football under our belts, it is time to start peeking under the proverbial hood at some of the statistics.  What better way than with R?  If you want the best stats out there, I recommend the website http://www.advancednflstats.com/ .  In order to understand the variables, you will need to spend some time looking at the glossary.  An excellent in-depth companion book to these advanced statistics is Mathletics, authored by Wayne Winston of Indiana University.  Wayne also publishes a blog at http://waynewinston.com/wordpress/ .  As such, I'm not going to get into the nitty-gritty of these variables.

I've downloaded the Quarterback stats through week 2 and will do some simple data visualization, a scatterplot matrix and a correlation heatmap.  This is some simple code to get you on your way to multivariate visualization.

> str(qb)  #structure of the data named "qb"
'data.frame':   33 obs. of  18 variables:
 $ Rank   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Player : Factor w/ 33 levels "1-C.Newton","10-E.Manning",..: 13 22 8 11 31 28 12 18 7 5 ...
 $ Team   : Factor w/ 32 levels "ARZ","ATL","BLT",..: 10 6 12 26 20 24 17 4 14 16 ...
 $ G      : int  2 2 2 2 2 2 2 2 2 2 ...
 $ WPA    : num  1.03 1.02 0.91 0.85 0.82 0.76 0.64 0.53 0.53 0.51 ...
 $ EPA    : num  41.2 6.3 42.1 41.5 25.7 10.4 14.6 2.1 19.5 6.4 ...
 $ WPA.G  : num  0.52 0.51 0.46 0.43 0.41 0.38 0.32 0.27 0.27 0.26 ...
 $ EPA.P  : num  0.44 0.08 0.48 0.48 0.28 0.12 0.17 0.03 0.23 0.07 ...
 $ SR...  : num  55.9 55.3 61.4 55.2 50.5 50 51.7 45.5 52.4 42.7 ...
 $ Att    : int  85 72 79 76 81 62 72 66 66 70 ...
 $ Cmp    : int  57 49 55 50 52 39 47 45 43 42 ...
 $ Cmp.   : num  67.1 68.1 69.6 65.8 64.2 62.9 65.3 68.2 65.2 60 ...
 $ PassYds: int  769 532 813 614 679 631 591 446 499 396 ...
 $ Sk     : int  3 1 6 3 6 4 9 1 7 5 ...
 $ SkYds  : int  17 8 50 18 42 29 39 9 37 26 ...
 $ Int    : int  0 3 1 1 3 0 1 1 1 0 ...
 $ X.Deep : num  17.6 15.3 17.7 22.4 24.7 30.6 12.5 21.2 25.8 8.6 ...
 $ AYPA   : num  8.5 5.3 8.4 7 5.8 9.1 6.3 5.9 5.7 4.9 ...

Of the 18 variables, 16 are continuous, but we are not concerned with "Rank" (at least not in week 2) or "G", which is the number of games played.  That leaves the 14 variables in columns 5 through 18.
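
Picking columns by position works, but it is easy to miscount. A hedged alternative is to select by type and name instead. The sketch below uses a small toy data frame (toy is a made-up name; the numeric values are the first five rows from the str() output above, and the Player labels are placeholders):

```r
# Toy data frame with the same kinds of columns as qb.
toy <- data.frame(
  Rank   = 1:5,
  Player = factor(c("A", "B", "C", "D", "E")),   # placeholder names
  G      = c(2L, 2L, 2L, 2L, 2L),
  WPA    = c(1.03, 1.02, 0.91, 0.85, 0.82),
  Sk     = c(3L, 1L, 6L, 3L, 6L),
  SkYds  = c(17L, 8L, 50L, 18L, 42L)
)

# Keep numeric columns, then drop the two we do not care about by name.
is_num  <- sapply(toy, is.numeric)
keep    <- names(toy)[is_num & !(names(toy) %in% c("Rank", "G"))]
toy_num <- toy[, keep]

print(keep)
```

Dropping "G" has a practical side too: a column that is constant (every value shown above is 2) has zero variance, so cor() would return NA for it.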

> pairs(qb[ ,5:18])  #base package scatterplot matrix

Yawn! 

We could improve this with more code, but it still just won't "pop" visually.  An option would be to use the lattice package, which I describe in a previous post.  However, I'm intrigued by heatmaps, in particular as a way to portray correlations.

For this, you will need to load the ggplot2 and reshape2 packages.

> library(ggplot2)
> library(reshape2)
> # simple code to create a correlation data set and put it into a heatmap
> corqb = cor(qb[ ,5:18])
> # corqb is already a correlation matrix, so melt it directly (no second cor() call)
> # Note: depending on your system, you may need to use X1 and X2 in place of Var1 and Var2
> qplot(x=Var1, y=Var2, data=melt(corqb), fill=value, geom="tile")
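
For readers on a recent ggplot2, where qplot() is deprecated, the same tile heatmap can be built with ggplot() directly. This is a sketch under a few assumptions: mtcars stands in for the QB data, the reshape step uses base R rather than reshape2, and the blue/white/red scale is just one reasonable choice:

```r
# Stand-in correlation matrix (an assumption; use corqb for the NFL data).
demo_cor <- cor(mtcars[, 1:7])

# Base-R reshape to long format: one row per pair of variables.
long <- as.data.frame(as.table(demo_cor))
names(long) <- c("Var1", "Var2", "value")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = Var1, y = Var2, fill = value)) +
    geom_tile() +
    # Diverging scale centered at zero so the sign of a correlation is visible.
    scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                         midpoint = 0, limits = c(-1, 1))
  print(p)
}
```

Fixing the limits at -1 and 1 keeps the colors comparable from one week's chart to the next.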

Let's take a look at a very simple correlation on this chart.  Find the variables "Sk" and "SkYds" and look at their high level of correlation.  This should be no surprise as Sk is for the number of times sacked and yes, you guessed it, SkYds is the total yards lost as a result of those sacks.
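
The Sk and SkYds relationship can be checked by hand on the ten values that str() printed above. This covers only the first ten quarterbacks, so the number will differ from the full 33-player correlation:

```r
# First ten values of Sk and SkYds, copied from the str(qb) output.
sk     <- c(3, 1, 6, 3, 6, 4, 9, 1, 7, 5)
sk_yds <- c(17, 8, 50, 18, 42, 29, 39, 9, 37, 26)

# Pearson correlation between sacks and sack yardage for these ten rows.
r_top10 <- cor(sk, sk_yds)
print(round(r_top10, 3))

# A quick scatterplot makes the near-linear relationship easy to see.
plot(sk, sk_yds, xlab = "Sk (sacks)", ylab = "SkYds (yards lost)")
```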

Let's look at QB rank, sacks, yards lost by sacks and interceptions:
> corqb2 = qb[c(1,14,15,16)]
> qplot(x=Var1, y=Var2, data=melt(cor(corqb2)), fill=value, geom="tile")

And, here are the correlation numbers...

> cor(corqb2)
           Rank         Sk     SkYds        Int
Rank  1.0000000 0.27489633 0.3330972 0.32814607
Sk    0.2748963 1.00000000 0.9117308 0.06699875
SkYds 0.3330972 0.91173078 1.0000000 0.13743870
Int   0.3281461 0.06699875 0.1374387 1.00000000


At this point in the season, the QB rank is only weakly correlated (roughly 0.27 to 0.33) with these bad things happening.  It will be interesting to see this change as the season progresses.
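
One way to gauge how weak a correlation is: run the usual t-test for a Pearson correlation on the reported value. The sketch below uses the Rank/SkYds correlation from the table and n = 33 from the str() output. It assumes all 33 rows went into the calculation and that the standard test assumptions are reasonable here, which is a stretch with rank data after two games:

```r
r <- 0.3330972   # Rank vs. SkYds, from the cor(corqb2) output
n <- 33          # number of quarterbacks, from str(qb)

# t statistic for H0: true correlation is zero, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(t = t_stat, p = p_val))
```

With the full data frame in hand, cor.test(qb$Rank, qb$SkYds) would run the same test directly.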