
Tuesday, June 25, 2013

Getting started with R


I wanted to avoid advanced topics in this post and focus on some “blocking and tackling” with R to help novices get started. This is some of the basic code I found useful when I began using R just over six weeks ago.
Reading in data from a .csv file is a breeze with this command, which opens a dialog so you can pick the file:
> data = read.csv(file.choose())
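If you would rather not pick the file interactively, read.csv() also accepts a file path. Here is a minimal sketch; the temporary file and its three rows are made up purely so the example runs on its own, and the name mydata is just a placeholder.

```r
# Write a tiny, made-up CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("speed,dist", "4,2", "7,4", "8,16"), tmp)

# Read it back by path instead of using file.choose()
mydata <- read.csv(tmp)

# str() shows the column names and the type of each column
str(mydata)
```

With a real file you would replace tmp with the path to your own .csv.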
There is no need to have your own data set, as R comes with a number of built-in datasets.
> data()  #list the datasets available in R
> # load the dataset 'cars' and display the variables
> data(cars)
> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
# head() shows the first 6 rows of data; we have two variables, car speed and stopping distance
# attach() puts the data frame on R's search path, so its columns can be referenced by name and I avoid what I feel is the pesky $
> attach(cars)
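For comparison, here is a short sketch of reaching the same column without attach(). All three forms are standard R; the variable names m1, m2 and m3 are only placeholders for this illustration.

```r
# Three ways to reach the speed column of the built-in cars data
# without calling attach()
m1 <- mean(cars$speed)          # the $ operator
m2 <- mean(cars[["speed"]])     # double-bracket indexing by column name
m3 <- with(cars, mean(speed))   # with() evaluates the expression inside the data frame

c(m1, m2, m3)                   # all three give the same mean
```

with() is handy when you want the short column names for one expression only, without leaving the data frame attached for the rest of the session.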
# descriptive statistics of our two variables
> summary(cars)
     speed           dist       
 Min.   : 4.0   Min.   :  2.00  
 1st Qu.:12.0   1st Qu.: 26.00  
 Median :15.0   Median : 36.00  
 Mean   :15.4   Mean   : 42.98  
 3rd Qu.:19.0   3rd Qu.: 56.00  
 Max.   :25.0   Max.   :120.00  
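
The figures reported by summary() can also be pulled out one at a time. A small sketch using base R functions; the variable names are placeholders chosen for this example.

```r
# Individual descriptive statistics for the built-in cars data
avg_dist   <- mean(cars$dist)                      # mean stopping distance
med_speed  <- median(cars$speed)                   # median speed
sd_speed   <- sd(cars$speed)                       # standard deviation (not shown by summary())
dist_range <- range(cars$dist)                     # minimum and maximum together
speed_iqr  <- quantile(cars$speed, c(0.25, 0.75))  # 1st and 3rd quartiles

avg_dist
med_speed
dist_range
speed_iqr
```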

> # univariate plots for speed

> plot(speed)



> hist(speed)




> #scatterplot for speed and dist

> plot(speed,dist)
> #side-by-side notched boxplots for speed and dist
> boxplot(speed, dist, notch=TRUE)
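
The default plots carry no titles or axis labels. Here is a sketch of the same scatterplot and histogram with labels added through the standard main, xlab and ylab arguments; the label wording is my own, with units taken from the cars help page.

```r
# Scatterplot with a title and axis labels
plot(cars$speed, cars$dist,
     main = "Stopping distance vs. speed",
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)")

# Histogram with labels; hist() also returns the bins it drew,
# so the counts can be inspected afterwards
h <- hist(cars$speed,
          main = "Distribution of speed",
          xlab = "Speed (mph)")
h$counts
```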
# you can use [] to create a subset.  Here is how to get rows 1 thru 10 of both variables
> subsetcars = cars[1:10, ]
> subsetcars
   speed dist
1      4    2
2      4   10
3      7    4
4      7   22
5      8   16
6      9   10
7     10   18
8     10   26
9     10   34
10    11   17

#rows 1 thru 5 of just speed
> subspeed = cars[1:5, 1]
> subspeed
[1] 4 4 7 7 8

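# Observations where stopping distance is greater than 50
# (named longstops rather than stop, so the name does not clash with R's built-in stop() function)
> longstops = cars[dist > 50, ]
> longstops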
   speed dist
22    14   60
23    14   80
26    15   54
33    18   56
34    18   76
35    18   84
38    19   68
41    20   52
42    20   56
43    20   64
44    22   66
45    23   54
46    24   70
47    24   92
48    24   93
49    24  120
50    25   85
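
Conditions can be combined with & (and) or | (or), and base R's subset() offers a more readable way to write the same filter. A sketch under the assumption that we also want speeds below 20; that second threshold, and the names by_bracket and by_subset, are just for illustration.

```r
# Rows where stopping distance exceeds 50 AND speed is below 20,
# written with bracket indexing (no attach() needed)
by_bracket <- cars[cars$dist > 50 & cars$speed < 20, ]

# The same filter with subset(), which looks up column names inside the data frame
by_subset <- subset(cars, dist > 50 & speed < 20)

by_subset
nrow(by_subset)   # number of matching rows
```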

# and finally the correlation
> cor(speed, dist)
[1] 0.8068949
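
By default cor() computes the Pearson correlation, which is the covariance of the two variables divided by the product of their standard deviations. A short sketch that rebuilds the number by hand; x, y and r_manual are placeholder names.

```r
# Pearson correlation from its definition: cov(x, y) / (sd(x) * sd(y))
x <- cars$speed
y <- cars$dist

r_manual <- cov(x, y) / (sd(x) * sd(y))
r_manual

# Check that it agrees with the built-in function
all.equal(r_manual, cor(x, y))
```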





