Published on

DS 4100 Day 2

Authors
  • avatar
    Name
    Jacob Aronoff
    Twitter

DS 4100 Data Collection, Integration, and Analysis

Everyone is downloading R, meanwhile I'm just sitting here and finishing up Java.

Data is stored as objects in R. Objects are created by:

  • Reading data from an external file
  • Retrieving data from a URL
  • Creating an object directly from command line
  • Instantiating an object from within a program

Now we're going over R, basically what I learned in the past (6) tutorials.

> seq(from=50, to=52, along=x)
[1] 50.0 50.2 50.4 50.6 50.8 51.0 51.2 51.4 51.6 51.8 52.0

That's pretty cool, now we're going over data frames and querying data frames. R has a BUNCH of built in data sets that can be loaded easily:

> data(sunspots.year)
> sunspot_stuff = data.frame(year=1700:1988, count=sunspot.year)
> sunspot_stuff[sunspot_stuff[,1]==1950,]
> sunspot_stuff[sunspot_stuff$count>=50,]
> summary(sunspot_stuff)
      year          count
 Min.   :1700   Min.   :  0.00
 1st Qu.:1772   1st Qu.: 15.60
 Median :1844   Median : 39.00
 Mean   :1844   Mean   : 48.61
 3rd Qu.:1916   3rd Qu.: 68.90
 Max.   :1988   Max.   :190.20
> sunspot_stuff$count == 190.20
Time Series:
Start = 1700
End = 1988
Frequency = 1
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[253] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[277] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE
> sunspot_stuff[sunspot_stuff$count == 190.20,]
    year count
258 1957 190.2
> sum(sunspot_stuff$count)
[1] 14049.3
>

I really enjoy the querying system in R. Say what you will, but the fact I don't have to use loops to find stuff is really really nice. The rest of class is much of the same: load data, analyze data. The homework is interesting: we have to unzip a file in R, and then make a couple functions to deal with data. Probably going to be a bunch of queries and functions to do it properly.