DS 4100 Day 2
- Author: Jacob Aronoff
DS 4100 Data Collection, Integration, and Analysis
Everyone is downloading R; meanwhile, I'm just sitting here finishing up Java.
Data is stored as objects in R. Objects are created by:
- Reading data from an external file
- Retrieving data from a URL
- Creating an object directly from the command line
- Instantiating an object from within a program
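As a rough sketch of those four routes (not from the lecture itself): the temporary CSV file, the example.com address, and the make_scores helper are all made up for illustration, and the URL call is left commented out because that address is hypothetical.

```r
# 1. Reading data from an external file.
#    A temporary CSV stands in for a real file here (assumption).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)), tmp, row.names = FALSE)
from_file <- read.csv(tmp)

# 2. Retrieving data from a URL uses the same function.
#    The address is hypothetical, so the call is commented out.
# from_url <- read.csv("https://example.com/scores.csv")

# 3. Creating an object directly, as you would type it at the command line.
scores <- c(90, 85, 77)

# 4. Instantiating an object from within a program (here, inside a function).
#    make_scores is a made-up helper, not part of R.
make_scores <- function(n) {
  data.frame(id = seq_len(n), score = rep(0, n))
}
blank <- make_scores(3)
```

In every case the result is an ordinary R object, so from_file, scores, and blank can all be inspected with str() or summary().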
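Now we're going over R, which is basically what I learned in the past six tutorials. In the call below, x must be a vector of length 11 that was already defined (not shown here), so along=x (short for along.with=x) gives 11 evenly spaced values: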
> seq(from=50, to=52, along=x)
[1] 50.0 50.2 50.4 50.6 50.8 51.0 51.2 51.4 51.6 51.8 52.0
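The call above depends on a vector x that isn't shown in these notes. A minimal sketch, assuming x is any vector with 11 elements, of three ways to get the same sequence:

```r
# Assumption: x only matters for its length, so any 11-element vector works.
x <- 1:11

# along.with takes the output length from another vector.
same_length_as_x <- seq(from = 50, to = 52, along.with = x)

# length.out states the number of values directly.
eleven_values <- seq(from = 50, to = 52, length.out = 11)

# by states the step size instead of the number of values.
steps_of_0.2 <- seq(from = 50, to = 52, by = 0.2)

# all.equal() compares with a small tolerance, which is the safer way to
# compare decimal values.
all.equal(same_length_as_x, eleven_values)
all.equal(same_length_as_x, steps_of_0.2)
```

Spelling out along.with in full avoids relying on R's partial argument matching, which is what makes along=x work.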
That's pretty cool. Now we're going over data frames and how to query them. R has a bunch of built-in data sets that can be loaded easily:
> data(sunspot.year)
> sunspot_stuff = data.frame(year=1700:1988, count=sunspot.year)
> sunspot_stuff[sunspot_stuff[,1]==1950,]
> sunspot_stuff[sunspot_stuff$count>=50,]
> summary(sunspot_stuff)
      year          count
 Min.   :1700   Min.   :  0.00
 1st Qu.:1772   1st Qu.: 15.60
 Median :1844   Median : 39.00
 Mean   :1844   Mean   : 48.61
 3rd Qu.:1916   3rd Qu.: 68.90
 Max.   :1988   Max.   :190.20
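The row queries above can be written a few different ways. This is a sketch rather than what was shown in class: it rebuilds the data frame with as.numeric() around sunspot.year (my own choice, to drop the time-series attributes), and the names by_position, by_name, busy_years, and busy_1900s are made up for illustration.

```r
# Rebuild the data frame from the notes. as.numeric() is an addition here:
# it strips the time-series attributes so count is a plain numeric column.
sunspot_stuff <- data.frame(year = 1700:1988, count = as.numeric(sunspot.year))

# Pick the column by position (as in the notes) or by name.
by_position <- sunspot_stuff[sunspot_stuff[, 1] == 1950, ]
by_name     <- sunspot_stuff[sunspot_stuff$year == 1950, ]

# subset() evaluates the condition inside the data frame, so columns can be
# named without the sunspot_stuff$ prefix.
busy_years <- subset(sunspot_stuff, count >= 50)

# Conditions combine with & (and) and | (or).
busy_1900s <- subset(sunspot_stuff, count >= 50 & year >= 1900)
```

Referring to columns by name tends to survive reordering of the columns better than position-based indexing such as [,1].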
> sunspot_stuff$count == 190.20
Time Series:
Start = 1700
End = 1988
Frequency = 1
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[253] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[277] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[289] FALSE
> sunspot_stuff[sunspot_stuff$count == 190.20,]
    year count
258 1957 190.2
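The long printout of FALSE values and the one-row result are two halves of the same mechanism: the comparison builds a logical mask, and indexing with that mask keeps only the TRUE rows. Here is a small sketch of the steps, using as.numeric() to get a plain numeric column (an addition of mine) and illustrative names such as is_peak and peak_by_max:

```r
# as.numeric() is an addition: it drops the "Time Series" header from printouts.
sunspot_stuff <- data.frame(year = 1700:1988, count = as.numeric(sunspot.year))

# Step 1: the comparison yields one TRUE/FALSE per row.
is_peak <- sunspot_stuff$count == 190.20

# Step 2: which() turns the mask into row numbers, which is easier to read
# than scanning a page of FALSE values.
peak_row <- which(is_peak)

# Step 3: indexing rows with the mask keeps only the TRUE rows.
peak <- sunspot_stuff[is_peak, ]

# Exact == on decimals can miss values that differ by rounding error, so
# asking for the maximum directly is a more robust way to find the peak.
peak_by_max <- sunspot_stuff[which.max(sunspot_stuff$count), ]
```

Both peak and peak_by_max pick out row 258, the 1957 entry shown in the output above.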
> sum(sunspot_stuff$count)
[1] 14049.3
I really enjoy the querying system in R. Say what you will, but not having to write loops to find things is really nice. The rest of class was much of the same: load data, analyze data.
The homework is interesting: we have to unzip a file in R and then write a couple of functions to deal with the data. It's probably going to take a bunch of queries and functions to do it properly.
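To make the point about loops concrete, here is a small comparison I put together (not from class) that finds the years with a count of at least 50 both ways. The names loop_years and vector_years are made up, and as.numeric() is used only to get a plain numeric column.

```r
sunspot_stuff <- data.frame(year = 1700:1988, count = as.numeric(sunspot.year))

# Loop version: walk the rows and collect matching years one at a time.
loop_years <- integer(0)
for (i in seq_len(nrow(sunspot_stuff))) {
  if (sunspot_stuff$count[i] >= 50) {
    loop_years <- c(loop_years, sunspot_stuff$year[i])
  }
}

# Vectorized version: one comparison over the whole column, then index.
vector_years <- sunspot_stuff$year[sunspot_stuff$count >= 50]

# Both approaches give the same years.
identical(loop_years, vector_years)
```

The vectorized form is one line, and it also avoids growing a vector inside a loop, which gets slow on larger data.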