DS 4100 Day 3

Authors
  • Jacob Aronoff

DS 4100 Data Collection, Integration, and Analysis

Day 3: we're going over basic control flow, starting with for loops.

> for (i in 1:3) {
    print(paste("i =", i))
  }
[1] "i = 1"
[1] "i = 2"
[1] "i = 3"

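One thing to notice: the loop variable is still around after the loop ends, holding its last value:

> i
[1] 3

You can also loop directly over the elements of a vector:
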
cities <- c("Boston", "New York", "San Francisco")
for(city in cities) {
	print(city)
}
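
If you need the position as well as the value, one option is seq_along(). This is just a small sketch; the names labelled, idx and empty are mine, not from class.

```r
cities <- c("Boston", "New York", "San Francisco")

# seq_along() gives the positions 1, 2, 3, so we get index and value together
labelled <- character(length(cities))
for (idx in seq_along(cities)) {
  labelled[idx] <- paste(idx, cities[idx])
  print(labelled[idx])
}

# seq_along() on an empty vector is empty, so this loop body never runs,
# whereas 1:length(empty) would step through 1 and then 0
empty <- character(0)
for (idx in seq_along(empty)) {
  print("never printed")
}
```

That second loop is the reason I'd reach for seq_along() over 1:length(x) when the vector might be empty.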

You can also use for loops to iterate over the rows of a data frame.

# create a data frame
c1 <- c("AA", "BB", "CC")
c2 <- c(11, 22, 33)
df <- data.frame(c1, c2)

# display data frame
df

# length() returns the number of columns (there is no len() in R)
length(df)
# nrow() returns the number of rows
nrow(df)

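# pass the data frame in rather than relying on a global
traverseDF <- function(df) {
  # seq_len() also copes with a data frame that has zero rows
  for (i in seq_len(nrow(df))) {
    # access column 2 in row i
    print(df[i, 2])
    # or this way
    print(df$c2[i])
  }
}

traverseDF(df)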
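
To make the shape functions and the row loop a bit more concrete, here's a self-contained sketch using the same toy data frame. collectSecondColumn is a hypothetical helper I made up for illustration; it isn't something from the lecture.

```r
# same toy data frame as above
c1 <- c("AA", "BB", "CC")
c2 <- c(11, 22, 33)
df <- data.frame(c1, c2)

# a data frame is a list of columns, so length() and ncol() agree
length(df)  # 2
ncol(df)    # 2
dim(df)     # rows first, then columns: 3 2

# hypothetical helper: walk the rows and collect column 2 as we go
collectSecondColumn <- function(d) {
  values <- numeric(0)
  for (i in seq_len(nrow(d))) {
    values <- c(values, d[i, 2])
  }
  values
}

collectSecondColumn(df)

# the loop-free equivalent
df$c2
```

In practice df$c2 is all you need; the loop is only there to show how row-by-row access works.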

Now we're talking a bit more about environments, which I went over in my most recent Learning R post. After that, we're on to more column manipulation:

> ap$Total <- cbind(rowSums(ap))
> ap
    V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 Total
1 1952 171 180 193 181 183 218 230 242 209 191 172 194  4316
2 1953 196 196 236 235 229 243 264 272 237 211 180 201  4653
3 1954 204 188 235 227 234 264 302 293 259 229 203 229  4821
>
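
One thing to watch: rowSums(ap) adds up every column, so the totals above include V1 (4316 = 1952 + 2364), and V1 looks like a year rather than a count. Here's a sketch of summing only V2 through V13, assuming V1 really is a year label. The data frame is just the three rows from the output above, retyped by hand.

```r
# the three rows shown above; as.data.frame() names the columns V1..V13
ap <- as.data.frame(rbind(
  c(1952, 171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194),
  c(1953, 196, 196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201),
  c(1954, 204, 188, 235, 227, 234, 264, 302, 293, 259, 229, 203, 229)
))

# drop column 1 before summing so the year is not counted
ap$Total <- rowSums(ap[, -1])
ap$Total
```

The cbind() wrapper from the original line isn't needed here, since rowSums() already returns one value per row.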

Now that's pretty cool: you can add a new column that is just the sum of each row. Just like Excel, but with code <3. It's really nice syntax, and R redeems itself when it comes to data management.

Now onto R's naming conventions and data types:

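  • R usually uses lower camel case (like traverseDF)
  • Modes are weird (they're sort of like thunks almost)
  • Factors are more efficient than character vectors for categorical data: each value is stored as an integer code that points into a set of levels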
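
To poke at modes and factors yourself, here's a minimal sketch. The cities factor is my own toy example, not something from class.

```r
# mode() reports how R stores a value
mode(42)        # "numeric"
mode("Boston")  # "character"
mode(TRUE)      # "logical"
mode(sum)       # "function"

# a factor keeps each distinct label once (the levels)
# and stores the data as integer codes into those levels
cities <- factor(c("Boston", "New York", "Boston", "Boston"))
levels(cities)      # "Boston" "New York"
as.integer(cities)  # 1 2 1 1
mode(cities)        # "numeric"
class(cities)       # "factor"
```

So a factor has mode "numeric" underneath even though it prints as text.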

And that's it! Next class is going to be more on factors. Sadly I won't be there :( but I'll have my friend tell me what happened.