DS 4100 Day 3

DS 4100 Data Collection, Integration, and Analysis

Day 3: we're going over basic control flow, starting with for loops:

> for (i in 1:3) {
+   print(paste("i =", i))
+ }
[1] "i = 1"
[1] "i = 2"
[1] "i = 3"

The loop variable sticks around after the loop ends, holding its last value:

> i
[1] 3



You can also loop directly over the elements of a vector:

cities <- c("Boston", "New York", "San Francisco")
for (city in cities) {
  print(city)
}
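
When you need the position as well as the value, one common idiom is to loop over seq_along() instead of the elements themselves. This is only a small sketch reusing the cities vector; the labels variable is a name I made up for the example.

```r
cities <- c("Boston", "New York", "San Francisco")

# seq_along() yields one index per element (and nothing for an empty vector)
labels <- character(length(cities))
for (i in seq_along(cities)) {
  # combine the position and the value into one string
  labels[i] <- paste(i, cities[i])
}
print(labels)
```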

You can also use for loops to iterate over the rows of a data frame.

# create a data frame
c1 <- c("AA", "BB", "CC")
c2 <- c(11, 22, 33)
df <- data.frame(c1, c2)

# display data frame
df

# length() returns the number of columns (ncol() gives the same answer for a data frame)
length(df)
# nrow() returns the number of rows
nrow(df)

traverseDF <- function(df) {
  # seq_len(n) is safe even when the data frame has no rows
  for (i in seq_len(nrow(df))) {
    # access column 2 in row i
    print(df[i, 2])
    # or this way, by column name
    print(df$c2[i])
  }
}

traverseDF(df)
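
One thing to watch with loops over row numbers: 1:n counts down when n is 0, so a loop written with 1:nrow(x) runs twice on an empty data frame instead of not at all. Here is a minimal sketch of the difference; the empty data frame and the visits counter are made up for the illustration.

```r
# an empty data frame with the same column names as df (hypothetical example data)
empty <- data.frame(c1 = character(0), c2 = numeric(0))

n <- nrow(empty)      # 0 rows
bad <- 1:n            # counts down from 1 to 0, so it has two elements
good <- seq_len(n)    # zero-length, so a loop over it never runs

visits <- 0
for (i in good) {
  visits <- visits + 1
}

print(bad)
print(length(good))
print(visits)
```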

Next up is a bit more about environments, which I went over in my most recent Learning R. After that, we're on to more column manipulation:

> ap$Total <- cbind(rowSums(ap))
> ap
    V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 Total
1 1952 171 180 193 181 183 218 230 242 209 191 172 194  4316
2 1953 196 196 236 235 229 243 264 272 237 211 180 201  4653
3 1954 204 188 235 227 234 264 302 293 259 229 203 229  4821
>
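
A side note on the output above: rowSums(ap) adds every column, V1 included, and V1 looks like it holds the year (for row 1, the values V2 to V13 add up to 2364, and 2364 + 1952 = 4316). If the goal is a total over V2 to V13 only, a minimal sketch, assuming V1 really is the year, is to drop that column before summing. The data frame here just retypes the three rows shown.

```r
# rebuild the three rows shown above
ap <- data.frame(
  V1  = c(1952, 1953, 1954),
  V2  = c(171, 196, 204),
  V3  = c(180, 196, 188),
  V4  = c(193, 236, 235),
  V5  = c(181, 235, 227),
  V6  = c(183, 229, 234),
  V7  = c(218, 243, 264),
  V8  = c(230, 264, 302),
  V9  = c(242, 272, 293),
  V10 = c(209, 237, 259),
  V11 = c(191, 211, 229),
  V12 = c(172, 180, 203),
  V13 = c(194, 201, 229)
)

# the negative index drops the first column (assumed to be the year),
# so only V2..V13 go into the row totals
ap$Total <- rowSums(ap[, -1])
print(ap$Total)
```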

Now that's pretty cool: you can add a new column that is just the sum of each row. Just like Excel, but with code <3. The syntax is really nice, and R redeems itself with regard to its data management. Now on to R's naming conventions and data types:

R usually uses lower camel case
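Modes are weird (they're sort of like thunks, almost)
Factors are more efficient than character vectors for categorical data (each distinct value is stored once as a level, and the data itself is kept as integer codes)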
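
To make the last two points a little more concrete, here is a small sketch using base R's mode() and factor(). The values and variable names are made up for the example and are not from class.

```r
# mode() reports how a value is stored
modes <- c(mode(42), mode("Boston"), mode(TRUE), mode(sum))
print(modes)

# a factor keeps each distinct label once (the levels) plus integer codes
cities <- factor(c("Boston", "New York", "Boston", "Boston"))
print(levels(cities))      # the distinct labels
print(as.integer(cities))  # the codes that point into the levels
print(mode(cities))        # reflects the integer codes underneath
```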

And that's it! Next class is going to be more on factors. Sadly I won't be there :( but I'll have my friend tell me what happened.