Published on

DS 4100 Week 2 Review

  • avatar
    Jacob Aronoff

DS 4100 Weekly Review

Continuing my journey from last week, I was excited to begin this week devling more into the R programming language. It's interesting and aggravating to learn about R. I talked with some of my friends about the strengths and (many) weeknesses of the language this week at NU Hacks. Many commented on how the type system allowed users to make many errors that could be easily prevented. We then begun talking about one of R's more annoying language features: NULL and NA. We discussed how the inclusion of the both in the language makes coding in it very difficult.

We compared this feature to Java's handling of Null, or should I say the lack thereof. A good type system (like Swift, Haskell, OCaml, etc.) removes the guess and check work from programming with Null types. In R it's particularly hindering: when working with a data set, elements can be with NA or NULL. In both of the previous cases when trying to sum a row, for example, you have to explicitly say to ignore the values. While I understand why NA is very important in the language – sometimes a data set may have missing values – the inclusion of NULL makes no sense to me. Why have both values? In a language like common lisp, there is only either T or NIL, removing the need for missing values, false, etc. Although this implementation may seem a bit awkward at first, it allows the programmer to quickly handle cases.

I sadly wasn't able to make it to class on Friday, which hopefully won't harm me too much in the future. I read the lecture notes and the lesson was on data shaping, and manipulating data to look and act the way we want. The assignment this week matches the lessons, where we have to do some interesting class manipulation to find the answer efficiently.

I'm excited for our next two classes as well. We're going to be pulling data from websites and parsing through and running some sort of data analysis. I wonder how well R will be able to handle data scraping and collection. Considering the language's purpose was for data crunching, I assume it shouldn't be too painless. That being said, a lot of stuff in R should be some way and normally is not.