Published on

DS 4100 Week 3 Review

  • avatar
    Jacob Aronoff

DS 4100 Weekly Review

Very exicted for next week! This past week was pretty fun, I enjoyed getting the chance to use R's interesting type system. Our assignment this week was to basically convert string -> datetime -> int -> datetime. I did this in a pretty interesting way. First, I made a new column in the data frame which represented the "Posix" version of the string in the "date" column. I then checked that the class of the data in the column was what I expected. The class was what I anticipated "Posixt" and "Posixct". I then set the class of the new column equal to the class of an integer. By doing so, I was now able to compare the datetimes like normal integers. This was a really interesting and weird approach that could only be done in a language like R.

I'm excited for our next assignments which are all about parsing and analyzing data from multiple sources. I have three assignments due next sunday. The first of which is writing some basic functions to analyze a given data set. In this assignment we're also getting into execution time, we have to measure the amount of time our functions take to run.

Our second assignment is reading from Excel files, which I'm also excited to learn about. This assignment is going to be about cleaning up data, which is apparently a HUGE part of data science. I'm excited to apply some of my knowledge about database design to finish the assignment. I'm also excited to learn more about processing excel files because now I can go back and look at some files from high school and try and analyze them.

Finally, our third assignment is handling XML data. I'm excited to deal with XML, because once I've done that, I'll be able to parse HTML as well. All of these assignments seem both interesting and challenging. I'm enjoying this work a lot because I get to deal with a lot of different sources and I'm also constantly being challenged by the questions. I'm particularly excited to get started on the efficiency aspect of this, it's some interesting meta programming; checking the efficiency of R functions I wrote by writing functions in R.