DS 4100 Week 9 Review

DS 4100 Weekly Review

This week I want to brainstorm more about my project. I really like the idea of doing some reddit scraping. I've been looking at a lot of cool reddit data science projects. There have been a couple stand out projects in the past year that have really interested me; there was one about the occurance of swear words in different popular subreddits; another was about linking to news websites in different political subreddits; another was about subreddits that are the most supportive/ kind. That kind of sentiment analysis over such a large data set is really cool.

While it's cool that these people generated a top and bottom ten list for their criteria, I want something that's a bit more dynamic, so that the layman could use it. Here's how the website would work: a user types in a list of words (one word is also valid), a subreddit, and a time frame. The website then makes a call to the backend with the data. The backend then looks if the subreddit's data has already been added to the database, if it has been added the process is kicked off, if it hasn't, the api is hit and the data is added to the database. I already have the skeleton for adding data to the database, so that part shouldn't be too hard.

The process to be kicked off is the following: get all the times in the database the word has been used. From that point on we can return a bunch of different statistics about the word's usage: the amount of times it's been used, the average amount of points for the word, the max and the min, etc. I can also display a graph of the amount of times it's been used per post per day in the timeframe.

I would also love to see how different people use the service, and collect some analytics on how it's being used. I would love to use the analytics for more information about how to make the service better.