DS 4100 Weekly Review

This week I want to talk about one of the most interesting stories i’ve seen in the past couple of months. The story was on FiveThirtyEight, it was about analyzing one of the most controversial subreddits on reddit: The_Donald. The author ran a bunch of queries about reddit to insert data into a huge database. He then used this information to do what the author coined as “Reddit Algebra”. This means that he was able to run queries, to determine the similarity between subreddits. Using information like who commented where, and who subscribed where, he could determine the results. Using this information he constructed a list of subreddits similar to The_Donald. He found that

1.) r/Conservative 0.741 Discussion of conservative philosophy 2.) r/AskTrumpSupporters 0.737 Q&A with Trump supporters 3.) r/HillaryForPrison 0.675 Extreme anti-Clinton commentary 4.) r/uncensorednews 0.661 News with a focus on far-right-wing views 5.) r/AskThe_Donald 0.634 Q&A subreddit run by r/The_Donald moderators

These results are pretty expected and pretty interesting how accurate they are given it was all done with math and SQL. The next part is very interesting, he used ‘subtraction’ to figure out which subreddits, basically he wanted to see how one subreddits users, minus anothers subreddits users, equals a third subreddit. Here were his results:

1.) r/fatpeoplehate 0.275 For sharing insults aimed at overweight people (now banned) 2.) r/TheRedPill 0.274 Virulently misogynistic subreddit, nominally devoted to “sexual strategy” 3.) r/Mr_Trump 0.266 Now-dormant subreddit formed during a moderator schism at r/The_Donald 4.) r/**town 0.266 Open and enthusiastic racism against black people (now banned) 5.) r/4chan 0.253 Screenshots of 4chan.org posts

This kind of analysis is really cool, I’d be interested in delving even deeper into what this guy did. Maybe for my project, I could use some of the work this guy did, and build off of it even more, he opensourced his data and queries. Maybe I could go in deeper and find how similar users are, not just subreddits. Or maybe I use some of this work and make a website that allows anyone to do this analysis, I could also do the analysis in a different language, and have my data go through a different database.