
July 11, 2015

Download eight years’ worth of Reddit comments

by John_A


If you need (almost) every publicly available Reddit comment for any reason (hey, maybe you’re a researcher, or maybe you just love data), then ready your external HDD, because someone has bundled ’em all up nicely. User “Stuck_In_the_Matrix” collected every comment he could, from October 2007 (about two years after the website was founded) through May 2015. It took him 14 months and about 20 million API calls to gather roughly 1.65 billion comments, though approximately 350,000 couldn’t be collected because of problems with Reddit’s API.

Those comments are saved as plain text, along with their authors’ usernames, scores and subreddits, among other info. Archive.org even considered the feat notable enough to preserve for future generations. You can get the compilation right now through the torrent file “Stuck_In_the_Matrix” provided, but note that the data totals 150GB compressed and almost a terabyte uncompressed. If you’d rather not spend that long downloading something you haven’t seen yet, his original Reddit post also offers a much smaller one-month sample.

[Image credit: Getty Images]

Filed under: Misc

Source: Reddit, Archive