Extract Top Reddit Posts of #rstats in 3 lines of R Code

using jsonlite

This post is kept (literally) minimal to show how simple this hack is in R (it could, of course, be just as simple in other languages). It also makes the point that R has use cases beyond statistics and data mining.

Objective

The rstats subreddit is one of the popular sources of R-related information and discussion on the internet. We want to extract the top posts of the rstats subreddit.

Data Format

Luckily for us, Reddit serves a JSON version of every subreddit (and of every post) when .json is appended to the URL, and we'll use that here.

subreddit url: "https://www.reddit.com/r/rstats/"
subreddit json: "https://www.reddit.com/r/rstats/.json"

jsonlite @ Action

The package that will help us in this endeavor is jsonlite (by Jeroen Ooms and team), which parses JSON files and feeds. Its fromJSON() function reads JSON from a file or URL and stores the parsed result in a list object. The information we need, the title and url of each post in the subreddit, can then be found in that list.

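# 1. Load the library
library(jsonlite)

# 2. Retrieve and parse the subreddit's JSON feed
reddit <- fromJSON("https://www.reddit.com/r/rstats/.json")

# 3. Keep title and url for the first 10 posts (outer parentheses print the result)
(top10 <- reddit$data$children$data[1:10,c("title","url")])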
##                                                                                                   title
## 1                                                                         How does one fit a plm model?
## 2                                                Loading .arp files for analysis with diveRsity package
## 3                                                          Finding "Optimal" Target Inventory for Parts
## 4  Can you change the limits on a scale in ggplot based on the data based to ggplot? Explanation inside
## 5                                                                       Error: Need Finite 'ylim'values
## 6                                             Why Machine Learning Beats Econometrics in the Real World
## 7                                                                             Help with reshape() Error
## 8                                                         R &amp; stats illustrations by @allison_horst
## 9                                                                                        Time Series Qn
## 10                                                         Flexdashboard runtime shiny renderPlot issue
##                                                                                                 url
## 1                     https://www.reddit.com/r/rstats/comments/cr48no/how_does_one_fit_a_plm_model/
## 2    https://www.reddit.com/r/rstats/comments/cr1064/loading_arp_files_for_analysis_with_diversity/
## 3       https://www.reddit.com/r/rstats/comments/cqxg5q/finding_optimal_target_inventory_for_parts/
## 4   https://www.reddit.com/r/rstats/comments/cqpdq2/can_you_change_the_limits_on_a_scale_in_ggplot/
## 5                     https://www.reddit.com/r/rstats/comments/cqwac0/error_need_finite_ylimvalues/
## 6  https://medium.com/@adrianantico/machine-learning-vs-econometrics-in-the-real-world-4058095b1013
## 7                          https://www.reddit.com/r/rstats/comments/cqq6r9/help_with_reshape_error/
## 8                                               https://github.com/allisonhorst/stats-illustrations
## 9                                   https://www.reddit.com/r/rstats/comments/cqkgcc/time_series_qn/
## 10    https://www.reddit.com/r/rstats/comments/cqd1u9/flexdashboard_runtime_shiny_renderplot_issue/
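
To see why reddit$data$children$data can be indexed like a data frame, here is a minimal offline sketch. The JSON string below is a made-up, heavily trimmed stand-in for Reddit's response (the real feed carries many more fields), and the names sample_json, parsed and posts are only illustrative. With its default simplification, fromJSON() turns an array of similar objects into a data frame, so each nested "data" object ends up as one row.

```r
library(jsonlite)

# Hypothetical, trimmed-down imitation of the subreddit feed (not real Reddit data)
sample_json <- '{
  "kind": "Listing",
  "data": {
    "children": [
      {"kind": "t3", "data": {"title": "First post",  "url": "https://example.com/1"}},
      {"kind": "t3", "data": {"title": "Second post", "url": "https://example.com/2"}}
    ]
  }
}'

# fromJSON() accepts a JSON string as well as a file path or URL
parsed <- fromJSON(sample_json)

# "children" is simplified to a data frame; its "data" column is itself a data frame
posts <- parsed$data$children$data

# Ordinary row/column selection, as in the three-line script
posts[, c("title", "url")]
```

Because posts is an ordinary data frame, the [1:10, c("title","url")] indexing in the script is plain row and column selection.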

3-lines

  • Load the library
  • Retrieve and parse the JSON file
  • Extract the relevant information from the list object

Summary

While this post is primarily intended to demonstrate how simple JSON parsing is with R and jsonlite, the same script could also be automated to send an email or notification with the top 10 rstats subreddit posts at a scheduled interval.
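
As a rough sketch of that automation idea, the three lines can be wrapped in a function and the result turned into a plain-text digest. The helpers fetch_posts() and format_digest() are hypothetical names introduced here, not part of jsonlite. Actually sending the message and scheduling the script (for example by running it with Rscript from a scheduler such as cron) are assumed to be handled separately and are not shown.

```r
library(jsonlite)

# Hypothetical helper: fetch the first n posts of a subreddit's JSON feed
fetch_posts <- function(subreddit = "rstats", n = 10) {
  url <- sprintf("https://www.reddit.com/r/%s/.json", subreddit)
  posts <- fromJSON(url)$data$children$data
  head(posts[, c("title", "url")], n)
}

# Hypothetical helper: turn the data frame into a numbered plain-text digest
format_digest <- function(posts) {
  paste(
    sprintf("%d. %s\n   %s", seq_len(nrow(posts)), posts$title, posts$url),
    collapse = "\n"
  )
}

# Example (needs network access, so it is left commented out):
# cat(format_digest(fetch_posts("rstats", 10)))
```

Keeping the fetching and the formatting separate means the digest text can be checked offline with any small data frame that has title and url columns.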
