In this R tutorial, we’ll learn how to schedule an R script as a cron job using GitHub Actions. Thanks to GitHub Actions, you don’t need a dedicated server for this kind of automation and scheduled task. This example can be extended to automated tweets or other social media posts, or to daily data extraction of any sort.
In this example, we’re going to use code to extract (scrape) the daily top gainers of the Nifty 50, a benchmark index of India’s National Stock Exchange, and store them as a CSV file that can be used for data analytics on those stocks.
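The scraping code itself is not shown here, so the sketch below only covers the storage half of the job, assuming the scraped top gainers already sit in a data frame. The function name `save_daily_snapshot()`, the `data/` folder and the file-name pattern are hypothetical. Writing one date-stamped file per run keeps earlier days intact when the scheduled job runs again.

```r
# Minimal sketch (hypothetical names): write one day's scraped table to a
# date-stamped CSV so daily runs accumulate instead of overwriting each other.
save_daily_snapshot <- function(df, dir = "data", date = Sys.Date()) {
  # Create the output folder on a fresh runner if it does not exist yet
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  # e.g. data/nifty50_top_gainers_2021-01-04.csv
  file_name <- paste0("nifty50_top_gainers_", format(date, "%Y-%m-%d"), ".csv")
  path <- file.path(dir, file_name)
  write.csv(df, path, row.names = FALSE)
  invisible(path)  # return the path without printing it
}

# Usage with made-up placeholder rows (not real market data):
gainers <- data.frame(symbol = c("AAA", "BBB"), pct_change = c(2.5, 1.1))
save_daily_snapshot(gainers, dir = tempdir())
```

On GitHub Actions, a script like this would typically be started by a workflow with an `on: schedule` cron trigger that runs something like `Rscript scrape.R` (file name hypothetical). Cron schedules there are evaluated in UTC, and files written on the runner are discarded when the job ends unless a later step commits them back to the repository or uploads them as an artifact.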
Web scraping, by nature, requires a lot of understanding, from finding the right CSS selector to correctly parsing the scraped content. While there are many R packages for this (and Python packages, for that matter), {ralger} does a wonderful job of abstracting away the complicated parts and providing a simple, beginner-friendly web scraping package. {ralger} has simple functions to quickly scrape / extract title text (H1, H2, H3), tables, URLs and images from a given web page.
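A quick tour of the {ralger} helpers named above might look like the sketch below. It assumes the package is installed and the machine has internet access; `quick_scrape()` is a hypothetical wrapper, and argument defaults may differ between {ralger} versions, so check the package documentation before relying on them.

```r
# Hypothetical wrapper around three {ralger} helpers; needs the package and network access.
quick_scrape <- function(url) {
  list(
    titles = ralger::titles_scrap(url),            # text of the page's h1/h2/h3 headings
    links  = ralger::weblink_scrap(url),           # link targets found on the page
    table  = ralger::table_scrap(url, choose = 1)  # first HTML table as a data frame
  )
}

# Example call (requires internet):
# res <- quick_scrape("https://en.wikipedia.org/wiki/R_(programming_language)")
# head(res$titles)
```

Calling the package through `ralger::` keeps the wrapper definable even before the package is attached; the network is only touched when the function is actually called.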
In this tutorial, we’ll see how to scrape an HTML table from Wikipedia and process the data to find insights in it (or, more simply, to build a data visualization plot).
YouTube: https://youtu.be/KCUj7JQKOJA
Why? Most of the time, as a data scientist or data analyst, the data you need is not readily available, so it’s handy to know skills like web scraping to collect your own. While web scraping is a vast area, this tutorial focuses on one particular aspect of it: scraping (extracting) tables from web pages.
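Wikipedia tables usually arrive as text, with thousands separators, footnote markers such as `[1]`, percent signs and the Unicode minus sign, so a cleaning step is needed before plotting. The helper below is a hypothetical sketch of that step in base R; the commented lines show how it might be combined with `ralger::table_scrap()`, where the page URL and the column names `Population` and `Country` are placeholders to adapt.

```r
# Hypothetical helper: convert Wikipedia-style number strings to numeric.
clean_number <- function(x) {
  x <- gsub("\\[.*?\\]", "", x, perl = TRUE)  # drop footnote markers such as [1] or [a]
  x <- gsub("\u2212", "-", x, fixed = TRUE)   # Unicode minus sign -> ASCII hyphen-minus
  x <- gsub("[,%]", "", x)                    # drop thousands separators and percent signs
  suppressWarnings(as.numeric(trimws(x)))     # anything still unparseable becomes NA
}

clean_number(c("1,234", "56[2]", "7.5%", "\u{2212}3"))  # gives 1234, 56, 7.5, -3

# Possible end-to-end use (needs {ralger} and internet; URL and columns are placeholders):
# tbl <- ralger::table_scrap("https://en.wikipedia.org/wiki/<page>", choose = 1)
# tbl$value <- clean_number(tbl$Population)
# barplot(tbl$value, names.arg = tbl$Country, las = 2)
```

Returning `NA` for cells that still fail to parse (for example "n/a") keeps the column numeric, so plotting functions can simply skip those rows.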
Web Scraping in R
Web scraping needs no introduction among data enthusiasts. It’s one of the most viable and essential ways of collecting data when the data itself isn’t available.
Knowing web scraping comes in very handy when you are short of data, need macroeconomic indicators, or simply have no data available for a particular project, such as training a Word2vec or language model on a custom text dataset.
This post is kept (literally) minimal to demonstrate how simple this hack is in R (of course, it could be just as simple in other languages). It also makes the point that R has use cases beyond statistics and data mining.
Objective
The rstats subreddit is one of the popular sources of R-related information and discussion on the internet. We’re trying to extract the top posts of the rstats subreddit.
Data Format
Lucky for us, Reddit offers a JSON version of every subreddit (and every post), and we’ll use that here.
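Reddit’s JSON listing nests each post under `data$children[[i]]$data`. The sketch below parses such a listing with {jsonlite} (an assumption; the post does not name its JSON package) and keeps a few fields; `parse_top_posts()` is a hypothetical name. The endpoint `https://www.reddit.com/r/rstats/top.json` with the `t` and `limit` query parameters follows Reddit’s public listing URLs, but Reddit may rate-limit or block requests that lack a descriptive User-Agent, so treat the live call as illustrative.

```r
# Sketch: extract top posts from a Reddit listing JSON using {jsonlite}.
parse_top_posts <- function(json_text) {
  listing <- jsonlite::fromJSON(json_text)   # accepts a JSON string, file path or URL
  posts <- listing$data$children$data        # one row per post (nested data frame)
  posts$url_full <- paste0("https://www.reddit.com", posts$permalink)
  keep <- c("title", "score", "num_comments", "url_full")
  out <- posts[order(-posts$score), keep]    # highest-scoring posts first
  rownames(out) <- NULL
  out
}

# Live call (may be rate-limited; uncomment to try):
# top <- parse_top_posts("https://www.reddit.com/r/rstats/top.json?t=week&limit=25")
# head(top)
```

Parsing from a string rather than always hitting the network also makes it easy to check the function against a saved copy of a listing.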