Daily Stock Gainers Automated Web Scraping in R with Github Actions

using rvest, Github Actions

In this R tutorial, We’ll learn how to schedule an R script as a CRON Job using Github Actions. Thanks to Github Actions, You don’t need a dedicated server for this kind of automation and scheduled tasks. This example can be extended for Automated Tweets or Automated Social Media Posts, Daily Data Extraction of any sort.

In this example, We’re going to use a code to extract / scrape Nifty50 (Indian Stock Exchange Index) Top Gainers Daily and store it as a csv file which can be used for Data Analytics on those stocks.

Video Tutorial on Scheduling R Script using Github Actions

Please Subscribe to the channel for more Data Science (with R - also Python) videos

Github Actions with R

Github Actions which usually trigger a script based on event like PR, Issue Creation can be modified using its YAML to trigger a script on a schedule (CRON).

Here’s the main.yml file used for the Github Action.

name: nifty50scrape

# Controls when the action will run.
on:
  schedule:
    - cron:  '0 13 * * *'


jobs: 
  autoscrape:
    # The type of runner that the job will run on
    runs-on: macos-latest

    # Load repo and install R
    steps:
    - uses: actions/checkout@master
    - uses: r-lib/actions/setup-r@master

    # Set-up R
    - name: Install packages
      run: |
        R -e 'install.packages("tidyverse")'
        R -e 'install.packages("janitor")'
        R -e 'install.packages("rvest")'
    # Run R script
    - name: Scrape
      run: Rscript nifty50_scraping.R
      
 # Add new files in data folder, commit along with other modified files, push
    - name: Commit files
      run: |
        git config --local user.name actions-user
        git config --local user.email "actions@github.com"
        git add data/*
        git commit -am "GH ACTION Headlines $(date)"
        git push origin main
      env:
        REPO_KEY: ${{secrets.GITHUB_TOKEN}}
        username: github-actions

Look at this repo for more details of the code used for Scraping - https://github.com/amrrs/scrape-automation

For more details on Github Actions for R Scripts, Refer this R OpenSci Book - https://ropenscilabs.github.io/actions_sandbox/

 
comments powered by Disqus