Web scraping by nature requires a fair amount of know-how, from finding the right CSS selector to correctly parsing the scraped content. While there are many R packages for this (and Python packages, for that matter), {ralger} does a wonderful job of abstracting away the complicated parts and offering a simple, easy-to-use, beginner-friendly web scraping package. {ralger} has simple functions to quickly scrape title text (H1, H2, H3), tables, URLs, and images from a given web page.
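To make the "find the selector, then parse the content" steps concrete, here is a minimal sketch of doing them by hand with the {rvest} package. This is only an illustration of the manual workflow, not a description of how {ralger} works internally. The HTML fragment, the movie names, and the selector are all made up for the example, so it runs without touching a live website.

```r
library(rvest)

# Hypothetical page fragment, used here instead of a live website
page <- minimal_html('
  <table>
    <tr><td class="titleColumn"><a href="/title/a">First Movie</a></td></tr>
    <tr><td class="titleColumn"><a href="/title/b">Second Movie</a></td></tr>
  </table>
')

# Step 1: pick the nodes with a CSS selector
nodes <- html_elements(page, "td.titleColumn > a")

# Step 2: parse what you need out of those nodes
titles <- html_text2(nodes)       # visible text of each link
urls <- html_attr(nodes, "href")  # value of each link's href attribute

print(titles)
print(urls)
```

Here `titles` holds the two link texts and `urls` the two `href` values. Every scraping job repeats these same two steps, which is exactly the part a wrapper package can hide behind a single function call.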
Code
Below is an example of how to scrape the IMDb website (for educational purposes) in R with {ralger}:
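# install.packages("ralger")
library(ralger)

# Top 250 chart page
link <- "https://www.imdb.com/chart/top"
# CSS selector for the movie title links in the chart table
node <- "#main > div > span > div > div > div.lister > table > tbody > tr:nth-child(n) > td.titleColumn > a"
# Extract the text of every element matching the selector
extract <- scrap(link, node)
# Collect the image URLs found on the page
img_links <- images_preview(link)
# Scrape the HTML table on the page into a data frame
imdb250 <- table_scrap(link)

# Search-results page listing the same Top 250 titles
link <- "https://www.imdb.com/search/title/?groups=top_250&sort=user_rating"
my_nodes <- c(
  ".lister-item-header a",      # The title
  ".text-muted.unbold",         # The year of release
  ".ratings-imdb-rating strong" # The rating
)
col_names <- c("title", "year", "rating") # respect the nodes order
# Build a data frame with one column per selector
df_rank <- tidy_scrap(link = link, nodes = my_nodes, colnames = col_names)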
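Scraped columns usually come back as text, so a small cleaning step is often needed before analysis. The sketch below assumes the year arrives wrapped in parentheses (sometimes with a prefix such as "(I)") and the rating as a string. The stand-in data frame and its values are hypothetical, used here so the example runs without a network call.

```r
# Hypothetical stand-in for df_rank; values are illustrative only
df_rank <- data.frame(
  title = c("Movie A", "Movie B"),
  year = c("(1994)", "(I) (2008)"),
  rating = c("9.3", "9.0"),
  stringsAsFactors = FALSE
)

# Keep the last run of four digits in each year string, then convert
df_rank$year <- as.integer(sub(".*([0-9]{4}).*", "\\1", df_rank$year))

# Convert the rating from text to a number
df_rank$rating <- as.numeric(df_rank$rating)

str(df_rank)
```

After this step `year` is an integer column and `rating` is numeric, so sorting and filtering behave as expected. If the real page formats these fields differently, the pattern would need adjusting.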
References
- ralger on GitHub
- Sponsor the {ralger} creator with Buy Me a Coffee (I’m in no way affiliated with the developer; it’s just a token of gratitude for his open-source contribution to R)