How to scrape Zomato Restaurants Data in R

using rvest

Zomato is a popular restaurant listing website in India (similar to Yelp), and people are often interested in downloading or scraping Zomato restaurant data for data science and visualization projects.

In this post, we’ll learn how to scrape Zomato restaurant (buffet) data using R. The post should also serve as a basic web scraping guide for any similar task where you build a new dataset from the web.

Steps

  • Loading required packages
  • Getting web page content
  • Extract relevant attributes / data from the content
  • Building the final dataframe (to be written as csv or used for further analysis)

Note: This post assumes you’re familiar with browser DevTools and CSS selectors.

Packages

We’ll use the R packages rvest for web scraping and tidyverse for data analysis and visualization.

Loading the libraries

library(rvest)
library(tidyverse)

Getting Web Content from Zomato

zom <- read_html("https://www.zomato.com/bangalore/restaurants?buffet=1")
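A live request can fail (network problems, a changed URL, or the site refusing automated requests), and an unhandled error stops the whole script. A minimal sketch of a wrapper that returns NULL instead is shown here. safe_read_html is a hypothetical helper name, not part of rvest, and the two inputs are stand-ins chosen so the sketch runs without a network connection.

```r
library(rvest)

# Hypothetical helper (not part of rvest): returns the parsed page,
# or NULL with a message if read_html() fails for any reason.
safe_read_html <- function(url) {
  tryCatch(
    read_html(url),
    error = function(e) {
      message("Could not read the page: ", conditionMessage(e))
      NULL
    }
  )
}

# A literal HTML string parses; a path that does not exist yields NULL.
ok  <- safe_read_html("<html><body><p>hello</p></body></html>")
bad <- safe_read_html("no_such_page_for_this_sketch.html")
```

In the flow of this post, the same helper could wrap the Zomato URL, followed by an is.null() check before any extraction step.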

Extracting relevant attributes

Since this is a restaurant listing, the columns we can try to build are: the name of the restaurant, the place / city where it is located, and the average price (or, as Zomato calls it, price for two).

Name of the Restaurant

This is how the HTML for the name is laid out:

<a class="result-title hover_feedback zred bold ln24   fontsize0 " href="https://www.zomato.com/bangalore/barbeque-nation-indiranagar" title="barbeque nation Restaurant, Indiranagar" data-result-type="ResCard_Name">Barbeque Nation</a>

So, what we need is the value of the title attribute of the a tag whose class includes result-title.

zom %>% html_nodes("a.result-title") %>% 
  html_attr("title") %>% 
  stringr::str_split(pattern = ',') -> listing

Conveniently, Zomato’s page puts the name and the place of the restaurant in the same element, matched by the CSS selector a.result-title, so one scrape gets both. The two are separated by a comma, so we use str_split() to split them; the result is saved into listing, which is a list with one character vector per restaurant. Note that each place keeps the space that followed the comma.
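Since the live page may change, the extraction can be checked offline against the anchor tag quoted earlier. The following is a small sketch that parses that one tag as a literal HTML string; the variable names snippet, page and titles are introduced here for illustration only.

```r
library(rvest)
library(stringr)

# The anchor tag quoted earlier in the post, as a literal HTML string.
snippet <- '<a class="result-title hover_feedback zred bold ln24   fontsize0 " href="https://www.zomato.com/bangalore/barbeque-nation-indiranagar" title="barbeque nation Restaurant, Indiranagar" data-result-type="ResCard_Name">Barbeque Nation</a>'

page <- read_html(snippet)

# Same selector and attribute as in the scraping code above.
titles <- page %>%
  html_nodes("a.result-title") %>%
  html_attr("title")

# One list element per matched tag; each element holds the name and the place.
listing <- str_split(titles, pattern = ",")
str(listing)
```

The second piece comes back as " Indiranagar" with a leading space, because the split happens on the comma alone.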

Converting List to Dataframe

zom_df <- do.call(rbind.data.frame, listing)
names(zom_df) <- c("Name","Place")

The two lines above convert the listing list to a dataframe, zom_df, and rename its columns to Name and Place.
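The rbind approach relies on every title splitting into exactly two pieces, and it keeps the space after the comma in Place. One possible alternative, sketched here under the assumption that the first comma always separates name from place, is stringr::str_split_fixed(), which returns a fixed-width character matrix. The titles vector is illustrative sample data in the format shown earlier, and sketch_df is a name made up for this example.

```r
library(stringr)

# Illustrative titles in the "name, place" format described above.
titles <- c("barbeque nation Restaurant, Indiranagar",
            "big pitcher Restaurant, Old Airport Road")

# Split on the first comma only (n = 2) and drop any spaces right after it.
parts <- str_split_fixed(titles, pattern = ",\\s*", n = 2)

# Column 1 is the name, column 2 is the place.
sketch_df <- data.frame(Name = parts[, 1],
                        Place = parts[, 2],
                        stringsAsFactors = FALSE)
sketch_df
```

Because n = 2 caps the number of pieces, a place that itself contains a comma stays in one column instead of spilling into a third.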

Extracting Price and Adding a New Price Column

zom_df$Price <- zom %>% html_nodes("div.res-cost > span.pl0") %>% 
  html_text() %>% 
  parse_number()

Since the price field is a combination of the Indian currency symbol and a comma-separated number (and is therefore a character string), we use parse_number() to strip the currency symbol and the comma and extract only the numeric price.
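To see what parse_number() does in isolation, here is a minimal sketch on two hand-written strings. The exact text on the live page is an assumption; the strings below only follow the description above (rupee sign plus a comma-grouped number).

```r
library(readr)

# Illustrative price strings: rupee sign (U+20B9) followed by a number.
price_text <- c("\u20b91,600", "\u20b9500")

# parse_number() skips the leading symbol and treats "," as a grouping mark.
prices <- parse_number(price_text)
prices
```

Assigning the result to zom_df$Price needs exactly one price per row, so a listing without a price on the page would make the lengths differ and the assignment fail. Comparing length(prices) with nrow(zom_df) first is a cheap check.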

Dataset

head(zom_df)
##                                Name             Place Price
## 1 abs absolute barbecues Restaurant      Marathahalli  1600
## 2            big pitcher Restaurant  Old Airport Road  1800
## 3                 pallet Restaurant        Whitefield  1600
## 4        barbeque nation Restaurant       Indiranagar  1600
## 5            black pearl Restaurant      Marathahalli  1500
## 6      empire restaurant Restaurant       Indiranagar   500
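The steps list mentions writing the dataframe to csv. A minimal sketch of that step with base R follows; sample_df is a two-row stand-in copied from the output above, and the temporary file path is only there so the example does not touch the working directory.

```r
# A two-row stand-in for zom_df, copied from the printed output.
sample_df <- data.frame(
  Name  = c("abs absolute barbecues Restaurant", "big pitcher Restaurant"),
  Place = c("Marathahalli", "Old Airport Road"),
  Price = c(1600, 1800),
  stringsAsFactors = FALSE
)

# Write without row names so the file has only the three data columns.
out_path <- tempfile(fileext = ".csv")
write.csv(sample_df, out_path, row.names = FALSE)

# Read the file back to confirm the round trip.
check_df <- read.csv(out_path, stringsAsFactors = FALSE)
check_df
```

With the real data, zom_df would take the place of sample_df and out_path would be a file name of your choice.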

Price Graph

zom_df %>% 
  ggplot() + geom_line(aes(Name,Price,group = 1)) +
  theme_minimal() +
  coord_flip() +
  labs(title = "Top Zomato Buffet Restaurants",
       caption = "Data: Zomato.com")

Summary

Thus, we’ve learnt how to build a new dataset by scraping web content, in this case from Zomato, and used it to build a price graph.
