Topic extraction is an integral part of Information Extraction (IE) from a text corpus: it helps us understand the key things the corpus is talking about. While this can be done naively by counting unigrams and bigrams, in this post we’ll see a more intelligent way of doing it with an algorithm called RAKE.
udpipe is an NLP-focused R package created and open-sourced by the organization bnosac. Thanks to them, udpipe is the R package that often eases the pain of not having a native spaCy for R.
Udpipe - Installation
install.packages("udpipe")
Udpipe - Loading
library(udpipe)
Udpipe - Language Model
An NLP library is only as good as its language model, because the language model contains the recipe for annotating your text corpus. So, before we proceed further, we need to download a language model. In this case, we’ll download the English model, as we’re going to do topic extraction on English reviews.
en <- udpipe::udpipe_download_model("english")
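Once downloaded, the language model can be reused in later sessions without being re-downloaded. The .udpipe file is saved in the working directory (the default model_dir), and its path is stored in en$file_model.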
Customer Reviews - Extraction
We’ll use the itunesr package to extract reviews of the Amazon US app from the Apple App Store.
library(itunesr)
# 297606951 is the App Store id of the Amazon app; fetch the first two pages of US reviews
reviews1 <- getReviews("297606951", "us", 1)
reviews2 <- getReviews("297606951", "us", 2)
reviews <- rbind(reviews1, reviews2)
head(reviews)
## Title
## 1 Unable to “infinite” scroll down
## 2 Digital gift cards
## 3 Love it
## 4 Titles/pics on Order Confirmation page maybe?
## 5 what in the firetruck were y’all thinking?
## 6 Horrible “update”
## Author_URL Author_Name
## 1 https://itunes.apple.com/us/reviews/id114026252 Roooofjdhdjdj
## 2 https://itunes.apple.com/us/reviews/id1002524958 Kaden J M
## 3 https://itunes.apple.com/us/reviews/id967452355 prettyluhlaaii
## 4 https://itunes.apple.com/us/reviews/id385870501 Sharonhia
## 5 https://itunes.apple.com/us/reviews/id609712012 RowdyGringo
## 6 https://itunes.apple.com/us/reviews/id152052211 acepilot79
## App_Version Rating
## 1 13.17.0 1
## 2 13.17.0 3
## 3 13.17.0 5
## 4 13.17.0 3
## 5 13.17.0 1
## 6 13.17.0 2
## Review
## 1 Now the app makes you tap next page when viewing items. To me it feels longer to do and very inconvenient. Please bring back the scroll and please add NIGHT MODE!
## 2 Overall I love Amazon, it is very convenient whenever I need something and I can buy it on Amazon. But there is one thing, digital codes, I can never buy them. It always says something like edit quantities and it is really annoying plz fix that.
## 3 I love this app . It has everything and it’s so easy and simple to use .
## 4 Is there a specific reason why you guys don’t showing the successfully purchased items’ names/pics/price on the order confirmation page? I wanna show the confirmation screenshot to my friends but no useful information contained on the screenshot. I have to tap away to the order history page to get the info. Considering to add them into confirmation page plz. Thanks.
## 5 why in the firetruck do i have to click the next button when scrolling through my search results? what happened to loading more results once i reached the end of the ones currently displayed? jesus christ.. considering cancelling my prime now\n#BringBackEndlessScrolling
## 6 I can’t stand this latest update. It’s very buggy. And instead of scrolling through the search results you now have to select the next page button to see more. Why? What’s the point in “fixing” that worked perfectly fine? This is a huge step backwards.
## Date
## 1 2019-09-13 07:51:00
## 2 2019-09-13 07:26:37
## 3 2019-09-13 06:38:41
## 4 2019-09-13 06:29:20
## 5 2019-09-13 06:22:42
## 6 2019-09-13 06:13:36
At this point, we have about 98 reviews of the Amazon iOS app from the US App Store.
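Fetching more pages one call at a time gets repetitive. The sketch below assumes each call returns a data frame with the same columns. It wraps the fetch in a function and row-binds the results. `fetch_pages()` and `fake_fetch()` are hypothetical helpers, not part of itunesr. The real `getReviews()` call is shown only in a comment because it needs network access.

```r
# Hypothetical helper (not part of itunesr): call `fetch` once per page
# and stack the resulting data frames into one.
fetch_pages <- function(fetch, pages) {
  page_list <- lapply(pages, fetch)  # one data frame per page
  # drop pages that came back empty
  page_list <- Filter(function(df) !is.null(df) && nrow(df) > 0, page_list)
  do.call(rbind, page_list)
}

# Usage with itunesr (requires network access):
# library(itunesr)
# reviews <- fetch_pages(function(p) getReviews("297606951", "us", p), pages = 1:2)

# Offline check with a fake fetcher that returns two rows per page
fake_fetch <- function(p) {
  data.frame(Rating = c("1", "5"), Page = p, stringsAsFactors = FALSE)
}
reviews_demo <- fetch_pages(fake_fetch, pages = 1:3)
nrow(reviews_demo)
```

Passing the fetcher in as a function keeps the stacking logic testable without hitting the App Store.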
Customer Reviews - Only Negative (1 & 2-star)
We’ll keep only the negative reviews (1 and 2 stars) to understand which pain points customers talk about when they rate Amazon poorly.
reviews_neg <- reviews[reviews$Rating %in% c('1','2'),]
nrow(reviews_neg)
## [1] 76
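The filter compares `Rating` against the strings '1' and '2'. `%in%` is built on `match()`, which converts both sides to a common type before comparing. So the same filter works whether `Rating` comes back as character or numeric. Here is a quick check with hypothetical toy vectors:

```r
# Toy ratings (hypothetical), stored once as character and once as numeric
ratings_chr <- c("1", "3", "2", "5")
ratings_num <- c(1, 3, 2, 5)

# %in% coerces both sides to a common type (character here) before matching
neg_chr <- ratings_chr %in% c("1", "2")
neg_num <- ratings_num %in% c("1", "2")
neg_chr                      # TRUE FALSE TRUE FALSE
identical(neg_chr, neg_num)  # TRUE
```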
Customer Reviews - Annotation
We’re going to extract topics from the 76 negative reviews above. But before we can proceed with topic analysis, we need to annotate the text with the language model we downloaded earlier.
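# Load the model downloaded above; en$file_model holds the path to the .udpipe file,
# so this works whichever model version was downloaded
model <- udpipe_load_model(file = en$file_model)
doc <- udpipe::udpipe_annotate(model, reviews_neg$Review)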
Let’s convert doc to a data frame and look at the columns it contains.
names(as.data.frame(doc))
##  [1] "doc_id"        "paragraph_id"  "sentence_id"   "sentence"
##  [5] "token_id"      "token"         "lemma"         "upos"
##  [9] "xpos"          "feats"         "head_token_id" "dep_rel"
## [13] "deps"          "misc"
Since this post is about topic analysis, I’ll leave the NLP basics needed to understand these columns (if you’re not familiar with them) for another post.
Topic Extraction using RAKE
RAKE stands for Rapid Automatic Keyword Extraction. Please check out the documentation of keywords_rake(), the function we’ll use to perform topic extraction, to learn more about the algorithm behind it.
doc_df <- as.data.frame(doc)
# Score lemmas per review (doc_id); only nouns and adjectives may form keywords
topics <- keywords_rake(x = doc_df, term = "lemma", group = "doc_id",
                        relevant = doc_df$upos %in% c("NOUN", "ADJ"))
head(topics)
##                keyword ngram freq     rake
## 1          last update     2    3 2.080000
## 2 continuous scrolling     2    4 2.000000
## 3            dark mode     2    2 2.000000
## 4        search result     2    4 2.000000
## 5        latest update     2    4 1.846667
## 6            wish list     2    2 1.700000
Voila! Topics (or, technically, keywords) have been extracted using RAKE. As the output above shows, we also get a few metrics for each topic: its ngram length, its freq, and its rake score.
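The sketch below shows roughly where the rake score comes from. It is a minimal base-R version of RAKE's scoring step, following the usual RAKE definition:

- Consecutive relevant words form a candidate keyword.
- Each word's degree is the summed length of the candidates it appears in.
- Each word scores degree / frequency.
- A keyword scores the sum of its word scores.

It is not udpipe's implementation. `rake_sketch()` and the toy tokens are hypothetical, and the hand-made `relevant` flag stands in for the NOUN/ADJ filter. It also handles a single document only, whereas `keywords_rake()` works per `group` and has extra arguments (such as `ngram_max` and `n_min`) described in its documentation.

```r
# Minimal base-R sketch of RAKE scoring (illustrative; not udpipe's internals).
# tokens:   words of ONE document, in order
# relevant: TRUE for words allowed inside a keyword (e.g. NOUN/ADJ lemmas)
rake_sketch <- function(tokens, relevant) {
  # Consecutive relevant tokens share a run id; every irrelevant token bumps it
  run_id <- cumsum(!relevant)
  candidates <- unname(split(tokens[relevant], run_id[relevant]))

  # Word frequency and degree (degree = summed length of candidates containing the word)
  words <- unlist(candidates)
  freq_tab <- table(words)
  freq <- setNames(as.numeric(freq_tab), names(freq_tab))
  deg_tab <- tapply(rep(lengths(candidates), lengths(candidates)), words, sum)
  deg <- setNames(as.numeric(deg_tab), names(deg_tab))
  word_score <- deg[names(freq)] / freq

  # Keyword frequency and score (sum of its word scores)
  keyword <- vapply(candidates, paste, character(1), collapse = " ")
  kw_tab <- table(keyword)
  res <- data.frame(keyword = names(kw_tab), freq = as.integer(kw_tab),
                    stringsAsFactors = FALSE)
  res$rake <- vapply(strsplit(res$keyword, " ", fixed = TRUE),
                     function(w) sum(word_score[w]), numeric(1))
  res[order(-res$rake), ]
}

tokens   <- c("app", "has", "dark", "mode", "and", "dark", "theme", "is", "bad")
relevant <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
res <- rake_sketch(tokens, relevant)
res
```

In this toy input, "dark" appears in two two-word candidates, so its degree is 4 and its frequency is 2, giving a word score of 2. "dark mode" and "dark theme" therefore score 2 + 2 = 4, while the single-word candidates "app" and "bad" score 1. This is why multi-word keywords tend to top RAKE rankings.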
Let’s load the tidyverse to kick off our analysis and make a bar chart of the top 10 topics by rake score.
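library(tidyverse)
topics %>%
  arrange(desc(rake)) %>%  # highest RAKE scores first
  head(10) %>%             # keep the top 10 topics (head() alone returns only 6)
  ggplot() +
  geom_bar(aes(x = reorder(keyword, rake), y = rake), stat = "identity", fill = "#ff2211") +
  coord_flip() +           # horizontal bars keep long keywords readable
  theme_minimal() +
  labs(title = "Top Topics of Negative Customer Reviews", subtitle = "Amazon US iOS App",
       caption = "Apple App Store", x = "keyword")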
That’s a nice plot of the top customer pain points. It seems the latest update and its error messages didn’t go down well with customers. This is a simple bar plot, but the RAKE output could also be used to plot the rake score against freq, adding another dimension: which topics occur most frequently.
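Here is a minimal sketch of that idea in base graphics, so it runs without extra packages. It uses the six rows printed by head(topics) above as a stand-in for the full `topics` data frame; with the real object, you would use `topics` directly. The points are freq plotted against rake, labelled with their keyword, followed by the correlation between the two metrics.

```r
# Stand-in for `topics`: the six rows printed by head(topics) above
topics_demo <- data.frame(
  keyword = c("last update", "continuous scrolling", "dark mode",
              "search result", "latest update", "wish list"),
  freq = c(3, 4, 2, 4, 4, 2),
  rake = c(2.08, 2.00, 2.00, 2.00, 1.846667, 1.70),
  stringsAsFactors = FALSE
)

# Scatter plot: x = how often a topic occurs, y = its RAKE score
plot(topics_demo$freq, topics_demo$rake, pch = 19,
     xlab = "freq", ylab = "rake",
     main = "RAKE score vs. frequency of topics")
text(topics_demo$freq, topics_demo$rake, labels = topics_demo$keyword,
     pos = 3, cex = 0.7)

# Pearson correlation between the two metrics
rake_freq_cor <- cor(topics_demo$freq, topics_demo$rake)
rake_freq_cor
```

Topics in the upper right are both frequent and highly scored. Points that share the same (freq, rake) pair, like "continuous scrolling" and "search result" here, overlap, so jittering the x position may help on the full data.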
udpipe is a very handy package if you’re in the business of NLP and text analytics. Besides English, it also supports many other languages, such as German and French.