India is the world’s largest Democracy and as it goes, also a highly diverse place. This is my attempt to see how “Hindi” and other languages are spoken in India. In this post, we’ll see how to collect data for this relevant puzzle - directly from Wikipedia and How we’re going to visualize it - highlighting the insight. Data Wikipedia is a great source for data like this - Languages spoken in India and also because Wikipedia lists these tables as html <table> it becomes quite easier for us to use rvest::html_table() to extract the table as dataframe without much hassle.
REGEX is that thing that scares everyone almost all the time. Hence, finding some alternative is always very helpful and peaceful too. Here’s a nice R package thst helps us do REGEX without knowing REGEX. REGEX This is the REGEX pattern to test the validity of a URL: ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$ A typical regular expression contains — Characters ( http ) and Meta Characters (). The combination of these two form a meaningful regular expression for a particular task.
udpipe is a beautiful R package for Text Analytics and NLP and helps in Topic Extraction. While most Text Analytics resources online are only about English, This post picks up a different lanugage - Tamil and fortuntely, udpipe has got a Tamil Language Model. Loading library(udpipe) Tamil Text Below is part extracted from a Tamil Movie Review text <- data.frame(tamil = "கரு - கோமாவால் 16 வருட வாழக்கையை இழந்தவன் மனிதத்தை இந்த கால மனிதர்களுக்கு நினைவுபடுத்து தான் கோமாளி படத்தின் கரு.
This is a small code snippet to explain how to change the color scale of a ggplot. Continuous Scale Package: viridis Function: scale_fill_viridis_c() (since it’s a continuous scaled value) library(dplyr) library(ggplot2) library(viridis) mtcars %>% tibble::rownames_to_column('Car') %>% tidyr::separate('Car',c('Brand','Model'), remove = F) %>% group_by(Brand) %>% summarize(avg_mpg = mean(mpg)) %>% ggplot() + geom_bar(aes(reorder(Brand,avg_mpg),avg_mpg, fill = avg_mpg), stat = 'identity') + scale_fill_viridis_c() + theme_minimal() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs( title = 'How to arrange Ggplot Bar plot', x = 'mpg') Discrete Scale Package: viridis Function: scale_fill_viridis_d() (since it’s a discrete scaled value)