India is the world’s largest democracy and also a highly diverse place. This is my attempt to see how widely “Hindi” and other languages are spoken in India. In this post, we’ll see how to collect the data for this puzzle directly from Wikipedia and how to visualize it so that the insight stands out. Data: Wikipedia is a great source for data like this (Languages spoken in India). Because Wikipedia publishes these tables as HTML <table> elements, it is quite easy to use rvest::html_table() to extract a table as a data frame without much hassle.
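As a minimal sketch of the extraction step, the snippet below parses a small inline HTML string instead of the live Wikipedia page, so it runs offline. The table contents and the names html_snippet and languages are placeholders for illustration, not data from the article. On the real page you would pass the article URL to read_html() and pick the table you need from the returned list.

```r
library(rvest)

# Placeholder HTML standing in for a Wikipedia page (the values are made up).
html_snippet <- '
<html><body>
  <table>
    <tr><th>Language</th><th>Speakers</th></tr>
    <tr><td>Language A</td><td>100</td></tr>
    <tr><td>Language B</td><td>50</td></tr>
  </table>
</body></html>'

page <- read_html(html_snippet)

# html_table() on a whole document returns a list with one data frame per <table>.
tables <- html_table(page)

# Pick the table of interest by position.
languages <- tables[[1]]
print(languages)
```

On a real article the list usually holds many tables, so it helps to inspect length(tables) and the column names before choosing an index.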
This is a small code snippet that shows how to change the color scale of a ggplot.

Continuous Scale
Package: viridis (in current ggplot2 releases, scale_fill_viridis_c() is also exported by ggplot2 itself)
Function: scale_fill_viridis_c() (since the fill is mapped to a continuous value)

    library(dplyr)
    library(ggplot2)
    library(viridis)

    mtcars %>%
      # Turn the row names into a column, then split off the first word as Brand
      tibble::rownames_to_column('Car') %>%
      tidyr::separate('Car', c('Brand', 'Model'), sep = ' ',
                      extra = 'merge', fill = 'right', remove = FALSE) %>%
      group_by(Brand) %>%
      summarize(avg_mpg = mean(mpg)) %>%
      ggplot() +
      # Bars sorted by avg_mpg and filled by the same continuous value
      geom_col(aes(reorder(Brand, avg_mpg), avg_mpg, fill = avg_mpg)) +
      scale_fill_viridis_c() +
      theme_minimal() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      labs(title = 'How to arrange Ggplot Bar plot', x = 'Brand', y = 'Average mpg')

Discrete Scale
Package: viridis
Function: scale_fill_viridis_d() (since the fill is mapped to a discrete value)
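The excerpt stops before the discrete example, so here is a minimal sketch of what one could look like. It assumes a ggplot2 version that exports scale_fill_viridis_d(); the choice of cyl as the discrete variable and the names cars_df and p_discrete are mine, not the original post's.

```r
library(ggplot2)

# Treat the number of cylinders as a discrete (categorical) variable.
cars_df <- mtcars
cars_df$cyl <- factor(cars_df$cyl)

p_discrete <- ggplot(cars_df) +
  # Bar height is the number of cars in each cylinder group
  geom_bar(aes(x = cyl, fill = cyl)) +
  # Discrete viridis palette: one colour per factor level
  scale_fill_viridis_d() +
  theme_minimal() +
  labs(title = 'Cars per cylinder count', x = 'Cylinders', y = 'Number of cars')

p_discrete
```

Using scale_fill_viridis_c() here would fail because the fill is a factor, which is why the discrete variant is the one to reach for with categorical fills.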
One of the reasons you’d see a bar plot made with ggplot2 whose bars are in no ascending or descending order is that, by default, ggplot2 arranges the bars of a character variable alphabetically. Most of the time, though, it makes more sense to arrange them by the value on the y-axis, or by the natural order of the categories, rather than alphabetically. Month-wise time series or high-medium-low bars are examples where an alphabetically sorted bar chart wouldn’t make sense and would in fact hinder data communication.
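A minimal sketch of the two usual fixes, assuming mtcars as stand-in data: reorder() sorts the categories by a numeric value, and factor() with explicit levels pins down a natural order such as low-medium-high. The object names are only for illustration.

```r
library(ggplot2)

# Average mpg per cylinder count; cyl is kept as text so the default
# ordering would be alphabetical ("4", "6", "8").
avg_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
avg_mpg$cyl <- as.character(avg_mpg$cyl)

# reorder() returns a factor whose levels are sorted by mpg (ascending).
ordered_levels <- levels(reorder(avg_mpg$cyl, avg_mpg$mpg))

# Use reorder() inside aes() to sort the bars; reorder(cyl, -mpg) would
# sort them in descending order instead.
p_sorted <- ggplot(avg_mpg) +
  geom_col(aes(x = reorder(cyl, mpg), y = mpg)) +
  labs(x = 'Cylinders', y = 'Average mpg')

# For categories with a natural order, set the levels explicitly.
priority <- factor(c('low', 'high', 'medium'),
                   levels = c('low', 'medium', 'high'))

p_sorted
```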
EDA (Exploratory Data Analysis) is one of the key steps in any Data Science project. The better the EDA, the better the Feature Engineering that can follow. From Modelling to Communication, EDA has many hidden benefits that often aren’t emphasised when teaching Data Science to beginners. The Problem: That said, EDA is also one of the areas of the Data Science pipeline where a lot of manual code is written for different types of plots and different types of inference.
Philosophy: This post is purely aimed at helping beginners with cookbook-style code for interactive visualizations using the highcharter package in R. About highcharter: highcharter, an R package by Joshua Kunst, is a wrapper for the ‘Highcharts’ library that includes shortcut functions to plot R objects.
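To give a flavour of the cookbook style, here is a minimal sketch of one such shortcut, hchart(), applied to a data frame. It assumes highcharter is installed; the choice of mtcars, the column mapping, and the title text are mine, for illustration only.

```r
library(highcharter)

# hchart() picks a sensible chart for an R object; for a data frame we name
# the chart type and map columns with hcaes(), much like aes() in ggplot2.
chart <- hchart(mtcars, "scatter", hcaes(x = wt, y = mpg, group = cyl))

# Further hc_*() functions add or tweak chart options.
chart <- hc_title(chart, text = "Weight vs. miles per gallon")

chart
```

Printing the object in an interactive session or an R Markdown document renders the interactive chart.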
Are you looking for a unique way of visualizing your numbers instead of simply using bar charts, which can bore the audience if used slide after slide? Here’s the Square Pie / Waffle Chart for you. A Waffle Chart or, as it is technically called, a Square Pie Chart is just a pie chart that uses squares instead of circles to represent percentages. So it’s good to keep in mind that it works best for percentages.
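The idea can be sketched with plain ggplot2 by drawing a 10 x 10 grid in which each square stands for one percent. This is only one way to build such a chart and not necessarily the approach the full post takes; the shares below are placeholder numbers, and the names shares, grid, and p_waffle are illustrative.

```r
library(ggplot2)

# Placeholder percentages; they must sum to 100 to fill a 10 x 10 grid.
shares <- c(`Group A` = 60, `Group B` = 30, `Group C` = 10)

# One row per square: 100 squares laid out on a 10 x 10 grid.
grid <- expand.grid(col = 1:10, row = 1:10)

# Assign each square to a category, one square per percentage point.
grid$category <- factor(rep(names(shares), times = shares),
                        levels = names(shares))

p_waffle <- ggplot(grid, aes(x = col, y = row, fill = category)) +
  geom_tile(colour = 'white') +   # white borders separate the squares
  coord_equal() +                 # keep the tiles square
  theme_void() +
  labs(title = 'Square pie (waffle) chart', fill = NULL)

p_waffle
```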
This is one of the most frequent questions I’ve heard from first-time NLP / Text Analytics programmers (or, as the world likes to call them, “Data Scientists”).

Prerequisite
For simplicity, this post assumes that you already know how to install a package and so you’ve got tidytext installed on your R machine.

    install.packages("tidytext")

Loading the Library
Let’s start with loading the tidytext library.

    library(tidytext)

Extracting App Reviews
We’ll use the R package itunesr to download iOS App Reviews, on which we’ll perform simple text analysis (unigrams, bigrams, n-grams).
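As a minimal sketch of the unigram and bigram steps, the snippet below runs tidytext on two made-up review strings rather than on reviews downloaded with itunesr, so it works offline. The reviews data frame and its column names are hypothetical stand-ins for whatever the download step returns.

```r
library(dplyr)
library(tidytext)

# Hypothetical reviews standing in for the downloaded app reviews.
reviews <- tibble(
  id   = 1:2,
  text = c("Great app love it", "Love it but crashes")
)

# Unigrams: one lowercased word per row.
unigrams <- reviews %>%
  unnest_tokens(word, text)

# Bigrams: every pair of consecutive words within a review.
bigrams <- reviews %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

# Most frequent bigrams first.
bigram_counts <- bigrams %>%
  count(bigram, sort = TRUE)

print(bigram_counts)
```

Changing n to 3 or more in the same call produces longer n-grams.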