EDA

How to arrange ggplot (barplot) bars in ascending or descending order

One of the reasons you’d see a bar plot made with ggplot2 with no ascending / descending order - ordering / arranged is because, By default, ggplot arranges bars in a bar plot alphabetically. But most of the times, it would make more sense to arrange it based on the y-axis it represents (rather than alphabetically). It could be your month-wise time series or high-medium-low bars - these are some examples where an alphabetically-sorted bar chart wouldn’t make sense in fact would hinder the process of data communication.

3 tidyverse tricks for most commonly used Excel Features

In this post, We’re simply going to see 5 tricks that could help improve your tooling using {tidyverse}. Create a difference variable between the current value and the next value This is also known as lead and lag - especially in a time series dataset this varaible becomes very important in feature engineering. In Excel, This is simply done by creating a new formula field and subtracting the next cell with the current cell or the current cell with the previous cell and dragging the cell formula to the last cell.

How to Automate EDA with DataExplorer in R

EDA (Exploratory Data Analysis) is one of the key steps in any Data Science Project. The better the EDA is the better the Feature Engineering could be done. From Modelling to Communication, EDA has got much more hidden benefits that aren’t often emphasised while beginners start while teaching Data Science for beginners. The Problem That said, EDA is also one of the areas of the Data Science Pipeline where a lot of manual code is written for different types of plots and different types for inference.