udpipe is a beautiful R package for Text Analytics and NLP, and it helps with Topic Extraction. While most Text Analytics resources online cover only English, this post picks up a different language, Tamil, and fortunately udpipe has a Tamil language model. Loading library(udpipe) Tamil Text Below is a part extracted from a Tamil movie review text <- data.frame(tamil = "கரு - கோமாவால் 16 வருட வாழக்கையை இழந்தவன் மனிதத்தை இந்த கால மனிதர்களுக்கு நினைவுபடுத்து தான் கோமாளி படத்தின் கரு.
Topic Extraction is an integral part of IE (Information Extraction) from a corpus of text: it helps us understand the key things the corpus is talking about. While this can be achieved naively using unigrams and bigrams, this post looks at a more intelligent way of doing it with an algorithm called RAKE (Rapid Automatic Keyword Extraction). Udpipe udpipe is an NLP-focused R package created and open-sourced by the organization bnosac.
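To make the idea behind RAKE concrete, here is a minimal base R sketch. It is not udpipe's implementation (udpipe exposes RAKE through `keywords_rake()`, which works on annotated text); the function name `rake_keywords`, the tiny stopword list and the sample sentence are all made up for illustration. Candidate phrases are the runs of words between stopwords, each word is scored as degree divided by frequency, and a phrase scores the sum of its word scores.

```r
# Minimal RAKE sketch in base R (illustrative only, not udpipe's code).
# rake_keywords() is a hypothetical helper name.
rake_keywords <- function(text, stopwords) {
  # Lowercase, then cut the text into fragments at punctuation.
  fragments <- unlist(strsplit(tolower(text), "[[:punct:]]+"))
  phrases <- list()
  for (fragment in fragments) {
    words <- unlist(strsplit(trimws(fragment), "[[:space:]]+"))
    words <- words[nzchar(words)]
    if (length(words) == 0) next
    is_stop <- words %in% stopwords
    # Consecutive non-stopwords share a run id; every stopword starts a new run.
    run_id <- cumsum(is_stop)
    for (candidate in split(words[!is_stop], run_id[!is_stop])) {
      phrases[[length(phrases) + 1]] <- candidate
    }
  }
  if (length(phrases) == 0) {
    return(data.frame(keyword = character(0), rake = numeric(0)))
  }
  all_words <- unlist(phrases)
  freq <- table(all_words)
  # Degree of a word: total length of the phrases it occurs in.
  degree <- tapply(rep(lengths(phrases), lengths(phrases)), all_words, sum)
  word_score <- setNames(as.numeric(degree[names(freq)]) / as.numeric(freq),
                         names(freq))
  # Phrase score: sum of the scores of its words.
  result <- unique(data.frame(
    keyword = vapply(phrases, paste, character(1), collapse = " "),
    rake = vapply(phrases, function(p) sum(word_score[p]), numeric(1)),
    stringsAsFactors = FALSE
  ))
  result <- result[order(-result$rake), ]
  rownames(result) <- NULL
  result
}

# Toy input: the stopword list and the sentence are assumptions for the demo.
toy_stopwords <- c("a", "an", "the", "of", "is", "and", "in", "from")
toy_text <- paste("Keyword extraction is an integral part of natural language",
                  "processing and information extraction from a corpus of text")
result <- rake_keywords(toy_text, toy_stopwords)
print(result)
```

Longer phrases made of rarely repeated words float to the top, which is why RAKE tends to surface multi-word topics rather than single frequent terms.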
Sentiment Analysis is one of those areas of Machine Learning that is still improving with the rise of Deep Learning based NLP solutions. Sarcasm, negations and similar phenomena make Sentiment Analysis a rather tough nut to crack. Deep Learning, effective as it is, is also computationally expensive, and if you are ready to trade off between cost (expense) and accuracy, then this is the solution for building a negation-proof Sentiment Analysis solution (in R).
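This excerpt does not name the package the post relies on, so the following is only a toy base R sketch of what "negation-proof" can mean in a lexicon-based approach. The lexicon, the negator list, the two-word window and the function name `score_sentence` are all assumptions made for illustration: a sentiment word has its polarity flipped when a negator appears shortly before it.

```r
# Toy negation-aware sentiment scorer in base R (illustrative sketch only).
# The lexicon and negator list below are made up for the demo.
toy_lexicon <- c(good = 1, great = 1, happy = 1, bad = -1, boring = -1)
toy_negators <- c("not", "no", "never", "hardly")

score_sentence <- function(sentence, lexicon, negators, window = 2) {
  # Lowercase and split on anything that is not a letter or apostrophe.
  words <- unlist(strsplit(tolower(sentence), "[^a-z']+"))
  words <- words[nzchar(words)]
  total <- 0
  for (i in seq_along(words)) {
    polarity <- lexicon[words[i]]
    if (is.na(polarity)) next  # word carries no sentiment in the lexicon
    # Look at up to `window` words before the sentiment word.
    preceding <- if (i > 1) words[max(1, i - window):(i - 1)] else character(0)
    # A nearby negator flips the polarity of the sentiment word.
    if (any(preceding %in% negators)) polarity <- -polarity
    total <- total + polarity
  }
  unname(total)
}

plain   <- score_sentence("The movie was good", toy_lexicon, toy_negators)
negated <- score_sentence("The movie was not good", toy_lexicon, toy_negators)
double  <- score_sentence("Honestly, not bad at all", toy_lexicon, toy_negators)
print(c(plain = plain, negated = negated, double = double))
```

A plain bag-of-words scorer would rate the first two sentences identically; the window check is what separates them. It is cheap compared with a deep learning model, but it will still miss sarcasm and long-range negation.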