Do you love Data Science? I mean, the Data part of it

Last week, we talked all about Artificial Intelligence (and Artificial Stupidity), which led me to think about the foundation of Data Science: the Data itself. I think Data is the least appreciated entity in the Data Science value chain. You might agree with me if you do Data Science outside competitive platforms like Kaggle, where the Data handed to you is what most Data Scientists dream about having in their jobs.

Data Foundation

The "AI Godfathers" have a good fan following, but many of us know Fei-Fei Li, whose contribution (with her team) in building ImageNet for AI is invaluable.

"One thing ImageNet changed in the field of AI is suddenly people realized the thankless work of making a dataset was at the core of AI research. People really recognize the importance the dataset is front and center in the research as much as algorithms." - Fei-Fei Li

Data Startup

Meanwhile, venture capitalists aren't shying away from putting their money where Data is created and curated. Recently, Silicon Valley startup Scale AI hit unicorn status. Scale AI's About Us page reads:

The Data Platform for AI

Scale AI has also open-sourced datasets, and that's sweet.

Build your own Data

Zalando, which open-sourced Fashion-MNIST, published a nice paper listing the steps they took to publish the dataset. There are also free tools like labelImg to help you annotate images for a typical image dataset. For NLP annotation, BRAT is a nice free open-source tool. And if you are planning a pet project and don't have the required dataset, this tutorial by Mat Kelcey on counting bees on a rasp pi with a conv net would be a tremendous help.

In R, check out this to learn how to generate meaningful fake data for learning, experimentation and teaching using {fakir}.
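If you are not an R user, the same idea is easy to try in plain Python. To be clear, this is not {fakir} or a port of it, just a minimal sketch using the standard library. The function name `make_fake_tickets`, the column names and the rule linking age to priority are all made up for illustration. The point is that fake data becomes "meaningful" when the columns look plausible and carry a small pattern worth discovering.

```python
import random
from datetime import date, timedelta


def make_fake_tickets(n, seed=42):
    """Return n fake support-ticket rows as dictionaries (hypothetical schema)."""
    rng = random.Random(seed)  # seeded, so the same data comes back on every run
    departments = ["Billing", "Delivery", "Returns"]  # made-up categories
    start = date(2019, 1, 1)
    rows = []
    for i in range(n):
        age = rng.randint(18, 80)
        rows.append({
            "ticket_id": f"T{i + 1:04d}",
            "department": rng.choice(departments),
            "customer_age": age,
            "opened_on": start + timedelta(days=rng.randrange(365)),
            # Invented rule: older customers are more often "high" priority,
            # so learners have a relationship to find in the data.
            "priority": "high" if age > 60 and rng.random() < 0.7 else "normal",
        })
    return rows
```

Calling `make_fake_tickets(100)` gives you a list of 100 rows that you could load into a data frame or write to a CSV for practice. Change the seed to get a different but still reproducible dataset.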

That said, if you appreciate Data Science as much as you'd appreciate the beauty of a Ferrari or Lamborghini, then you might also have to remind yourself that the car is only useful if you've got oil in it, and that oil is your super-clean labelled Data that's usable for Data Science and Machine Learning.

If you liked this, please subscribe to my Data Science Newsletter and share it with your friends!
