What does a Data Scientist do? How can you get into the Data Science field, either as a fresher or with prior work experience? What are the qualifications and requirements to start a career in Data Science and Analytics? Sayan Putatunda covers all these questions in this article.
Data Science is the latest buzzword in the industry. Although many initially dismissed it as a mere fad, over the years various organizations have realized the potential of data science to generate actionable insights from structured and unstructured data.
From banks to e-commerce firms to manufacturing industries, all of them have understood the importance of data science and have adopted it in their day-to-day business activities to improve their performance.
The role of a data scientist has already earned the moniker of “the sexiest job of the 21st century”. According to a report by the McKinsey Global Institute, there will be a shortage of 140,000 to 190,000 data science professionals by 2018 in the USA alone.
As far as India is concerned, some studies suggest that the Analytics/Data Science industry in India is at the phase where IT was some 10-15 years ago, and thus a boom can be expected in analytics outsourcing to India.
I also believe that India, with its data science/analytics talent pool, can very well be the leader in this industry. There are already some success stories such as Mu Sigma and Fractal Analytics. Moreover, we are now officially living in the age of “Big Data”.
So it's quite clear why data scientists are in demand and why a lot of new jobs will be created in this field in the near future. Hence, data science can be seen as a lucrative career option.
Data Science is an amalgamation of proper business understanding, mathematics, statistics, programming and communication skills. Thus one is expected to display all the above-mentioned skills in the role of a data scientist.
A data scientist is expected to understand the business problem, build a hypothesis, identify the kind of data required, perform data cleaning and preliminary data analysis, build statistical models to provide a solution and finally communicate the insights effectively to the client. Thus, the job of a data scientist encompasses various roles and functions.
Ensuring that your resume grabs eyeballs when you apply to an analytics firm takes some preparation. The preparation would be different for a fresher than for someone who already has some work experience under their belt, albeit in a different domain.
For freshers, who are generally engineering or maths/stats graduates, the focus is more on analytical problem solving and exposure to some programming language. They can then apply to analytics firms either through campus placements or off-campus placement drives and try to ace the interview process.
But for someone with substantial work experience in some other domain, say an IT professional, it's a different story altogether. IT professionals are generally good at programming, but they fall short by quite some distance when it comes to mathematical intuition or depth of business understanding.
So for an IT professional, or in fact a professional from any other sector, it's a bit difficult to make the transition into data science, but not impossible. I have successfully made this transition and hence can vouch for this fact.
Analytics or Data Science recruiters are on the lookout for relevant skills, so the trick is to acquire these skills over a period of time and leverage them during an interview. Now let's discuss the various aspects one needs to work on to make a successful transition to the analytics industry.
Formal education is obviously the traditional route, i.e. starting with a clean slate. One can enroll in a postgraduate programme in analytics.
For example, IIM Calcutta started a PGP in business analytics jointly with ISI Kolkata and IIT Kharagpur a couple of years ago, and this programme is doing well.
There are some very good master's programmes in various US universities as well. Examples include North Carolina State University, MIT Sloan, UC Berkeley and Texas A&M.
One can even go for a general MBA degree but take some analytics-related electives such as advanced data analysis, machine learning, etc.
But then this is something that might not be possible for everyone for various reasons. In that case, one needs to focus on self-learning and effective utilization of freely available learning resources at their disposal. Some of these are discussed below.
An aspiring data scientist is expected to have some familiarity with the statistical and machine learning methodologies used in the industry.
One can start from the basics, i.e. the normal distribution, the central limit theorem and hypothesis testing, and then move on to more advanced techniques, viz. linear regression, logistic regression, decision trees, cluster analysis, generalized additive models, etc.
A recommended book for this would be The Elements of Statistical Learning (by Hastie, Tibshirani and Friedman).
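To make one of these basics concrete, here is a minimal sketch of the central limit theorem in Python, using only the standard library. The function name `sample_means`, the exponential population and the sample sizes are illustrative assumptions, not something prescribed by the book above. The idea is that means of repeated samples from a skewed population cluster around the population mean, with a spread close to sigma / sqrt(n).

```python
import math
import random
import statistics


def sample_means(n_samples, sample_size, seed=0):
    """Return the mean of each of n_samples samples drawn from a skewed
    exponential population (population mean 1, standard deviation 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical settings: 2000 samples of 50 observations each.
means = sample_means(n_samples=2000, sample_size=50)

# The average of the sample means should be close to the population mean (1.0).
print("mean of sample means:", round(statistics.fmean(means), 3))

# Their spread should be close to sigma / sqrt(n) = 1 / sqrt(50).
print("std of sample means: ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / math.sqrt(50), 3))
```

Plotting a histogram of `means` (for instance in R or with a Python plotting library) should show a roughly bell-shaped curve even though the underlying population is strongly skewed.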
Apart from the standard textbooks, an alternative but effective way of learning is to take MOOCs. There are a lot of free statistics/data mining courses available via Coursera, edX, MIT OpenCourseWare, Stanford Online, NPTEL, etc.
As far as the tools in the analytics industry are concerned, SAS and SPSS used to be popular before the open source revolution took the industry by storm. Open source tools like R and Python are the next big thing and it would make sense to invest time in them.
There are enough freely available resources on the web to learn both R and Python. People with coding skills in object-oriented languages like Java will find Python intuitive. But R is the best tool (personal opinion) when it comes to statistical modeling and it is also the preferred tool in academia.
For an absolute beginner, the introductory course in R at datacamp.com can be a starting point. But the best way to learn these tools is by doing. So I would suggest that one should replicate the available code and test it on some dummy datasets to understand what's going on.
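As an illustration of what experimenting on a dummy dataset might look like, the following Python sketch generates synthetic data from a known line and then recovers the slope and intercept with ordinary least squares. The helper `fit_line`, the true coefficients (intercept 3, slope 2) and the noise level are all made-up assumptions for the exercise; the same experiment can be done just as easily in R.

```python
import random


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Dummy dataset: a known line plus Gaussian noise (seeded for repeatability).
rng = random.Random(42)
xs = [float(i) for i in range(100)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

slope, intercept = fit_line(xs, ys)
print("estimated slope:    ", round(slope, 3))
print("estimated intercept:", round(intercept, 3))
```

Because the true coefficients are known in advance, one can check whether the estimates land close to them, and then see how they drift when the noise level or the number of points is changed.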
Also, a working knowledge of SQL along with advanced MS Excel / VBA skills can act as a differentiator when one appears for their interview.
Since data science is not only about technical mumbo jumbo, it would be really helpful to understand its business applications and to be aware of various successful use cases.
This will help one see the bigger picture and also make one well equipped to understand what kind of methodology fits a particular business problem.
Examples include how market basket analysis is used for product bundling by retailers, how cluster analysis can be used for customer segmentation for a new product launch, and how logistic regression can be used for fraud detection in the banking/insurance sector.
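To give a feel for the first of these use cases, here is a minimal Python sketch of the three measures commonly used in market basket analysis: support, confidence and lift for a rule such as "customers who buy bread also buy butter". The transactions and the helper `rule_metrics` are hypothetical and purely for illustration; real retail data would have far more baskets and items.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions is a list of sets of items; antecedent and consequent are sets.
    """
    n = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_consequent = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n  # share of baskets containing both item sets
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return support, confidence, lift


# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

support, confidence, lift = rule_metrics(transactions, {"bread"}, {"butter"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

A lift above 1 suggests the two items are bought together more often than chance would predict, which is the kind of signal a retailer might use when deciding on product bundles; a lift at or below 1 suggests the pairing is not special.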
Last but not least: practice, practice and practice. One way to do it would be by participating in various data science competitions hosted on sites like kaggle.com. Even analyticsvidhya.com hosts data science competitions.
But I would suggest going through some of the past competitions at Kaggle first and replicating some of the scripts to understand the modus operandi. The level of competition at Kaggle is high, and one can learn how to handle challenging datasets and come up with solutions.
Also, the discussion on the forums with like-minded data science enthusiasts can be helpful.
Finally, even after one has got a break in the data science industry, one needs to guard against complacency. The way technology is progressing and the analytics field is developing, there is something new to learn every day!
Thomas H. Davenport and D.J. Patil, “Data Scientist: The Sexiest Job of the 21st Century,” 2012. [Online]. Available: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh and Angela Hung Byers, “Big data: The next frontier for innovation, competition, and productivity,” 2011. [Online]. Available: http://www.mckinsey.com/business-functions/business-technology/our-insights/big-data-the-next-frontier-for-innovation
Bhasker Gupta, “Analytics Outsourcing to India: Should or Shouldn’t,” 2015. [Online]. Available: http://www.kdnuggets.com/2015/02/analytics-outsourcing-india.html
About the author: Sayan Putatunda is currently a doctoral student at the Indian Institute of Management, Ahmedabad (IIMA) who is passionate about Data Science and Machine Learning. He is also a national-level powerlifter, a marathon runner and an avid reader.