Will Machine Learning Replace Data Scientists?


People have begun to get jumpy at the possibility of Artificial Intelligence being used to automate anything and everything. Now that AI has shown the potential to displace blue-collar jobs (via robotics) and white-collar professions (via Natural Language Processing), cultural sensitivity surrounding this technology is on the rise. After decades of exploring symbolic AI methods, the field has shifted towards statistical approaches, which have recently begun working in a vast array of ways, largely due to the wave of data and computing power. This shift has led to the rise of machine learning.

In today's digital world, machine learning and big data analytics have become staples in business, and organizations are increasingly incorporating them into their business strategies. The ‘data-driven enterprise’ makes all its decisions based on the insights it gets from collected data. However, as AI and machine learning take on a larger role in the enterprise, there is a lot of talk about the role of the data scientist becoming antiquated. The advances made in machine learning by industry titans like Microsoft and Google suggest that most of the work currently being handled by data scientists will be automated in the near future. Gartner also recently reported that 40 percent of data science tasks will be automated by 2020.

The Difference Between Machine Learning and Data Science

Data science is primarily a discipline for tackling big data; it includes data preparation, cleansing and analysis. The rise of big data sparked the rise of data science to support the need for businesses to gain insights from their massive unstructured data sets. While the typical data scientist is envisioned as a programmer experienced in Hadoop, SQL, Python, R and statistics, this is just the tip of the data science iceberg. Essentially, data scientists are tasked with solving real company problems by analyzing them and developing data-driven answers; how they do it is irrelevant. The Journal of Data Science describes it as “almost everything that has something to do with data … yet the most important part is its applications – all sorts of applications”. One of those applications is machine learning.

The rise of big data has also made it possible to train machines with a data-driven approach as opposed to a knowledge-driven approach. Theoretical research relating to recurrent neural networks has become feasible in practice, transitioning deep learning from an academic concept to a tangible, useful class of machine learning that is affecting our everyday lives. Machine learning and AI now dominate the media, overshadowing every other aspect of data science. As a result, the prevalent view of a data scientist is a researcher focused on machine learning and AI. In reality, data science extends well beyond machine learning.

Machine learning is basically a set of algorithms that train on a set of data to fine-tune their parameters. Obtaining and preparing that training data relies on multiple data science techniques, such as clustering and regression. On the other hand, the ‘data’ in data science may or may not come from a mechanical process or a machine. The main difference between the two is that data science covers the entire spectrum of data processing, not just the statistical or algorithmic aspects.
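
To make the idea of an algorithm tuning its parameters on data concrete, here is a minimal sketch in Python. It is only an illustration, not a method described in this article: the function name fit_line, the toy data set, and the learning rate and epoch count are all assumptions chosen for the example. It fits a straight line to a handful of points using gradient descent, one of the simplest ways a model can learn from data.

```python
def fit_line(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y ~ w * x + b by gradient descent on mean squared error.

    fit_line and its default settings are hypothetical, chosen for illustration.
    """
    w, b = 0.0, 0.0  # the parameters start with no knowledge of the data
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each parameter in the direction that reduces the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy training data generated from the rule y = 2x + 1 (an assumption)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(f"learned slope={w:.3f}, intercept={b:.3f}")
```

The algorithm is never told the rule behind the data; it recovers a slope near 2 and an intercept near 1 purely by adjusting its two parameters to shrink the prediction error. Deciding which data to collect, whether a straight line is a sensible model, and what the result means for the business is the part that remains with the data scientist.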

Human Intuition Cannot Be Automated

Data science is distinguishable from machine learning because its goal is especially human-focused: to gain insight and understanding. There always has to be a human in the loop. Data scientists use a combination of engineering, statistics and human expertise to understand data from a business point of view and provide accurate insights and predictions. While ML algorithms can help identify organizational trends, their role in a data-driven process is limited to making predictions about future outcomes. They are not yet fully capable of understanding what specific data means for an enterprise and its relationships, or even the relationships between varying, unconnected operations.

The judgment and critical thinking of a data scientist are indispensable in monitoring the parameters and making sure that the customized needs of a business are met. Once all the questions have been asked and the data has been gathered and run through the necessary algorithms, a discerning data scientist still has to figure out what the larger business implications are and present the takeaways to management. Ultimately, the interactive, interpersonal conversations driving these initiatives are fueled by abstract, creative thinking that cannot be replaced by any modern-day machine.

Advances in AI Are Driving Talent Demand

As the transformational AI wave cuts across end-markets from enterprise to consumer platforms, from robotics to cyber security, the demand for data scientists is only likely to grow. The role of a data scientist will probably assume a new level of importance and evolve in typical computer science fashion. As machines' ability to accurately analyze data increases, with the help of expert statistical modeling and solid algorithms created by data scientists, data scientists will move up the ‘abstraction scale’ and begin tackling higher-level and more complex tasks. The current demand clearly outpaces the supply. McKinsey Global Institute estimates that the United States could have about 250,000 open data science positions by 2024. This data science skill gap is likely to leave companies scrambling to hire candidates who can meet their analytical needs.

In Closing

Will machine learning replace data scientists? The short answer is no, or at least not yet. Certain aspects of low-level data science can and should be automated. However, machine learning is creating a real need for data scientists. As AI advances to analyze and establish cause as well as correlation, software will be used to collect and analyze data, but ML tools don’t yet possess the human curiosity or desire to create and validate experiments. That aspect of data science is unlikely to be automated any time soon. Human intelligence is crucial to the data science field; machine learning can help, but it can’t completely take over.


Author – Gabriel Lando