How the skillset of data scientists will change over the next decade
AutoML is poised to turn developers into data scientists — and vice versa. Here’s how AutoML will radically change data science for the better.
In the coming decade, the data scientist role as we know it will look very different from how it looks today. But don’t worry: no one is predicting lost jobs, just changed jobs.
Data scientists will be fine. According to the Bureau of Labor Statistics, the role is still projected to grow at a higher-than-average clip through 2029. But advancements in technology will be the impetus for a huge shift in a data scientist’s responsibilities and in the way businesses approach analytics as a whole. And AutoML tools, which help automate the machine learning pipeline from raw data to a usable model, will lead this revolution.
In 10 years, data scientists will have entirely different sets of skills and tools, but their function will remain the same: to serve as confident and competent technology guides who can make sense of complex data to solve business problems.
AutoML democratizes data science
Until recently, machine learning algorithms and processes were almost exclusively the domain of more traditional data science roles: people with formal education and advanced degrees, or those working for large technology corporations. Data scientists have played an invaluable role in every part of the machine learning development spectrum. But in time, their role will become more collaborative and strategic. With tools like AutoML to automate some of their more academic skills, data scientists can focus on guiding organizations toward solutions to business problems via data.
In many ways, this is because AutoML democratizes the effort of putting machine learning into practice. Vendors from startups to cloud hyperscalers have launched solutions easy enough for developers to use and experiment on without a large educational or experiential barrier to entry. Similarly, some AutoML applications are intuitive and simple enough that non-technical workers can try their hands at creating solutions to problems in their own departments—creating a “citizen data scientist” of sorts within organizations.
In order to explore the possibilities these types of tools unlock for both developers and data scientists, we first have to understand the current state of data science as it relates to machine learning development. It’s easiest to understand when placed on a maturity scale.
Smaller organizations and businesses with more traditional roles in charge of digital transformation (i.e., not classically trained data scientists) typically fall at the early end of this scale. Right now, they are the biggest customers for out-of-the-box machine learning applications, which are geared toward an audience unfamiliar with the intricacies of machine learning.
- Pros: These turnkey applications tend to be easy to implement and relatively cheap to deploy. For smaller companies with a very specific process to automate or improve, there are likely several viable options on the market. The low barrier to entry makes these applications perfect for data scientists wading into machine learning for the first time. Because some of the applications are so intuitive, they even give non-technical employees a chance to experiment with automation and advanced data capabilities, potentially introducing a valuable sandbox into an organization.
- Cons: This class of machine learning applications is notoriously inflexible. While they can be easy to implement, they aren’t easily customized. As such, certain levels of accuracy may be impossible for certain applications. Additionally, these applications can be severely limited by their reliance on pretrained models and data.
Examples of these applications include Amazon Comprehend, Amazon Lex, and Amazon Forecast from Amazon Web Services, and Azure Speech Services and Azure Language Understanding (LUIS) from Microsoft Azure. These tools are often sufficient for burgeoning data scientists to take their first steps in machine learning and usher their organizations further along the maturity spectrum.
Customizable solutions with AutoML
Organizations with large yet relatively common data sets—think customer transaction data or marketing email metrics—need more flexibility when using machine learning to solve problems. Enter AutoML. AutoML takes the steps of a manual machine learning workflow (data discovery, exploratory data analysis, hyperparameter tuning, etc.) and condenses them into a configurable stack.
- Pros: AutoML applications allow more experiments to be run on data in a larger space. But the real superpower of AutoML is the accessibility — custom configurations can be built and inputs can be refined relatively easily. What’s more, AutoML isn’t made exclusively with data scientists as an audience. Developers can also easily tinker within the sandbox to bring machine learning elements into their own products or projects.
- Cons: AutoML comes close, but its limitations mean that accuracy in outputs will be difficult to perfect. Because of this, degree-holding, card-carrying data scientists often look down on applications built with the help of AutoML, even if the result is accurate enough to solve the problem at hand.
Examples of these applications include Amazon SageMaker Autopilot and Google Cloud AutoML. Data scientists a decade from now will undoubtedly need to be familiar with tools like these. Like a developer who is proficient in multiple programming languages, data scientists will need proficiency with multiple AutoML environments in order to be considered top talent.
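To make the idea of a "configurable stack" more concrete, here is a minimal sketch in plain Python of what an AutoML-style search does at toy scale: it takes a declared list of candidate models and hyperparameters, scores each with cross-validation, and returns a ranked leaderboard. It is an illustration only, not the API of SageMaker Autopilot, Google Cloud AutoML, or any other product. The synthetic dataset and every name in it (`make_data`, `SEARCH_SPACE`, `auto_search`, and so on) are hypothetical.

```python
import random
from statistics import mean


def make_data(n=200, seed=0):
    """Build a synthetic two-class dataset (an assumed stand-in for real data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        rows.append(((x, y), int(x + y > 1.0)))
    return rows


def majority_predict(train, point):
    """Baseline: always predict the most common training label."""
    positives = sum(label for _, label in train)
    return int(positives * 2 >= len(train))


def knn_predict(train, point, k):
    """Predict by majority vote among the k nearest training rows."""
    def squared_distance(row):
        (x, y), _ = row
        return (x - point[0]) ** 2 + (y - point[1]) ** 2

    nearest = sorted(train, key=squared_distance)[:k]
    return int(sum(label for _, label in nearest) * 2 > k)


# Hypothetical model registry and search space: the "configuration" a user edits.
MODELS = {"majority": majority_predict, "knn": knn_predict}
SEARCH_SPACE = [
    {"model": "majority", "params": {}},
    {"model": "knn", "params": {"k": 1}},
    {"model": "knn", "params": {"k": 5}},
    {"model": "knn", "params": {"k": 15}},
]


def cross_val_accuracy(rows, predict, params, folds=5):
    """Average accuracy over interleaved folds (row i goes to fold i % folds)."""
    scores = []
    for fold in range(folds):
        test = rows[fold::folds]
        train = [row for i, row in enumerate(rows) if i % folds != fold]
        hits = sum(predict(train, feats, **params) == label for feats, label in test)
        scores.append(hits / len(test))
    return mean(scores)


def auto_search(rows, search_space):
    """Score every candidate and return (score, candidate) pairs, best first."""
    leaderboard = []
    for candidate in search_space:
        predict = MODELS[candidate["model"]]
        score = cross_val_accuracy(rows, predict, candidate["params"])
        leaderboard.append((score, candidate))
    leaderboard.sort(key=lambda item: item[0], reverse=True)
    return leaderboard


if __name__ == "__main__":
    for score, candidate in auto_search(make_data(), SEARCH_SPACE):
        print(f"{score:.3f}  {candidate}")
```

In this sketch, trying a different model or hyperparameter means adding an entry to `SEARCH_SPACE` rather than rewriting the evaluation loop. That is roughly the kind of accessibility described above, though real AutoML tools also automate data preparation and search far larger spaces.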
“Hand-rolled” and homegrown machine learning solutions
The largest enterprise-scale businesses and Fortune 500 companies are where most of the advanced and proprietary machine learning applications are currently being developed. Data scientists at these organizations are part of large teams perfecting machine learning algorithms using troves of historical company data, and building these applications from the ground up. Custom applications like these are only possible with considerable resources and talent, which is why the payoff and risks are so great.
- Pros: Like any application built from scratch, custom machine learning is “state-of-the-art” and is built based on a deep understanding of the problem at hand. It’s also more accurate — if only by small margins — than AutoML and out-of-the-box machine learning solutions.
- Cons: Getting a custom machine learning application to reach certain accuracy thresholds can be extremely difficult, and often requires heavy lifting by teams of data scientists. Additionally, custom machine learning options are the most time-consuming and most expensive to develop.
A typical hand-rolled machine learning solution starts with a blank Jupyter notebook. The data scientist imports the data manually, then conducts each step by hand, from exploratory data analysis through model tuning. This is often achieved by writing custom code using open source machine learning frameworks such as scikit-learn, TensorFlow, PyTorch, and many others. This approach requires a high degree of both experience and intuition, but it can produce results that often outperform both turnkey machine learning services and AutoML.
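For contrast, the sketch below walks through the manual steps just described, using only the Python standard library so it runs anywhere. It is a teaching example under stated assumptions: the dataset is synthetic, the feature names (`spend`, `visits`) are invented, and the hand-written logistic regression stands in for what a framework such as scikit-learn would normally provide.

```python
import math
import random
from statistics import mean, stdev


def sigmoid(z):
    """Logistic function, clamped so math.exp cannot overflow."""
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))


def train_logistic(train_rows, learning_rate, epochs=200):
    """Fit weights and bias with plain batch gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(train_rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, label in train_rows:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(z) - label
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(model, eval_rows):
    """Share of rows whose predicted class matches the label."""
    weights, bias = model
    hits = 0
    for features, label in eval_rows:
        z = sum(w * x for w, x in zip(weights, features)) + bias
        hits += int(sigmoid(z) >= 0.5) == label
    return hits / len(eval_rows)


def run_workflow(seed=42):
    # Step 1: "import" the data. A synthetic table stands in for a real file (assumption).
    rng = random.Random(seed)
    rows = []
    for _ in range(300):
        spend = rng.uniform(0, 100)   # hypothetical feature
        visits = rng.uniform(0, 20)   # hypothetical feature
        label = int(0.05 * spend + 0.2 * visits + rng.gauss(0, 0.5) > 4.5)
        rows.append(([spend, visits], label))

    # Step 2: split, then explore the training rows (per-feature mean and spread).
    raw_train, raw_valid = rows[:240], rows[240:]
    columns = list(zip(*(features for features, _ in raw_train)))
    stats = [(mean(col), stdev(col)) for col in columns]

    # Step 3: standardize both splits using training statistics only.
    def scale(split):
        return [([(v - mu) / sigma for v, (mu, sigma) in zip(features, stats)], label)
                for features, label in split]

    train, valid = scale(raw_train), scale(raw_valid)

    # Step 4: tune one hyperparameter by hand, scoring on the held-out rows.
    results = {}
    for learning_rate in (0.01, 0.1, 1.0):
        results[learning_rate] = accuracy(train_logistic(train, learning_rate), valid)
    best_rate = max(results, key=results.get)
    return results, best_rate


if __name__ == "__main__":
    results, best_rate = run_workflow()
    print(results, "best learning rate:", best_rate)
```

Every decision here is made explicitly by the author of the code: how to split, how to scale, which model to use, and which hyperparameter values to try. That explicitness is where both the flexibility and the cost of the hand-rolled approach come from.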
Tools like AutoML will shift data science roles and responsibilities over the next 10 years. AutoML takes the burden of developing machine learning from scratch off data scientists and instead puts the possibilities of machine learning technology directly in the hands of other problem solvers. With time freed up to focus on what they know (the data and the inputs themselves), data scientists a decade from now will serve as even more valuable guides for their organizations.
Author: Eric Miller