In the last couple of years, we’ve seen machine learning adopted rapidly into the analytics environment, moving from science experiment to table stakes. In fact, at this point, I’m hard-pressed to think of an enterprise that doesn’t have at least some sort of predictive or machine learning strategy already in place.
Meanwhile, data warehouses have long been the foundation of analytics and business intelligence, but they’ve also traditionally been complex and expensive to operate. With the widespread adoption of machine learning and the increasing need to broaden access to data beyond just data science teams, we are seeing a fundamental shift in the way organizations should approach data warehousing.
With this in mind, here are three broad data management trends I expect will accelerate this year:
Operationalize insights with analytical databases
I’m seeing a lot of convergence between machine learning and analytics. Data scientists are building their models with tools such as R, Python, and Spark.
They then do their best to make the results accessible to the rest of the business, not just to other data scientists. But these talented data scientists are hacking away in their own tools, and business analysts are simply not going to use those tools.
The way to get the best of both worlds is to let data scientists use their tools of choice to produce their predictions, then publish those results to an analytical database, which is more open to business users. Business users are already familiar with tools like Tableau, so once the predictions sit in an analytical database they can easily operationalize insights from the model outcomes.
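As a rough illustration of that publish step, here is a minimal Python sketch. It is an assumption-laden stand-in rather than any particular product's workflow: SQLite (from the standard library) plays the role of the analytical database, and the table name `churn_predictions`, the columns, and both function names are hypothetical.

```python
import sqlite3


def publish_predictions(conn, predictions):
    """Write (customer_id, segment, churn_score) rows to a plain SQL table.

    SQLite is only a stand-in here for an analytical database;
    the table and column names are hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS churn_predictions ("
        "customer_id INTEGER PRIMARY KEY, segment TEXT, churn_score REAL)"
    )
    # INSERT OR REPLACE lets a re-run of the model overwrite stale scores.
    conn.executemany(
        "INSERT OR REPLACE INTO churn_predictions VALUES (?, ?, ?)",
        predictions,
    )
    conn.commit()


def high_risk_by_segment(conn, threshold=0.5):
    """The kind of query a BI tool might issue: high-risk customers per segment."""
    cur = conn.execute(
        "SELECT segment, COUNT(*) FROM churn_predictions "
        "WHERE churn_score >= ? GROUP BY segment ORDER BY segment",
        (threshold,),
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # In practice these scores would come from a model built in R, Python, or Spark.
    publish_predictions(conn, [(1, "smb", 0.9), (2, "smb", 0.2), (3, "enterprise", 0.7)])
    print(high_risk_by_segment(conn))
```

The point of the design is that the model's output ends up as an ordinary table, so anything that speaks SQL can read it without knowing how the scores were produced.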
Growth in streaming data sources
Similar to the convergence of machine learning and analytics, I’m also seeing much greater interest in how to support streaming use cases or streaming data sources.
There are a number of technologies, among them Kafka, that provide a way to capture and propagate streams and do stream-based processing. Many systems from web analytics stacks to a single microservice in someone’s application stack are pushing out interesting events to a Kafka topic. But how do you consume that?
There are specialized streaming databases, for example, that allow you to consume these events in real time. In some cases that works well, but in others it's not as natural, especially when trending across larger data ranges. Pushing the streaming data into an analytics database makes that kind of trending easier.
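To make the idea of pushing a stream into an analytics database concrete, here is a minimal Python sketch under stated assumptions. A plain iterable of dictionaries stands in for messages read from a Kafka topic (no Kafka client is used), SQLite stands in for the analytics database, and the `page_events` table and all function names are hypothetical.

```python
import sqlite3
from itertools import islice


def batches(events, size):
    """Group an event iterator into lists of at most `size` items."""
    it = iter(events)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_stream(conn, events, batch_size=500):
    """Load events in micro-batches; returns the number of rows written.

    `events` is any iterable of dicts, standing in for a topic consumer.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_events "
        "(event_day TEXT, page TEXT, user_id INTEGER)"
    )
    loaded = 0
    for chunk in batches(events, batch_size):
        with conn:  # one transaction per micro-batch
            conn.executemany(
                "INSERT INTO page_events VALUES (:event_day, :page, :user_id)",
                chunk,
            )
        loaded += len(chunk)
    return loaded


def daily_views(conn, page):
    """Trend one page across every day that has been loaded so far."""
    cur = conn.execute(
        "SELECT event_day, COUNT(*) FROM page_events "
        "WHERE page = ? GROUP BY event_day ORDER BY event_day",
        (page,),
    )
    return cur.fetchall()
```

Batching the inserts is a deliberate choice in this sketch: analytical databases generally ingest more efficiently in bulk than row by row, and once the events are in a table, trending across any range is an ordinary `GROUP BY`.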
The ephemeral data mart
The third trend I’m seeing more of, and one I expect to accelerate in 2018, is what I would call the ephemeral data mart.
By that I mean quickly bringing together a data set, performing some queries, and then throwing the data away. As such, data resiliency and high availability become less important than data ingestion and computation speed. I’m seeing this in some of our customers and expect to see more.
One customer in particular is using an analytics database to process very large sets of test results. By creating an ephemeral data mart for each test run, they can perform post-test analysis and trending, then keep just the results of that analysis for the longer term.
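The load, query, and discard pattern can be sketched in a few lines of Python. This is an illustration under assumptions, not the customer's actual setup: an in-memory SQLite database stands in for the analytics database, and the `test_results` schema, `ephemeral_mart`, and `summarize_run` are hypothetical names.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def ephemeral_mart(rows):
    """Create a throwaway in-memory mart holding one test run's raw results.

    Nothing is persisted: closing the connection discards the data.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE test_results "
            "(test_name TEXT, duration_ms REAL, passed INTEGER)"
        )
        conn.executemany("INSERT INTO test_results VALUES (?, ?, ?)", rows)
        yield conn
    finally:
        conn.close()


def summarize_run(run_id, rows):
    """Analyze one run inside an ephemeral mart and return only the summary."""
    with ephemeral_mart(rows) as mart:
        total, failed, avg_ms = mart.execute(
            "SELECT COUNT(*), SUM(passed = 0), AVG(duration_ms) FROM test_results"
        ).fetchone()
    # Only this small summary would be stored for the longer term.
    return {"run_id": run_id, "total": total, "failed": failed, "avg_ms": avg_ms}
```

Because the raw rows never outlive the `with` block, there is nothing to back up or replicate, which is why ingestion and query speed matter more here than resiliency.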
As organizations need better and more timely analytics that fit within their hardware and cost budgets, the ways data is accessed and stored are changing. The trends I’ve outlined above are ones that I expect to gather steam this year, and they can serve as guideposts for enterprises that recognize the need to modernize their approach to data warehouses.
Author: Dave Thompson
Source: Information Management