Data Science RPA

Why data is key in driving Robotic Process Automation

Enterprise technology innovation and investment are typically driven by compelling events within an organization, especially in computing and machinery, as companies seek to do business faster and remain competitive. For the past several decades, much of enterprise innovation has centered on handing mundane, repetitive, error-prone, and painful tasks over to machines. Businesses typically install and configure these new technologies and then engage a team of experts to monitor the outputs and make adjustments as the business changes. The next phase of this continuum is Robotic Process Automation (RPA), which moves responsibility for continuous improvement away from experts and trained technicians and into the machines themselves. To realize the promise of RPA, a solid foundation of high-quality, relevant data is essential.

Data driving RPA

As with any machine learning-based solution, the quality of the results depends directly on the training sets and processes used to train and tune the algorithms. The 'garbage in, garbage out' principle certainly applies, but in practice the data input strategy for RPA solutions is more nuanced. Depending on the problems RPA is meant to solve, deliberately 'bad' data may need to form a core, or even large, part of the training sets and inputs. For instance, if a company wanted to use RPA to automate testing of changes to its ERP landscape, it would need training sets and tuning processes containing both traditionally 'good' data and data designed specifically to drive exception and failure testing. Careful curation ensures that the right training paths and data sets are in place to steer RPA in the right direction.
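To make the ERP testing example more concrete, here is a minimal Python sketch of assembling a labeled test set that mixes valid records with deliberately faulty ones. The `PurchaseOrder` record, its fields, the three fault types, and the `is_valid` rule are all hypothetical stand-ins, not part of any real ERP or RPA product; the point is only that each 'bad' case carries a label stating which outcome the automated test should expect.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PurchaseOrder:
    """Hypothetical ERP record, used only for illustration."""
    po_id: int
    vendor_id: str
    quantity: int
    unit_price: float


@dataclass(frozen=True)
class LabeledCase:
    record: PurchaseOrder
    expect_valid: bool  # True for "good" data, False for deliberately "bad" data
    fault: str          # name of the injected defect, "none" for good data


# Each hypothetical fault turns an otherwise valid record into one the system should reject.
FAULTS = {
    "missing_vendor": lambda po: replace(po, vendor_id=""),
    "negative_quantity": lambda po: replace(po, quantity=-po.quantity),
    "zero_price": lambda po: replace(po, unit_price=0.0),
}


def make_good(rng: random.Random, po_id: int) -> PurchaseOrder:
    """Generate a record that satisfies every validation rule."""
    return PurchaseOrder(
        po_id=po_id,
        vendor_id=f"V{rng.randint(100, 999)}",
        quantity=rng.randint(1, 50),
        unit_price=round(rng.uniform(1.0, 500.0), 2),
    )


def build_cases(n_good: int, bad_ratio: float, seed: int = 0) -> list[LabeledCase]:
    """Mix n_good valid records with round(n_good * bad_ratio) faulty ones."""
    rng = random.Random(seed)
    cases = [LabeledCase(make_good(rng, i), True, "none") for i in range(n_good)]
    fault_names = sorted(FAULTS)
    for j in range(round(n_good * bad_ratio)):
        name = fault_names[j % len(fault_names)]  # cycle through the fault types
        bad = FAULTS[name](make_good(rng, n_good + j))
        cases.append(LabeledCase(bad, False, name))
    rng.shuffle(cases)
    return cases


def is_valid(po: PurchaseOrder) -> bool:
    """Stand-in for the validation rule the automated test exercises."""
    return bool(po.vendor_id) and po.quantity > 0 and po.unit_price > 0


cases = build_cases(n_good=20, bad_ratio=0.5)
mismatches = [c for c in cases if is_valid(c.record) != c.expect_valid]
print(f"{len(cases)} cases, {len(mismatches)} unexpected outcomes")
```

Seeding the random generator keeps the test set reproducible, so a change in test results can be traced to the ERP change rather than to different input data.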

A common misconception about RPA solutions is that they can operate in a 'set it and forget it' mode. In reality, implementing an RPA solution mirrors implementing any modern technology system. The solution is defined by its specification and requirements, which the implementation team uses to configure it to meet implementation needs. One subtle difference in RPA implementations is the importance of establishing fit-for-purpose data early in the implementation. For example, when a new business process system such as a CRM is first put in place, sample data can help drive the implementation and test different features and functional flows. With RPA solutions, however, sample data risks introducing bias or misdirected learning into the solution that would take more time and effort to correct later. The cost of fixing an issue grows the further from its source it is discovered. When an issue with an RPA solution is found late in implementation, it may require revising many algorithms against large data sets, which translates into higher costs and lower satisfaction with the solution. Confirming fit-for-purpose data up front is key to the success of many RPA implementation projects.

Data generation through RPA

Beyond the initial data that goes into RPA solutions, the very nature of RPA means that organizations using it will generate more data, faster. With machines running regularly over many iterations and permutations, and each iteration generating more data, the speed and volume of data from business processes executed via RPA will inevitably exceed that of processes executed today by a combination of people and technology. This increase can be a boon both to an organization's analytical planning and teams and to the packaging and potential valuation of the data itself. Data scientists and others in analytical roles will have more inputs for identifying areas of investment and innovation based on analysis of RPA output data. As a general rule, larger representative data sets yield more statistically reliable analytical insights, and RPA will generate large data sets for analysis.
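The intuition that larger data sets give more trustworthy insights can be made precise for a simple statistic such as a mean: its standard error is s / sqrt(n), so quadrupling the sample size roughly halves the uncertainty. The Python sketch below illustrates this on a hypothetical metric (processing time per automated run), simulated from a normal distribution purely for illustration. It assumes independent, representative samples; more data drawn from a biased process only narrows the interval around the wrong answer.

```python
import math
import random
import statistics


def standard_error(sample: list[float]) -> float:
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def ci95(sample: list[float]) -> tuple[float, float]:
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    mean = statistics.fmean(sample)
    half_width = 1.96 * standard_error(sample)
    return mean - half_width, mean + half_width


# Hypothetical metric: processing time in seconds per automated run,
# simulated from a normal distribution purely for illustration.
rng = random.Random(42)
run_times = [rng.gauss(30.0, 5.0) for _ in range(6_400)]

# Each fourfold increase in n should roughly halve the standard error.
for n in (100, 400, 1_600, 6_400):
    sample = run_times[:n]
    low, high = ci95(sample)
    print(f"n={n:>5}  SE={standard_error(sample):.3f}  95% CI=({low:.2f}, {high:.2f})")
```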

Any uptick in the flow, volume, velocity, or overall size of data also increases the need to properly manage and understand that data. Analytical data from RPA implementations is most commonly stored in a data lake. By building the necessary data governance cycles into RPA implementations, especially around data lake setup and data science access, organizations can accelerate the downstream benefits of RPA. Establishing a clear understanding of what the RPA data means, who owns it, where it lives, and how it can appropriately be used will help win both buy-in across the organization and the backing of what is today typically a well-funded and influential part of the business: the data science and analytics teams.
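As a rough illustration of those four governance questions, the Python sketch below records them as a minimal catalog entry and refuses to register a data set that lacks a documented meaning, owner, or location. The `DatasetRecord` and `Catalog` names, the example data set, and its path are hypothetical; real data lake platforms provide their own catalog and access-control services, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry answering the four governance questions."""
    name: str
    meaning: str   # what the data means
    owner: str     # who owns it
    location: str  # where it lives in the data lake
    permitted_uses: frozenset[str] = field(default_factory=frozenset)  # how it may be used


class Catalog:
    """In-memory registry; a real deployment would use the lake's own catalog service."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> None:
        # Refuse data sets whose meaning, ownership, or location is undocumented.
        missing = [f for f in ("meaning", "owner", "location") if not getattr(record, f)]
        if missing:
            raise ValueError(f"{record.name}: missing {', '.join(missing)}")
        self._entries[record.name] = record

    def can_use(self, name: str, purpose: str) -> bool:
        # Unknown data sets and unlisted purposes are both denied by default.
        record = self._entries.get(name)
        return record is not None and purpose in record.permitted_uses


catalog = Catalog()
catalog.register(DatasetRecord(
    name="rpa_invoice_runs",
    meaning="One row per automated invoice-matching run, with outcome and duration",
    owner="finance-operations",
    location="lake/raw/rpa/invoice_runs/",
    permitted_uses=frozenset({"process-analytics"}),
))
print(catalog.can_use("rpa_invoice_runs", "process-analytics"))
print(catalog.can_use("rpa_invoice_runs", "marketing"))
```

Denying by default means a new analytical use has to be added to the record explicitly, which keeps the answer to "how can this data be used?" written down alongside the data itself.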

RPA solutions promise innovation, faster results, and less mundane work for everyone involved. By starting with fit-for-purpose data sets to train RPA implementations, along with a strategically aligned plan for managing and leveraging the data the RPA solution produces, organizations can take RPA from hype to helpful and make it a meaningful part of their digital transformation journey.

Author: Tyler Warden

Source: Dataversity