20 items tagged "Data quality"

  • 9 Data issues to deal with in order to optimize AI projects

    9 Data issues to deal with in order to optimize AI projects

    The quality of your data affects how well your AI and machine learning models will operate. Getting ahead of these nine data issues will poise organizations for successful AI models.

    At the core of modern AI projects are machine-learning-based systems, which depend on data to derive their predictive power. Because of this, every artificial intelligence project depends on high-quality data.

    However, obtaining and maintaining high quality data is not always easy. There are numerous data quality issues that threaten to derail your AI and machine learning projects. In particular, the following nine data quality issues need to be considered and addressed before they derail a project.

    1. Inaccurate, incomplete and improperly labeled data

    Inaccurate, incomplete or improperly labeled data is one of the most common causes of AI project failure. These issues can range from bad data at the source to data that has not been cleaned or prepared properly: values might sit in the wrong fields or carry the wrong labels.

    Data cleanliness is such an issue that an entire industry of data preparation has emerged to address it. While it might seem an easy task to clean gigabytes of data, imagine having petabytes or zettabytes of data to clean. Traditional approaches simply don't scale, which has resulted in new AI-powered tools to help spot and clean data issues.
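
    As an illustration of the kind of programmatic standardization such preparation tools automate, here is a minimal sketch in Python using pandas; the table and its label values are hypothetical, and real pipelines would apply many more rules.

    ```python
    import pandas as pd

    # Hypothetical raw extract with inconsistent labels, stray whitespace and a duplicate record.
    raw = pd.DataFrame({
        "customer_id": [101, 102, 102, 103],
        "country":     [" us", "US", "US", "Germany "],
        "status":      ["active", "ACTIVE", "ACTIVE", "churned"],
    })

    clean = (
        raw
        .assign(
            country=lambda df: df["country"].str.strip().str.upper(),
            status=lambda df: df["status"].str.strip().str.lower(),
        )
        .drop_duplicates(subset="customer_id", keep="first")
        .reset_index(drop=True)
    )

    print(clean)
    ```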

    2. Having too much data

    Since data is so important to AI projects, it's commonly assumed that the more data you have, the better. In machine learning, however, simply throwing more data at a model doesn't always help. A counterintuitive data quality issue, then, is having too much data.

    While it might seem like too much data can never be a bad thing, more often than not a good portion of it is not usable or relevant. Separating the useful data from such a large set wastes organizational resources. In addition, the extra data can introduce "noise" that leads machine learning systems to learn from nuances and variances in the data rather than the more significant overall trend.

    3. Having too little data

    On the flip side, having too little data presents its own problems. While training a model on a small data set may produce acceptable results in a test environment, bringing this model from proof of concept or pilot stage into production typically requires more data. In general, small data sets produce models that are overly simple, biased or overfitted, and that will not be accurate when working with new data.

    4. Biased data

    In addition to incorrect data, another issue is that the data might be biased. The data might be selected from larger data sets in ways that don't appropriately represent the wider data set. In other cases, data might be derived from older information that was itself shaped by human bias. Or there may be issues with the way the data is collected or generated that produce a biased outcome.

    5. Unbalanced data

    While everyone wants to minimize or eliminate bias from their data sets, this is much easier said than done. Several factors come into play when addressing biased data, and one of them is unbalanced data. Unbalanced data sets can significantly hinder the performance of machine learning models: they overrepresent data from one community or group while underrepresenting another.

    An example of an unbalanced data set can be found in some approaches to fraud detection. In general, most transactions are not fraudulent, which means that only a very small portion of your data set will be fraudulent transactions. Because a model trained on this data sees far more examples of one class than the other, its results will be biased towards the majority class. That's why it's essential to conduct thorough exploratory data analysis to discover such issues early and consider solutions that can help balance data sets.
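
    To make the remedy concrete, here is a minimal sketch, assuming scikit-learn is available, of one common way to compensate for class imbalance: weighting the rare class during training. The data is synthetic and the numbers are purely illustrative.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Synthetic, heavily imbalanced data: roughly 1% "fraud" (label 1), 99% legitimate (label 0).
    X = rng.normal(size=(10_000, 5))
    y = (rng.random(10_000) < 0.01).astype(int)
    X[y == 1] += 2.0  # shift the fraudulent rows so there is a signal to learn

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )

    # class_weight="balanced" re-weights examples inversely to class frequency,
    # so the rare fraud class is not drowned out by the majority class.
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(X_train, y_train)

    print(classification_report(y_test, model.predict(X_test), digits=3))
    ```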

    6. Data silos

    Related to the issue of unbalanced data is the issue of data silos. A data silo is where only a certain group or limited number of individuals at an organization have access to a data set. Data silos can result from several factors, including technical challenges or restrictions in integrating data sets as well as issues with proprietary or security access control of data.

    They are also the product of structural breakdowns at organizations, where only certain groups have access to certain data, as well as cultural issues, where a lack of collaboration between departments prevents data sharing. Regardless of the reason, data silos can limit the ability of those working on artificial intelligence projects to gain access to comprehensive data sets, potentially lowering the quality of results.

    7. Inconsistent data

    Not all data is created equal. Just because you’re collecting information doesn’t mean it can or should always be used. Related to the collection of too much data is the challenge of collecting irrelevant data for training. Training a model on clean but irrelevant data leads to the same issues as training it on poor quality data.

    In conjunction with the concept of data irrelevancy is inconsistent data. In many circumstances, the same records might exist multiple times in different data sets but with different values, resulting in inconsistencies. Duplicate data is one of the biggest problems for data-driven businesses. When dealing with multiple data sources, inconsistency is a big indicator of a data quality problem.
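
    A minimal sketch of how such cross-source inconsistencies can be surfaced, assuming pandas; the two tables and the mismatching email are hypothetical.

    ```python
    import pandas as pd

    # The same customer records held in two hypothetical systems.
    crm = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "email": ["ann@example.com", "bob@example.com", "cai@example.com"],
    })
    billing = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "email": ["ann@example.com", "bob@sample.org", "cai@example.com"],
    })

    # Join on the shared key and flag records whose values disagree across sources.
    merged = crm.merge(billing, on="customer_id", suffixes=("_crm", "_billing"))
    inconsistent = merged[merged["email_crm"] != merged["email_billing"]]
    print(inconsistent)
    ```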

    8. Data sparsity

    Another issue is data sparsity. Data sparsity is when there is missing data or when there is an insufficient quantity of specific expected values in a data set. Data sparsity can degrade the performance of machine learning algorithms and their ability to calculate accurate predictions. If data sparsity is not identified, it can result in models being trained on noisy or insufficient data, reducing the effectiveness or accuracy of results.
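
    A minimal sketch of how sparsity can be surfaced before training, assuming pandas; the column names and the 40% threshold are illustrative choices, not fixed rules.

    ```python
    import numpy as np
    import pandas as pd

    # Illustrative dataset in which some columns are sparsely populated.
    df = pd.DataFrame({
        "age":    [34, 51, np.nan, 29, np.nan, 47],
        "income": [72000, np.nan, np.nan, np.nan, 58000, np.nan],
        "region": ["north", "south", "south", None, "east", "west"],
    })

    # Share of missing values per column; flag anything above a chosen threshold.
    missing_share = df.isna().mean().sort_values(ascending=False)
    print(missing_share)

    sparse_columns = missing_share[missing_share > 0.4].index.tolist()
    print("Columns too sparse to use as-is:", sparse_columns)
    ```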

    9. Data labeling issues

    Supervised machine learning models, one of the fundamental types of machine learning, require data to be labeled with correct metadata for machines to derive insights. Data labeling is hard, often requiring people to attach metadata to a wide range of data types, which can be both complex and expensive. The lack of properly labeled training data is one of the biggest data quality issues currently challenging in-house AI projects. Accurately labeled data ensures that machine learning systems establish reliable models for pattern recognition, forming the foundation of every AI project.

    Organizations looking to implement successful AI projects need to pay attention to the quality of their data. The reasons for data quality issues are many, but the common theme is that proper management is the key to keeping data in the best condition possible. It's important to keep a watchful eye on the data being collected, run regular checks on it, keep it as accurate as possible, and get it into the right format before machine learning models learn from it. Companies that stay on top of their data are far less likely to run into quality issues.

    Author: Kathleen Walch

    Source: TechTarget

  • AI-Powered Data Integration: A New Era of Efficiency and Intelligence

    AI-Powered Data Integration: A New Era of Efficiency and Intelligence

    Enterprises are creating and collecting more data than ever, around 2.5 quintillion bytes per day, a pace that will likely continue to grow in the coming years. Businesses are thus constantly looking for solutions that can efficiently collect and combine this data.

    One of the best solutions these days to solve data integration woes is Artificial Intelligence (AI). Many businesses are increasingly adopting AI to rapidly evolve their data processes as they strive to streamline operations, improve decision-making, and gain a competitive edge.  

    AI is helping companies improve productivity and cut costs while allowing employees to deliver more value. AI is not just a short-term trend that is going to fade away. In fact, it will become prominent as technology improves and business requirements become more intricate.  

    Let’s look at the benefits of using AI to power data integration efforts and what the future holds.  

    Intelligent Data Mapping and Transformation 

    Data mapping, the process of defining relationships between objects in different databases, is a critical component of data integration. AI has completely changed data mapping by making it more efficient and smarter. AI-powered data mapping can easily overcome the complexities of diverse data formats and systems, ensuring seamless data flow and harmonization.

    Machine learning algorithms can analyze data patterns, learn from past integration patterns, and suggest mappings and transformations, reducing manual effort and accelerating integration projects and, consequently, time-to-insight.  

    AI can also automatically suggest relevant transformations based on the nature of the data and past inputs, speeding up data processing. Perhaps the best part about using AI is that it can automatically build ingestion pipelines from multiple sources within an enterprise, enabling a business to create a single source of truth.
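
    The matching logic behind such suggestions can be illustrated with a minimal sketch using simple string similarity from Python's standard library; a production AI mapper would rely on learned models and past integration history, but the idea of scoring candidate mappings is the same. The column names are hypothetical.

    ```python
    from difflib import SequenceMatcher

    # Hypothetical source and target schemas to be mapped.
    source_columns = ["cust_name", "cust_address", "order_dt"]
    target_columns = ["customer_name", "customer_address", "order_date", "shipping_cost"]

    def suggest_mapping(source_col, candidates):
        """Return the most similar target column and its similarity score."""
        scored = [(c, SequenceMatcher(None, source_col, c).ratio()) for c in candidates]
        return max(scored, key=lambda pair: pair[1])

    for col in source_columns:
        target, score = suggest_mapping(col, target_columns)
        print(f"{col:14s} -> {target:18s} (similarity {score:.2f})")
    ```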

    Boosting Data Quality 

    It is cheaper to solve data quality issues proactively than reactively, not to mention quicker. AI plays a crucial role in accelerating data quality management during integration. AI tools allow businesses to identify and resolve data inconsistencies during run-time, as opposed to after the data is loaded and processed, thus ensuring the integrity and accuracy of integrated data for analysis. 

    These tools can automatically detect and rectify errors a human analyst might have missed (especially for vast datasets). For example, they can capture and remove outliers in a sales dataset to give a realistic average of monthly sales. In fraud detection, real-time integration with AI algorithms can flag suspicious activities, trigger alerts, and facilitate proactive measures to mitigate fraud risks. Basically, AI allows teams to scale their data initiatives while ensuring accuracy and completeness.  
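
    As a minimal sketch of the sales-outlier example, here is a simple interquartile-range rule in Python with pandas; the figures are hypothetical, and the AI tools described above use far more sophisticated models, but the effect on the reported average is the same.

    ```python
    import pandas as pd

    # Hypothetical monthly sales figures containing one data-entry error (950000).
    sales = pd.Series([42000, 45500, 43800, 950000, 44100, 46700, 43200],
                      name="monthly_sales")

    # A robust rule of thumb: flag values far outside the interquartile range.
    q1, q3 = sales.quantile(0.25), sales.quantile(0.75)
    iqr = q3 - q1
    outliers = (sales < q1 - 1.5 * iqr) | (sales > q3 + 1.5 * iqr)

    print("Flagged as outliers:")
    print(sales[outliers])
    print("Average including the error: ", round(sales.mean()))
    print("Average after removing it:   ", round(sales[~outliers].mean()))
    ```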

    Real-time Integration and Workflow Automation 

    With AI, data integration transcends traditional processing. AI algorithms enable real-time data integration by continuously monitoring data streams and integrating data as it becomes available. This approach allows organizations to react swiftly to critical events like market fluctuations, customer behaviors, or operational changes. For example, real-time integration enables an e-commerce business to instantly update inventory levels across multiple channels, ensuring accurate stock availability and minimizing the risk of overselling.  
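
    A minimal sketch of the difference between batch and real-time integration, using a hypothetical stream of order events and in-memory inventory counts; in production the loop below would be a consumer on a message queue rather than a Python list.

    ```python
    from collections import defaultdict

    # Hypothetical order events arriving from several sales channels.
    events = [
        {"channel": "web",    "sku": "A-100", "qty": 2},
        {"channel": "store",  "sku": "A-100", "qty": 1},
        {"channel": "mobile", "sku": "B-200", "qty": 3},
    ]

    inventory = defaultdict(lambda: 50)   # assume an opening stock of 50 units per SKU

    def apply_event(event):
        """Integrate one event as it arrives instead of waiting for a nightly batch."""
        inventory[event["sku"]] -= event["qty"]
        if inventory[event["sku"]] < 5:
            print(f"Low-stock alert for {event['sku']}")

    for event in events:                  # in production: a message-queue consumer
        apply_event(event)

    print(dict(inventory))
    ```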

    Real-time integration is also helpful in situations with multiple connected devices and sources, such as an Internet of Things (IoT) ecosystem. It enables immediate detection and prompt fixing in case of device failures in home systems, for instance.  

    AI-driven solutions automate complex integration processes by automatically identifying data relationships, validating data integrity, and transforming data into the desired format. This automation is necessary in this fast-paced business environment as it minimizes errors, accelerates integration timelines, and frees up resources for more strategic tasks. 

    Future Outlook 

    The use of AI to power various data management processes, including data integration, will become more common. With time, AI solutions will become more adept at detecting and solving anomalies, further reducing the need for manual intervention. The demand for dedicated ETL and ELT developers will gradually decrease as AI empowers non-technical users to oversee the integration process.  

    Currently, many DI tools are limited by the number of connectors they support. As AI tech becomes more robust, it will allow data management providers to build solutions that support a more comprehensive range of sources.  

    Cognitive automation, driven by AI, will lead to more intelligent and autonomous data integration workflows. AI algorithms will optimize integration tasks, prioritize data processing based on relevance and urgency, and proactively identify data quality issues. This level of automation will result in more efficient data integration processes. 

    Lastly, the future holds great promise for specialized AI and ML engineers. The rise of AI will require trained professionals to implement and monitor advanced machine learning algorithms. Consequently, there will be a surge in demand for relevant training and certifications.

    Final Thoughts 

    There is no denying the fact that AI is the future. AI adoption has become necessary, given the speed at which the world is moving today. It is rapidly reshaping how organizations handle their processes, and data integration is no different. AI’s ability to automate tasks and improve data quality is the key to gaining real-time insights, and those insights are the key to competitive advantage.

    Date: August 2, 2023

    Author: Tehreem Naeem

    Source: Datafloq

  • An overview of Morgan Stanley's surge toward data quality

    An overview of Morgan Stanley's surge toward data quality

    Jeff McMillan, chief analytics and data officer at Morgan Stanley, has long worried about the risks of relying solely on data. If the data put into an institution's system is inaccurate or out of date, it will give customers the wrong advice. At a firm like Morgan Stanley, that just isn't an option.

    As a result, Morgan Stanley has been overhauling its approach to data. Chief among its goals is improving data quality in core business processing.

    “The acceleration of data volume and the opportunity this data presents for efficiency and product innovation is expanding dramatically,” said Gerard Hester, head of the bank’s data center of excellence. “We want to be sure we are ahead of the game.”

    The data center of excellence was established in 2018. Hester describes it as a hub with spokes out to all parts of the organization, including equities, fixed income, research, banking, investment management, wealth management, legal, compliance, risk, finance and operations. Each division has its own data requirements.

    “Being able to pull all this data together across the firm we think will help Morgan Stanley’s franchise internally as well as the product we can offer to our clients,” Hester said.

    The firm hopes that improved data quality will let the bank build higher quality artificial intelligence and machine learning tools to deliver insights and guide business decisions. One product expected to benefit from this is the 'next best action' the bank developed for its financial advisers.

    This next best action uses machine learning and predictive analytics to analyze research reports and market data, identify investment possibilities, and match them to individual clients’ preferences. Financial advisers can choose to use the next best action’s suggestions or not.

    Another tool that could benefit from better data is an internal virtual assistant called 'ask research'. Ask research provides quick answers to routine questions like, “What’s Google’s earnings per share?” or “Send me your latest model for Google.” This technology is currently being tested in several departments, including wealth management.

    New data strategy

    Better data quality is just one of the goals of the revamp. Another is to have tighter control and oversight over where and how data is being used, and to ensure the right data is being used to deliver new products to clients.

    To make this happen, the bank recently created a new data strategy with three pillars. The first is working with each business area to understand its data issues and begin to address them.

    “We have made significant progress in the last nine months working with a number of our businesses, specifically our equities business,” Hester said.

    The second pillar is tools and innovation that improve data access and security. The third pillar is an identity framework.

    At the end of February, the bank hired Liezel McCord to oversee data policy within the new strategy. Until recently, McCord was an external consultant helping Morgan Stanley with its Brexit strategy. One of McCord’s responsibilities will be to improve data ownership, to hold data owners accountable when the data they create is wrong and to give them credit when it’s right.

    “It’s incredibly important that we have clear ownership of the data,” Hester said. “Imagine you’re joining lots of pieces of data. If the quality isn’t high for one of those sources of data, that could undermine the work you’re trying to do.”

    Data owners will be held accountable for the accuracy, security and quality of the data they contribute and make sure that any issues are addressed.

    Trend of data quality projects

    Arindam Choudhury, the banking and capital markets leader at Capgemini, said many banks are refocusing on data as it gets distributed in new applications.

    Some are driven by regulatory concerns, he said. For example, the Basel Committee on Banking Supervision's standard number 239 (principles for effective risk data aggregation and risk reporting) is pushing some institutions to make data management changes.

    “In the first go-round, people complied with it, but as point-to-point interfaces and applications, which was not very cost effective,” Choudhury said. “So now people are looking at moving to the cloud or a data lake, they’re looking at a more rationalized way and a more cost-effective way of implementing those principles.”

    Another trend pushing banks to get their data house in order is competition from fintechs.

    “One challenge that almost every financial services organization has today is they’re being disintermediated by a lot of the fintechs, so they’re looking at assets that can be used to either partner with these fintechs or protect or even grow their business,” Choudhury said. “So they’re taking a closer look at the data access they have. Organizations are starting to look at data as a strategic asset and try to find ways to monetize it.”

    A third driver is the desire for better analytics and reports.

    "There’s a strong trend toward centralizing and figuring out, where does this data come from, what is the provenance of this data, who touched it, what kinds of rules did we apply to it?” Choudhury said. That, he said, could lead to explainable, valid and trustworthy AI.

    Author: Penny Crosman

    Source: Information-management

  • Business Intelligence Trends for 2017

    Analyst and consulting firm Business Application Research Centre (BARC) has come out with the top BI trends, based on a survey of 2,800 BI professionals. Compared to last year, there were no significant changes in the ranking of the importance of BI trends, indicating that no major market shifts or disruptions are expected to impact this sector.
     
    With the growing advancement of, and disruption in, IT, the eight meta trends that influence the strategies, investments and operations of enterprises worldwide are Digitalization, Consumerization, Agility, Security, Analytics, Cloud, Mobile and Artificial Intelligence. All of these meta trends are major drivers of the growing demand for data management, business intelligence and analytics (BI), and their growth also sets the direction for this industry. The top three of the 21 trends identified for 2017 were:
    • Data discovery and visualization,
    • Self-service BI and
    • Data quality and master data management
    Data labs and data science, cloud BI and data as a product were the least important trends for 2017.
    Data discovery and visualization, along with predictive analytics, are some of the most desired BI functions that users want in a self-service mode. But the report suggested that organizations should also have an underlying tool and data governance framework to ensure control over data.
     
    In 2016, BI was used most heavily in the finance department, followed by management and sales, and usage rates in these areas varied only slightly over the last three years. However, there was a surge in BI usage in production and operations departments, which grew from 20% in 2008 to 53% in 2016.
     
    "While BI has always been strong in sales and finance, production and operations departments have traditionally been more cautious about adopting it,” says Carsten Bange, CEO of BARC. “But with the general trend for using data to support decision-making, this has all changed. Technology for areas such as event processing and real-time data integration and visualization has become more widely available in recent years. Also, the wave of big data from the Internet of Things and the Industrial Internet has increased awareness and demand for analytics, and will likely continue to drive further BI usage in production and operations."
     
    Customer analysis was the #1 investment area for new BI projects, with 40% of respondents investing their BI budgets in customer behavior analysis and 32% in developing a unified view of customers.
    • “With areas such as accounting and finance more or less under control, companies are moving to other areas of the enterprise, in particular to gain a better understanding of customer, market and competitive dynamics,” said Carsten Bange.
    • Many BI trends of the past have become critical BI components in the present.
    • Many organizations were also considering trends like collaboration and sensor data analysis as critical BI components. About 20% of respondents were already using BI trends like collaboration and spatial/location analysis.
    • About 12% were using cloud BI and more were planning to employ it in the future. IBM's Watson and Salesforce's Einstein are gearing up to meet this growth.
    • Only 10% of the respondents used social media analysis.
    • Sensor data analysis is also growing, driven by the huge volumes of data generated by the millions of IoT devices used by the telecom, utilities and transportation industries. According to the survey, the transport and telecoms industries would lead in leveraging sensor data in 2017.
    The biggest new investments in BI are planned in the manufacturing and utilities industries in 2017.
     
    Source: readitquick.com, November 14, 2016
  • Changing voluntarily and the role of data quality

    Changing voluntarily and the role of data quality

    In the modern world nothing stays the same for long. We live in a state of constant change, with new technologies, new trends and new risks. Yet it’s a commonly held belief that people don’t like change. Which led me to wonder: why do we persist in calling change management initiatives 'change management' if people don’t like change?

    In my experience, I have not found this maxim to be true. Nobody really minds change; we evolve and adapt naturally. What we do not like is being forced to change. As such, when we make a choice to change, it is often easy, fast and permanent.

    To put that into context, change is an external force imposed upon you. For example, if I tell you I want you to change your attitude, you are expected to adapt your patterns of behaviour to comply with my idea of your ‘new and improved attitude’. This is difficult to maintain and conflicts with your innate human need to exercise your own free-will. However, if I ask you to choose your attitude, this places you in control of your own patterns of behaviour. You can assess the situation and decide the appropriate attitude you will adopt. This makes it far more likely that you will maintain the changes and, as a result, will reap the rewards.

    Perhaps you’re wondering what this has to do with the data quality and data quality management of your organisation?

    Quite simply, the need for choice applies to every aspect of life: making positive choices for our health and wellbeing, choosing changes that improve our environmental impact, and making changes that will positively impact the financial, reputational and commercial wellbeing of your business, one of which is data quality management. The ultimate success of these initiatives stems from one thing: the conscious choice to change.

    It’s a simple case of cause and effect.

    So, back to my original point: choice management, not change management.
    An organisational choice, owned and performed by everyone, to improve your data quality and data cleansing, driven by a thorough understanding of the beneficial outcomes, will reap untold business rewards. After all, over 2,000 years ago Aristotle gave us a clue when he said, “We are what we repeatedly do, therefore excellence is not an act, but a habit.”
    When you choose to improve and maintain the quality of the baseline data that is relied upon for business decisions:

    • Your business outcomes will improve because you will have a better understanding of your customers’ needs;
    • You will reduce wasted effort by communicating directly to a relevant and engaged audience;
    • Profits will increase as a result of data cleansing and reduced duplication of effort coupled with increased trust in your brand, and
    • Customer, employee and shareholder confidence and satisfaction will rise.

    Bringing your team with you on a journey of change and helping them to make the choices to effectively implement those changes, will require you to travel the ‘Change Curve’ together. As a business leader, you will be at the forefront leading the way and coaching your staff to join you on the journey.

    We can all find ourselves at the start of the change curve at times, in denial of the need or issues you know need to be tackled. You, and your team, may feel angry or overwhelmed by the scale of the change that you need to achieve. However, the key is choosing to accept the need to change, adapt and evolve. That way, you will move in your new direction much faster, taking the action to make your goals a reality.

    It’s easy to feel overwhelmed when you have a mountain to climb, and it can be tempting to make decisions based on where you are now. However, choosing to make business decisions about your data quality and your need for data quality tools based on where you want to be is where the true power lies, and that is where you will unleash your winning formula.

    Author: Martin Doyle

    Source: DQ Global

  • Data accuracy - What is it and why is it important?

    Data accuracy - What is it and why is it important?

    The world has come to rely on data. Data-driven analytics fuel marketing strategies, supply chain operations, and more, and often to impressive results. However, without careful attention to data accuracy, these analytics can steer businesses in the wrong direction.

    Data analytics can be detrimental if not executed properly, and the misapplication of data analysis can lead to unintended consequences. This is especially true when it comes to data accuracy.

    WHAT IS DATA ACCURACY?

    Data accuracy is, as it sounds, the degree to which given values are correct and consistent. Its two most important characteristics are form and content, and a data set must be correct in both to be accurate.

    For example, imagine a database containing information on employees’ birthdays, and one worker’s birthday is January 5, 1996. The U.S. format would record that as 1/5/1996, but a European employee might record it as 5/1/1996. Read with the U.S. convention, that entry would incorrectly state that the worker’s birthday is May 1, 1996.

    In this example, while the data’s content was correct, its form wasn’t, so it wasn’t accurate in the end. If information is of any use to a company, it must be accurate in both form and content.
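
    A minimal sketch of this form-versus-content pitfall, assuming pandas: the same recorded string parses to two different dates depending on which convention the reader assumes, and an explicit format removes the ambiguity.

    ```python
    import pandas as pd

    recorded = "5/1/1996"   # the European employee's entry for January 5th, 1996

    as_us = pd.to_datetime(recorded, dayfirst=False)   # month-first reading
    as_eu = pd.to_datetime(recorded, dayfirst=True)    # day-first reading
    print("US convention:      ", as_us.date())        # 1996-05-01
    print("European convention:", as_eu.date())        # 1996-01-05

    # Enforcing an explicit format at the point of entry removes the ambiguity.
    unambiguous = pd.to_datetime(recorded, format="%d/%m/%Y")
    print("Explicit day/month/year:", unambiguous.date())
    ```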

    WHY IS DATA ACCURACY IMPORTANT?

    While the birthday example may not have significant ramifications, data accuracy can have widespread ripple effects. Consider how some hospitals use AI to predict the best treatment course for cancer patients. If the data this AI analyzes isn’t accurate, it won’t produce reliable predictions, potentially leading to minimally effective or even harmful treatments.

    Studies have shown that bad data costs businesses 30% or more of their revenue on average. If companies are making course-changing decisions based on data analytics, their databases must be accurate. As the world comes to rely more heavily on data, this becomes a more pressing concern.

    HOW TO IMPROVE DATA ACCURACY

    Before using data to train an algorithm or fuel business decisions, data scientists must ensure accuracy. Thankfully, organizations can take several steps to improve their data accuracy. Here are five of the most important actions.

    1. GATHER DATA FROM THE RIGHT SOURCES

    One of the best ways to improve data accuracy is to start with higher-quality information. Companies should review their internal and external data sources to ensure what they’re gathering is true to life. That includes making sure sensors are working correctly, collecting large enough datasets, and vetting third-party sources.

    Some third-party data sources track and publish reported errors, which serves as a useful vetting tool. When getting data from these external sources, businesses should always check these reports to gauge their reliability. Similarly, internal error reports can reveal if one data-gathering process may need adjustment.

    2. EASE DATA ENTRY WORKLOADS

    Some data is accurate from the source but becomes inaccurate in the data entry process. Errors in entry and organization can taint good information, so organizations must work to eliminate these mistakes. One of the most significant fixes to this issue is easing the manual data entry workload.

    If data entry workers have too much on their plate, they can become stressed or tired, leading to mistakes. Delegating the workload more evenly across teams, extending deadlines, or automating some processes can help prevent this stress. Mistakes will drop as a result.

    3. REGULATE DATA ACCESSIBILITY

    Another common cause of data inaccuracy is inconsistencies between departments. If people across multiple teams have access to the same datasets, there will likely be discrepancies in their inputs. Differences in formats and standards between departments could result in duplication or inconsistencies.

    Organizations can prevent these errors by regulating who has access to databases. Minimizing database accessibility makes it easier to standardize data entry methods and reduces the likelihood of duplication. This will also make it easier to trace mistakes to their source and improve security.

    4. REVIEW AND CLEAN DATA

    After compiling information into a database, teams must cleanse the data before using it in any analytics process. This will remove any errors that earlier steps didn’t prevent. Generally speaking, the data cleansing workflow should follow four basic steps: inspection, cleaning, verifying, and reporting.

    In short, that means looking for errors, fixing or removing them (including standardizing formats), double-checking to verify the accuracy, and recording any changes made. That final step is easy to overlook but crucial, as it can reveal any error trends that emerge between data sets.
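
    A minimal sketch of that four-step workflow on a hypothetical contact table, assuming pandas; real cleansing pipelines apply many more rules, but the inspect-clean-verify-report shape is the same.

    ```python
    import pandas as pd

    # Hypothetical contact table compiled from several sources.
    contacts = pd.DataFrame({
        "email": ["a@example.com", "A@EXAMPLE.COM ", "bad-email", "b@example.com"],
        "signup_date": ["2023-01-05", "2023-01-05", "2023-02-11", None],
    })

    # 1. Inspection: count obvious problems before touching anything.
    issues_before = {
        "invalid_email": int((~contacts["email"].str.contains("@", na=False)).sum()),
        "missing_signup_date": int(contacts["signup_date"].isna().sum()),
    }

    # 2. Cleaning: standardize formats and drop rows that cannot be repaired.
    cleaned = contacts.copy()
    cleaned["email"] = cleaned["email"].str.strip().str.lower()
    cleaned["signup_date"] = pd.to_datetime(cleaned["signup_date"], errors="coerce")
    cleaned = cleaned[cleaned["email"].str.contains("@", na=False)]
    cleaned = cleaned.drop_duplicates(subset="email")

    # 3. Verifying: double-check that the targeted problems are gone.
    assert cleaned["email"].str.contains("@").all()

    # 4. Reporting: record what was found and changed so error trends can be tracked.
    print("Issues found during inspection:", issues_before)
    print(f"Rows before cleansing: {len(contacts)}, rows after: {len(cleaned)}")
    ```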

    5. START SMALL

    While applying these fixes across an entire organization simultaneously may be tempting, that’s not feasible. Instead, teams should work on the accuracy of one database or operation at a time, starting with the most mission-critical data.

    As teams slowly refine their databases, they’ll learn which fixes have the most significant impact and how to implement them efficiently. This gradual approach will maximize these improvements’ efficacy and minimize disruptions.

    DATA ACCURACY IS ESSENTIAL FOR EFFECTIVE ANALYTICS

    Poor-quality data will lead to unreliable and possibly harmful outcomes. Data teams must pay attention to data accuracy if they hope to produce any meaningful results for their company.

    These five steps provide an outline for improving any data operation’s accuracy. With these fixes, teams can ensure they’re working with the highest-quality data, leading to the most effective analytics.

    Author: Devin Partida

    Source: Dataconomy

  • Data integration applied to BI: making data useful for decision making

    Data integration applied to BI: making data useful for decision making

    In this technology-driven world, the influx of data can seem overwhelming if it is not properly utilized. With data coming in from so many different sources, the only way to extract real insights from these raw inputs is through integration.

    Properly integrated data has a trickle-down effect on all business processes, such as sales, vendor acquisition, customer management, business intelligence, etc. Implementing this level of integration enables businesses to make continuous improvements to their products and services.

    Business intelligence (BI) is one of the most significant data integration use cases. An effective BI process incorporates everything from predictive analytics to reporting and operations management. But this sort of comprehensive analytics framework requires integrated enterprise data to identify process inefficiencies, missed opportunities, and other improvement areas.

    What complicates BI integration?

    Given that enterprise information comes from different sources in varying formats and often contains inconsistencies, duplicates, and errors, users must ensure that quality issues identified during the data extraction process do not propagate to their end results. Unchecked, these issues impact the integrity and accuracy of reporting, which in turn negatively influences decision-making, leading to further inefficiencies across business processes.

    Creating well-defined integration processes that not only consolidate data but standardize it for consistency and quality can make high-quality data readily available for decision making.

    Streamlining BI integration: best practices

    Raw data becomes valuable when transformed into analytics-ready, actionable information. By bringing disparate formats together into a unified data repository, an integrated BI system offers better visibility into enterprise assets and greater efficiency.

    Therefore, successful BI initiatives are a combination of an effective integration and analytics strategy. The best practices stated below can help you make the best of it:

    Document a BI strategy

    Every business has a reporting process in place. Before implementing a new BI strategy, it’s important to evaluate existing systems to identify the areas that need improvement. Based on that information, you can design a new strategy, which can include several components depending on your specific business structure. However, the major ones that cannot be ignored include the following:

    • Narrow down the data source channels essential for your reporting process. This may consist of stakeholder or departmental information from databases, files, or web sources.
    • The purpose of BI tools is to track business KPIs with supporting data. Therefore, identifying the custom KPIs for your organization is imperative to presenting a broad picture of your business growth and losses.
    • Set a format for reporting: visual or textual. Based on your preferences and the input sources, you can select a vendor for the BI system.

    Set up data integration tools

    The integration stage of the entire process will be time-consuming. You can go about it in two ways:

    • Opt for the manual approach, where you rely on your developers and IT team to develop a BI architecture for your custom requirements.
    • The simpler and faster approach would be to buy an enterprise-ready integration solution from the market. These solutions extract data from different sources using built-in connectors, transform it into the required format, and load it into the destination system that is connected to BI tools (a minimal sketch of this extract-transform-load flow follows this list). Several data integration solutions offer out-of-the-box connectivity to BI tools, so purchasing a data integration solution can serve the dual purpose of integration and reporting.
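
    Here is a minimal sketch of that extract-transform-load flow in Python with pandas and SQLite; the source tables are built in-memory so the example runs on its own, whereas a real pipeline would pull them through the connectors described above.

    ```python
    import sqlite3

    import pandas as pd

    # --- Extract: in practice these frames would come from connectors (files, APIs, databases).
    orders = pd.DataFrame({
        "order_id": [1, 2, 3],
        "customer_id": [10, 11, 10],
        "amount": [120.0, 80.5, 42.0],
        "order_date": ["2023-03-01", "2023-03-02", "not-a-date"],
    })
    customers = pd.DataFrame({"customer_id": [10, 11], "region": ["EMEA", "APAC"]})

    # --- Transform: standardize types, drop unusable rows, join into one analytics-ready table.
    orders["order_date"] = pd.to_datetime(orders["order_date"], errors="coerce")
    orders = orders.dropna(subset=["order_date"])
    report = orders.merge(customers, on="customer_id", how="left")

    # --- Load: write the result to the destination the BI tool reads from.
    with sqlite3.connect("warehouse.db") as conn:
        report.to_sql("sales_by_region", conn, if_exists="replace", index=False)
    ```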

    Factor in data security

    Setting up security measures before implementing BI is imperative in protecting your information assets against data breaches. By configuring authorization or authentication protocols and outlining procedures to carry out secure data processes, you can control access to data sets.

    BI is no longer a privilege for enterprises; it’s a necessity that enables organizations to stay ahead of the competition and optimize decision-making.

    Identifying the challenges in their reporting journey and implementing the best practices mentioned above will help organizations leverage the BI capabilities and become data-focused.

    Author: Ibrahim Surani

    Source: Dataversity

  • Enabling Data Stewardship to Improve Data Quality and Management at your Organization

    Enabling Data Stewardship to Improve Data Quality and Management at your Organization

    As we continue to do business in a digitally connected world, more data-driven organizations are prioritizing data stewardship to improve data quality and management. Data stewards maintain and protect data assets that need special care, not just for cybersecurity but for better business insights and more informed decision-making.

    Understand Data Stewardship Roles and Responsibilities 

    In his presentation at the Data Governance & Information Quality Conference, Jimm Johnson, the Data Governance manager at HireRight, discussed key data stewardship best practices he’s turned to during his 25-plus years of experience in multiple industries and areas of IT, including Data Governance “long before Data Governance became an actual thing.”

    At its core, data stewardship involves “taking care of data on behalf of someone” and being held formally accountable for it, said Johnson. In his organization, he prefers straightforward titles for different types of stewards: “Analytics stewards” focus on business intelligence reports and dashboards, “application stewards” work within IT systems, and “data stewards” take a broader enterprise-level approach to data management. Each plays a key role in an organization’s Data Governance program.

    Regardless of which titles you choose, be sure to define in detail what your data stewards do:

    “You can assign any titles you want to your stewards,” said Johnson. “If you want to come up with a theme – a Star Wars theme, or Disney, or whatever – that’s fine, that might engender interest, but just be very, very clear about their responsibilities and the processes you want them to follow.”

    Exploring data stewardship as a new type of business function, Johnson highlighted four labels you can use to help everyone in the organization understand steward roles and responsibilities:

    • Knowledge keepers: Data stewards serve as subject matter experts, maintaining and sharing insider “tribal” knowledge of institutional data processes. They help to represent teams and business units in collaborative workflows and may also coach or train others.
    • Friendly gatekeepers: Data stewards should know a lot about the rules and standards governing data maintenance. They may research how to match departmental needs to enterprise standards or how to classify and protect different data assets.
    • Quality inspectors: Data stewards should apply these rules and match them to decisions that will keep the company compliant and up to standard. That may involve flagging and remediating problems with data or measuring and improving data quality.
    • Change agents: This is where data stewards will contribute to the process of change that benefits a company or enterprise. When there is a need for new initiatives and evaluations, data stewardship pros can assist others, embrace data literacy, and cultivate the buy-in that’s needed to advance projects to an active stage.

    Identify Important Traits and Skills of Data Stewards 

    Business leaders must understand what makes data stewards successful in order to find the ideal candidates for the role. Johnson outlined some of the characteristics best suited for stewards.

    Coming from both business and IT: Many times, data stewards do best when they have a background in both technology and line-of-business department work. Johnson referred to them as “purple people” – having skills and experience spanning these two different job positions. Data stewards should be multiskilled, as well as “bilingual” and “bicultural” when it comes to the very different worlds of, say, product development and cloud management.

    Acting as bridges: Data stewards should be able to translate both simple and complex information and communicate it in written or oral form. Johnson recommended that they also have a good sense of objectivity, distinguishing fact from fiction, and be able to envision what challenges and issues a company might face in the future.

    Excited by data: Thinking globally and participating in an influence culture, data stewards should get immersed in the ideas surrounding good Data Governance and better data handling. “When you’re talking to somebody, and they get really excited about data and their eyes light up, and they’re all energized and stuff, it’s a good sign – they might be fit for a steward role,” Johnson said.

    Data stewards are change agents, Johnson reminded the audience, which ultimately benefits the employers who rely on them to develop best practices for data policies and processes.

    “Data stewards want to embrace change and be part of that change disruption in your organization. If you keep going status quo, you are more than likely not going to reach the outcomes you want. So, you’ve got to change something, and your steward is going to be part of that change process.”

    Help Data Stewards Achieve Success

    Once you’ve found capable data stewards within your organization, you must actively position them for success. “Create a super-transparent list of as many data problems as you’re working on – the issues, the questions, etc.,” recommended Johnson. Next, ensure your data stewards have access to tools that not only provide organization for frameworks but also display their value to stakeholders. Organizations can support data stewards by taking the following measures:

    • Fostering awareness of data challenges: Stewards can use a data quality tracker to sort, assess, categorize, and triage different types of tasks or requirements and then share the results with stakeholders.
    • Classifying data with sensitivity labels: Labeling data as confidential or public can help data stewards assess data assets and work with them in the ways mentioned above.
    • Cultivating regulatory transparency: First, the company should list applicable state, federal, and international regulatory regimes, such as California’s Consumer Privacy Act, the federal HIPAA standard, and the European Union’s GDPR. Then, data stewards can help the business make compliance transparent with data reporting tools.
    • Showcasing program value: Using labels like people, processes, data, and technology, data stewards can form reports that show the value of actions and drive buy-in when it’s needed the most.

    Most importantly, foster a sense of community that brings data stewards together, celebrates their successes, and documents their stories to acknowledge their accomplishments and establish their credibility within the enterprise.

    “Share data steward successes at your council meetings – maybe do videos and once a year release them through internal teams,” suggested Johnson. “Give data stewards the kudos that they deserve and make that very public facing within your company, so that people are aware of all that work they’re doing.”

    Building the connective tissue between people and departments will help achieve a supportive corporate culture, allowing data stewards to properly manage data assets and ensure they are secure, trustworthy, and put to good use within the organization.

    Author: Justin Stoltzfus

    Source: Dataversity

  • How Data Platform Modernization Leads to Enhanced Data Insights  

    How Data Platform Modernization Leads to Enhanced Data Insights

    Today, business enterprises are operating in a highly competitive landscape with multiple touchpoints, channels, and operating and regulatory environments. For such business enterprises, data has become their most important asset, continuously acquired from all types of sources. These may include IoT networks, social media, websites, customers, employees, the cloud, and many more. Data is no longer defined only as highly structured information; it encompasses a wide variety of data types and structures emanating from a multitude of sources. With all this volume of information, the question arises: does the data deliver true value to the enterprise? If enterprises cannot extract timely insights from business data, the data is not adding any value.

    The challenge before businesses today is to leverage data, in alignment with technology, security, and governance, within a cohesive modernization framework that delivers tangible benefits. Although using data from multiple sources to pursue new business opportunities, streamline operations, predict customer behavior, identify risks, and attract customers has become critical, it is only half the battle. The other half involves updating legacy infrastructure and creating a robust IT environment, including large data repositories. For instance, businesses may seek to develop solutions for on-premise, public, and private clouds by incorporating AI.

    To modernize their existing data platforms and gain better data insights, businesses ought to move legacy data to the cloud while making it available in a streamlined and structured way without risking privacy and security. Besides, businesses do not want to be dependent on vendors for technologies and incur recurring costs. They need technologies that are fast and versatile enough to adapt to their needs. This is where a data modernization platform can prove to be a pathway to optimizing the storage, security, processing, and analysis of data.

    Data has indeed become the lifeblood of businesses across industries and geographies. From sales and marketing to operations and resource management, every aspect of a business relies on data acquisition, processing, and analysis for better decision-making. However, with the vast amount of data being generated every day from various channels, platforms, and touchpoints, it’s becoming increasingly challenging for businesses to keep up. This is where data modernization comes in. Let us understand the five benefits of modernizing a data platform for better data insights.

    5 Benefits of Modernizing Data Platforms to Generate Better Data Insights

    To remain one step ahead of the competition, businesses need to draw better insights from their data in real time by modernizing their data platforms. The five benefits of doing so are as follows:

    1. Improved Data Quality

    Modernizing the data platform involves leveraging the latest technologies to upgrade data storage, data processing, and data management systems. This, in addition to enhancing the speed and efficiency of data processing, also improves the quality of the data. Thus, with improved data quality, businesses can make more accurate decisions and gain better insights into their operations.

    2. Increased Data Accessibility

    Not being able to access the right type of data in the right quantity when needed has been a bane for businesses. However, a modernized data platform facilitates data accessibility in real-time. Thus, team members can access the data they need at the time and place of their choosing. This, however, can only be possible through the use of cloud-based platforms. These data insights platforms allow remote data access, enabling teams to collaborate and share data in real time. With increased data accessibility, businesses can promote a more data-driven culture, leading to better decision-making at all levels.

    3. Real-time Data Insights

    With a modernized data platform, businesses can gain real-time insights into their operations, allowing them to make informed decisions quickly. This is particularly useful in industries where timing is critical, such as finance and healthcare. Real-time data insights can also help them identify trends and patterns in the data that might have gone unnoticed otherwise, enabling them to make proactive decisions rather than reactive ones.

    4. Scalability and Flexibility

    Scalability and flexibility are the twin requirements that businesses often need to address as they grow. With a modern data platform, they can achieve both, besides optimizing their data acquisition, processing, and storage needs. In other words, they can scale up or down their data infrastructure without worrying about losing data or facing downtime. A flexible data platform also enables the seamless integration of new data sources or technologies, allowing businesses to stay ahead of the competition.

    5. Cost Savings

    In the final analysis, modernizing the data platform can offer significant cost savings. For instance, by optimizing data storage, processing, and management systems, businesses or data modernization services can reduce the amount of time and resources spent on data processing and analysis. This can lead to more efficient operations and reduced costs. Additionally, cloud-based platforms can offer cost savings by reducing the need to set up and maintain on-premises infrastructure.

    Conclusion

    With data becoming the most important asset for businesses operating in the digital landscape, it needs to be leveraged using data platforms to gain big data insights and make informed decisions. However, modernizing the data platforms is essential to optimizing activities related to data acquisition, storage, processing, and analysis. Unless businesses can extract the right kind of data in real-time, they will not be able to draw the right insights or inferences on market trends, customer preferences, and other tangibles. So, among the benefits that modernized data platforms are likely to offer are improved quality of data, better access to data, real-time data insights, scalability and flexibility, and cost savings. By investing in modernizing data platforms, businesses can stay ahead of the competition and drive growth.

    Date: June 6, 2023

    Author: Hermanth Kumar

    Source: Datafloq

     

     

  • How Machine Learning Can Improve Data Quality

    How Machine Learning Helps to Improve Data Quality

    Machine learning makes improving Data Quality easier. Data Quality refers to the accuracy of the data: High-quality data is more accurate, while low-quality data is less accurate. Accurate data/information supports good decision-making. Inaccurate data/information results in bad decision-making. 

    So, intelligent decision-making can be supported by supplying accurate information through the use of machine learning. 

    Machine learning (ML) is a subdivision of artificial intelligence (AI). However, from the late 1970s to the early ’80s, AI researchers lost much of their research funding as a result of exaggerated and broken promises. The small machine learning community that had developed had the option of going out of business or adapting machine learning to accomplish small, specific tasks for the business world. They chose the second option.

    While the term “artificial intelligence” is often used in promoting machine learning, machine learning can also be treated as a separate industry.

    A variety of individual, successful machine learning algorithms have been used to perform several different tasks. These tasks can be broken down into three basic functions: descriptive, predictive, and prescriptive. A descriptive machine learning algorithm is used to explain what happened. A predictive ML algorithm uses data to forecast what will happen. A prescriptive ML algorithm will use data to suggest what actions should be taken.

    Automation vs. Machine Learning

    The automation used in modern computer systems can be described as a form of software that follows pre-programmed rules: machines replicate the behavior of humans to accomplish a task. For instance, invoices can be sent out using an automated process, producing them in minutes and eliminating human error.

    Automation is the use of technology to perform tasks historically performed by humans. 

    Aside from being a component of artificial intelligence, machine learning can also be considered an evolutionary step in automation. At a very basic level, machine learning can be treated as a form of automation that can learn from its mistakes and adjust its responses to new situations. 

    The ML software is exposed to sets of data and draws certain conclusions from that data. It then applies those conclusions to similar situations. 

    How Machine Learning Works

    Machine learning uses algorithms. At its most basic level, an algorithm is a series of step-by-step instructions, similar to a baking recipe. The recipe is called a “procedure,” and the ingredients are called “inputs.” Machine learning algorithms have instructions that allow for alternative responses, while using previous experiences to select the most probable appropriate response. 

    A large number of machine learning algorithms are available for a variety of circumstances.  

    Machine learning starts training with data – text, photos, or numbers – such as business records, pictures of baked goods, data from manufacturing sensors, or repair records. Data is collected and prepared for use as training data. And the more training data, the better the resulting program.

    After selecting and collecting the training data, programmers select an appropriate ML model, provide the data, and then allow the machine learning model to train itself to find patterns in the data and make predictions. As time passes, a human programmer can tweak the model, changing its parameters to help achieve more accurate results. 

    Some data is deliberately withheld from the training process and is used later in testing and evaluating the accuracy of the ML training program. This training and testing process produces a machine learning model that can be used for specific tasks requiring flexible responses. 
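
    A minimal sketch of that train-and-hold-out loop, assuming scikit-learn and its bundled iris dataset; the model choice and split size are arbitrary illustrations.

    ```python
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # Withhold 25% of the data from training so the model can later be evaluated
    # on examples it has never seen.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42
    )

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)                 # training: the model finds patterns

    predictions = model.predict(X_test)         # testing: the withheld, unseen data
    print(f"Hold-out accuracy: {accuracy_score(y_test, predictions):.3f}")
    ```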

    While machine learning can be remarkably useful, it is not perfect, and when it makes a mistake, it can be quite surprising. 

    Applying Machine Learning to Data Quality

    Machine learning algorithms can detect anomalies and suggest ways to improve error detection. Generally speaking, this is ideal for improving Data Quality. Listed below are some examples of the tasks machine learning algorithms perform to improve Data Quality:

    • Reconciliation: The process of comparing data from trusted sources to ensure the completeness and accuracy of migrating data. By examining user actions and historical data about how reconciliation issues were resolved previously, machine learning algorithms can use these examples for learning and, by using fuzzy logic, make the reconciliation process more efficient.
    • Missing data: ML regression models are used primarily in predictive analytics to predict trends and forecast outcomes, but can also be used to improve Data Quality by estimating the missing data within an organization’s system. ML models can identify missing records and assess missing data. These models constantly improve their accuracy as they work with more data. 
    • Data Quality rules: Machine learning can translate unstructured data into a usable format. It can also examine incoming data and automatically generate rules that proactively communicate quality concerns about that data in real time. Manual or automated rules work for known issues; however, the unknowns in data are rising with its increasing complexity. With more data, ML algorithms can predict and detect these unknowns more accurately.
    • Filling in data gaps: Machine learning algorithms can fill in the small amounts of missing data when there is a relationship between the data and other recorded features, or when there is historical information available. ML can correct missing data issues by predicting the values needed to replace those missing values. Feedback from humans can, over time, help the algorithms learn the probable corrections.
    • In-house data cleansing: Manual data entry often includes incomplete addresses, incorrect spellings, etc. Machine learning algorithms can correct many common errors (which spellcheck would not correct, because this involves names and addresses) and help in standardizing the data. ML algorithms can learn to continuously use reference data to improve the data’s accuracy. (If there is no reference data, it’s possible to use recorded links to the data for backtracking purposes.)
    • Improving regulatory reporting: During regulatory reporting, incorrect records may accidentally be turned over to the regulators. Machine learning algorithms can identify and remove these records before they are sent. 
    • Creating business rules: Machine learning algorithms – such as decision tree algorithms – can use an existing business rules engine and information taken from the data warehouse to create new business rules, or improve existing business rules.
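
    As a concrete illustration of the “missing data” and “filling in data gaps” items above, the sketch below trains a regression model on complete rows and uses it to estimate missing values, flagging the estimates for human review. It assumes pandas and scikit-learn and a hypothetical orders table; the column names are illustrative.

    ```python
    # Minimal sketch: estimate missing values with a regression model trained
    # on the rows where the value is present. Assumes the feature columns
    # themselves are complete; imputed rows are marked so humans can review them.
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    df = pd.read_csv("orders.csv")                        # hypothetical data set
    target = "unit_price"
    features = ["quantity", "weight_kg", "discount"]      # illustrative columns

    known = df[df[target].notna()]
    missing = df[df[target].isna()]

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(known[features], known[target])

    df.loc[missing.index, target] = model.predict(missing[features])
    df["imputed"] = df.index.isin(missing.index)          # mark estimates for review
    ```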

    The Risks of Poor-Quality Data

    The use of poor-quality data can damage a business and result in unnecessary expenses. Decisions based on inaccurate data can have severe consequences. Fortunately, machine learning algorithms can catch some of these issues before they cause damage. For example, financial institutions can use machine learning to identify fraudulent transactions.

    Many businesses are already using machine learning as a part of their evolving Data Management strategy. The availability of off-the-shelf ML software has made access to machine learning much easier.

    Date: July 4, 2023

    Author: Keith D. Foote

    Source: Dataversity

     

  • Machine learning, AI, and the increasing attention for data quality

    Machine learning, AI, and the increasing attention for data quality

    Data quality has been going through a renaissance recently.

    As a growing number of organizations increase efforts to transition computing infrastructure to the cloud and invest in cutting-edge machine learning and AI initiatives, they are finding that the main barrier to success is the quality of their data.

    The old saying “garbage in, garbage out” has never been more relevant. With the speed and scale of today’s analytics workloads and the businesses that they support, the costs associated with poor data quality are also higher than ever.

    This is reflected in a massive uptick in media coverage on the topic. Over the past few months, data quality has been the focus of feature articles in The Wall Street Journal, Forbes, Harvard Business Review, MIT Sloan Management Review and others. The common theme is that the success of machine learning and AI is completely dependent on data quality. A quote that summarizes this dependency very well is this one by Thomas Redman: “If your data is bad, your machine learning tools are useless.”

    The development of new approaches towards data quality

    The need to accelerate data quality assessment, remediation and monitoring has never been more critical for organizations and they are finding that the traditional approaches to data quality don’t provide the speed, scale and agility required by today’s businesses.

    For this reason, the highly rated data preparation company Trifacta recently announced an expansion into data quality and unveiled two major new platform capabilities: active profiling and smart cleaning. This is the first time Trifacta has expanded its focus beyond data preparation. By adding new data quality functionality, the company aims to handle a wider set of data management tasks as part of a modern DataOps platform.

    Legacy approaches to data quality involve many manual, disparate activities as part of a broader process. Dedicated data quality teams, often disconnected from the business context of the data they are working with, manage the process of profiling, fixing and continually monitoring data quality in operational workflows. Each step must be managed in a completely separate interface. It’s hard to iteratively move back-and-forth between steps such as profiling and remediation. Worst of all, the individuals doing the work of managing data quality often don’t have the appropriate context for the data to make informed decisions when business rules change or new situations arise.

    Trifacta uses interactive visualizations and machine intelligence to guide users, highlighting data quality issues and providing intelligent suggestions on how to address them. Profiling, user interaction, intelligent suggestions, and guided decision-making are all interconnected and drive one another. Users can seamlessly transition back and forth between steps to ensure their work is correct. This guided approach lowers the barrier to entry and helps to democratize the work beyond siloed data quality teams, allowing those with the business context to own and deliver quality outputs with greater efficiency to downstream analytics initiatives.

    New data platform capabilities like this are only a first (albeit significant) step into data quality. Keep your eyes open and expect more developments towards data quality in the near future!

    Author: Will Davis

    Source: Trifacta

  • Managing data at your organization? Take a holistic approach

    Managing data at your organization? Take a holistic approach

    Taking a holistic approach to data requires considering the entire data lifecycle – from gathering, integrating, and organizing data to analyzing and maintaining it. Companies must create a standard for their data that fits their business needs and processes. To determine what those are, start by asking your internal stakeholders questions such as, “Who needs access to the data?” and “What do each of these departments, teams, or leaders need to know? And why?” This helps establish what data is necessary, what can be purged from the system, and how the remaining data should be organized and presented.

    This holistic approach helps yield higher-quality data that’s more usable and more actionable. Here are three reasons to take a holistic approach at your organization:

    1. Remote workforce needs simpler systems

    We saw a massive shift to work-from-home in 2020, and that trend continues to pick up speed. Companies like Twitter, Shopify, Siemens, and the State Bank of India are telling employees they can continue working remotely indefinitely. And according to the World Economic Forum, the number of people working remotely worldwide is expected to double in 2021.

    This makes it vital that we simplify how people interact with their business systems, including CRMs. After all, we still need answers to everyday questions like, “Who’s handling the XYZ account now?” and “How did customer service solve ABC’s problem?” But instead of being able to ask the person in the next office or cubicle, we’re forced to rely on a CRM to keep us up to date and make sure we’re moving in the right direction.

    This means team members must input data in a timely manner, and others must be able to access that data easily and make sense of it, whether it’s to view the sales pipeline, analyze a marketing campaign’s performance, or spot changes in customer buying behavior.

    Unfortunately, the CRMs used by many companies make data entry and analytics challenging. At best, this is an efficiency issue. At worst, it means people aren’t inputting the data that’s needed, and any analysis of spotty data will be flawed. That’s why we suggest companies focus on improving their CRM’s user interface, if it isn’t already user-friendly.

    2. A greater need for data accuracy

    The increased reliance on CRM data also means companies need to ramp up their Data Quality efforts. People need access to clean, accurate information they can act on quickly.

    It’s a profound waste of time when the sales team needs to verify contact information for every lead before they reach out, or when data scientists have to spend hours each week cleaning up data before they analyze it.

    Yet, according to online learning company O’Reilly’s The State of Data Quality 2020 report, 40% or more of companies suffer from these and other major Data Quality issues:

    • Poor quality controls when data enters the system
    • Too many data sources and inconsistent data
    • Poorly labeled data
    • Disorganized data
    • Too few resources to address Data Quality issues

    These are serious systemic issues that must be addressed in order to deliver accurate data on an ongoing basis.

    3. A greater need for automation

    Data Quality Management is an ongoing process throughout the entire data lifecycle. We can’t just clean up data once and call it done.

    Unfortunately, many companies are being forced to work with smaller budgets and leaner teams these days, yet the same amount of data cleanup and maintenance work needs to get done. Automation can help with many of the repetitive tasks involved in data cleanup and maintenance (see the sketch after this list). These include:

    • Standardizing data
    • Removing duplicates
    • Preventing new duplicates
    • Managing imports
    • Importing/exporting data
    • Converting leads
    • Verifying data
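
    As a rough illustration of how the first few tasks in this list can be automated, here is a minimal pandas sketch. It assumes a hypothetical CRM export with name, email, phone, and last_updated columns; all file and column names are illustrative.

    ```python
    # Minimal sketch: standardize fields and remove duplicates in a CRM export.
    import pandas as pd

    contacts = pd.read_csv("crm_contacts.csv")            # hypothetical export

    # Standardize: trim whitespace, normalize case, keep digits only in phones.
    contacts["email"] = contacts["email"].str.strip().str.lower()
    contacts["name"] = contacts["name"].str.strip().str.title()
    contacts["phone"] = contacts["phone"].str.replace(r"\D", "", regex=True)

    # Remove duplicates: keep the most recently updated record per email address.
    contacts = (
        contacts.sort_values("last_updated", ascending=False)
                .drop_duplicates(subset="email", keep="first")
    )

    contacts.to_csv("crm_contacts_clean.csv", index=False)
    ```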

    A solid business case

    By taking a holistic approach to Data Management – including simplifying business systems, improving data accuracy, and automating whenever possible – companies can improve the efficiency and effectiveness of teams throughout their organization. These efforts will help organizations come through the pandemic stronger, with a “new normal” for data that’s far better than what came before.

    Author: Olivia Hinkle

    Source: Dataversity

  • Migros: an example of seizing the opportunities BI offers

    Migros: an example of seizing the opportunities BI offers

    Migros is the largest retailer in Turkey, with more than 2500 outlets selling fresh produce and groceries to millions of people. To maintain high-quality operations, the company depends on fresh, accurate data. And to ensure high data quality, Migros depends on Talend.

    The sheer volume of data managed by Migros is astonishing. The company’s data warehouse currently holds more than 200 terabytes, and Migros is running more than 7,000 ETL (extract, transform, load) jobs every day. Recently, the quality of that data became the focal point for the BI (business intelligence) team at Migros.

    “We have 4,000 BI users in this company,” said Ahmet Gozmen, Senior Manager of IT Data Quality and Governance at Migros. “We produce 5-6 million mobile reports every year that our BI analysts see on their personal dashboards. If they can’t trust the timeliness or the accuracy of the reports, they can’t provide trustworthy guidance on key business decisions.”

    In 2019, Mr. Gozmen and his team decided they needed a more reliable foundation on which to build data quality. “We were having a few issues with our data at that time,” he said. “There would be occasional problematic or unexpected values in reports—a store’s stock would indicate an abnormal level, for example—and the issue was in the data, not the inventory. We had to address these problems, and more than that we wanted to take our data analysis and BI capabilities to a higher level.”

    From Community to Commercial

    Initially, Mr. Gozmen’s team used the non-commercial version of Talend Data Quality. “It was an open-source solution that we could download and set up in one day,” he said. “At first, we just wanted to see whether we could do something with this tool or not. We explored its capabilities, and we asked the Talend Community if we had questions or needed advice.”

    Mr. Gozmen discovered that Talend had far more potential than he expected. “We found that the data quality tool was very powerful, and we started exploring what else we could do with Talend,” he said. “So we also downloaded the data integration package, then the big data package. Talend could handle the huge volumes of data we were dealing with. And very soon we started thinking about the licensed, commercial versions of these solutions, because we saw a great deal of potential not only for immediate needs but for future plans.”

    By upgrading to the commercial versions, Migros also elevated the level of service and support that was available. “The Community served our purposes well in the early stages,” said Mr. Gozmen, “but with the commercial license we now have more personalized support and access to specialists who can help us immediately with any aspect of our implementation.”

    From Better Data Quality to Big Data Dreams

    With Talend Data Quality, Migros has improved the accuracy and reliability of its BI reporting, according to Mr. Gozmen. “We are a small department in a very big company,” he said, “but with help from Talend we can instill confidence in our reporting, and we can start to support other departments and have a larger impact on improving processes and even help generate more income.”

    The higher level of data quality Migros has achieved with Talend has also led Mr. Gozmen to consider using Talend for future data initiatives. “We have big dreams, and we are testing the waters on several fronts,” he said. “We are exploring the possibilities for predictive analytics, and we feel Talend’s big data capabilities are a good match.”

    The Migros team is also considering using Talend in moving from its current batch processing mode to real-time data analysis, according to Mr. Gozmen. “We are currently using date-minus-one or date-minus-two batch processing, but we want to move to real-time big data predictive analytics and other advanced capabilities as soon as possible,” he said. “We are currently testing new models that can help us achieve these goals.”

    While the business advantages of using Talend are manifesting themselves in past, present, and future use cases, Mr. Gozmen sums them up this way: “With Talend we can trust our data, so business leaders can trust our reports, so we can start to use big data in new ways to improve processes, supply chain performance, and business results.”

    Author: Laura Ventura

    Source: Talend

  • Overcoming Hurdles in Test Data Management

    Overcoming Hurdles in Test Data Management

    A robust testing process is vital to bringing quality software, product, or application to the market. A smooth testing process is fueled by the right quality test data, in the right volume, and the correct format. Thus, testing assurance entirely depends on the quality of test data.

    A test data management solution helps to manage the quality of data and ensures the desired availability of test data throughout the software development lifecycle. It helps the developers and testing teams to be more productive. While the importance of Test Data Management (TDM) is well established in determining testing completeness and coverage, implementing a smooth TDM process isn’t easy.

    Challenges faced while managing test data

    Lack of TDM Standardization and Requirements

    Business goals keep evolving to cater to end-user needs. Hence, it is essential to document clear test data requirements, including quantity specifications, to determine testing quality. The testing team requires data in different formats to carry out different kinds of testing. Thus, within an enterprise, all teams must be fully aware of the TDM process in place and set a standard data request format to avoid scenarios where appropriate test data is unavailable. Standardization in TDM helps reduce the length of testing cycles.

    Poor Quality Data and Consistency

    With more heterogeneous systems involved in an ecosystem, data is found in many different forms and formats, and it can also be dispersed across multiple systems. This is a challenge for many enterprises as they struggle to fetch meaningful data that they can use for testing. A streamlined process and a consistent approach to refreshing data are necessary to prevent poor data quality and integrity issues.

    Data loses its relevance and reliability as it ages, so it needs continuous validation and maintenance of its integrity. The data cycle should be traced from the beginning to improve data integrity and troubleshoot issues. Enterprises often fail to use a smaller subset of data that mimics the production environment, which is needed to achieve the best test coverage. A defect missed during testing because the relevant data set was absent can pose a significant risk in production.

    Compromised Data Privacy

    Data can be in the form of sensitive information related to customers or others. Data masking of sensitive information must comply with government security standards and mandates. Breach or leak of data can result in the malicious use of that data and further cause financial damage to enterprises. From the beginning, it is advisable to incorporate encryption and masking steps for sensitive information (transactional data) in the TDM process. With test data encryption, adherence to geo-specific compliances should also be incorporated into the process.
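
    One common way to build such a masking step into a TDM pipeline is deterministic pseudonymization: replacing sensitive values with salted hashes so test records stay linkable without exposing real data. The sketch below is illustrative only; the table, column names, and salt handling are assumptions, and it does not replace the encryption and compliance controls mentioned above.

    ```python
    # Minimal sketch: mask sensitive columns before data reaches a test environment.
    import hashlib
    import pandas as pd

    SALT = "load-this-from-a-secret-store"        # assumption: kept outside the code

    def mask(value) -> str:
        """Replace a sensitive value with a salted, irreversible token."""
        return hashlib.sha256((SALT + str(value)).encode()).hexdigest()[:16]

    customers = pd.read_csv("prod_customers.csv")            # hypothetical extract
    for column in ["name", "email", "national_id"]:          # illustrative fields
        customers[column] = customers[column].map(mask)

    customers.to_csv("test_customers_masked.csv", index=False)
    ```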

    Lack of expertise

    The TDM process holds the potential to streamline testing, but it also requires expertise. Often, the teams involved in testing within an enterprise cannot identify the appropriate test data management approach for a particular project. A centralized or dedicated team running the TDM process can also be counterproductive if it hampers continuous data integration. For instance, a TDM team working independently of delivery teams and agile sprints can increase data volumes and lengthen data provisioning cycles. Teams working in silos also increase the chances of missing external factors, such as device location or internet connection, during testing. A resulting loss of data due to hardware issues, for example, may hurt product quality and directly affect customers. Thus, a lack of test data management knowledge across teams can exacerbate the problem.

    Adopting a TDM platform is an effective way to tackle these challenges. Such platforms can handle large volumes of enterprise data and generate reports. They help teams identify and create data subsets for testing and automate TDM activities such as data masking, data generation, and cloning, making the process more efficient.

    Popular Test Data Management Tools in the Market

    Broadcom Test Data Manager

    The Broadcom TDM platform can quickly create the smallest test data set necessary to run effective tests. One can also generate test data for multiple scenarios, and anticipate and develop data for future scenarios, to highlight unexpected results and identify how products react to various conditions. Its key features, Generate Data Better and Show Data Better, help store data centrally, clone data during provisioning, and reuse existing data to deliver test data whenever and wherever needed.

    K2View Test Data Management

    The K2View TDM platform is one of the top players in data management. Based on user-defined rules, the test data management tool quickly creates and provisions test data subsets from various production sources. Testing teams can directly leverage the self-service portal or APIs to define their desired test data sets. K2View’s solution extracts all the data associated with the relevant business entities (customers, in this example) from production systems and synthesizes missing data as required. It provisions parameter-based subsets for testers and other teams, and it protects sensitive data before it is delivered to the appropriate test environments.

    Gartner’s “Voice of the Customer” report, published in June 2021, highlights how K2View delivers the high-end security the tool promises for sensitive information. The platform automates the data provisioning process on a single platform, regardless of the underlying technologies, and provides dashboards to monitor data requests and track execution status, execution results, and more. It delivers complete test data you can trust while complying with privacy regulations.

    Informatica Test Data Management

    The Informatica TDM platform is cost-effective, consistently automating the masking of sensitive information within and across databases. The platform provides compliance at scale with data masking and subsetting capabilities for testing. It also offers services for analyzing and monitoring risks and maintaining compliance with data governance initiatives. Informatica offers valuable test data management resources, including Informatica Cloud Test Data Management, Persistent Data Masking, and a data privacy framework with solutions such as Data Privacy Management.

    Conclusion

    Efficient data integration and using the correct data at the right time are key to functional testing and, ultimately, to a company’s success, because they make it possible to precisely emulate your users’ workflow and interaction with your app. By automating the process, TDM saves significant time and cost on any project. TDM also comes with its own set of challenges, and carefully planning the TDM requirements and platform can help an enterprise increase the robustness and transparency of the process.

    Author: Yash Mehta

    Source: Datafloq

     
  • Six Common Challenges when Adopting a Data-Driven Approach

    Six Common Challenges when Adopting a Data-Driven Approach

    Companies that embrace data-driven approaches stand to perform much better than those that don’t, yet they’re still in the minority. What’s standing in the way?

    It’s no surprise that becoming a data-driven company is at the top of the corporate agenda. A recent IDC whitepaper found that data-savvy companies reported a threefold increase in revenue improvement, almost tripling the likelihood of reduced time to market for new products and services, and more than doubling the probability of enhanced customer satisfaction, profits, and operational efficiency.

    But according to a January survey of data and information executives from NewVantage Partners, merely a quarter of companies describe themselves as data-driven, and only 21% say they have a data culture in their organizations.

    Several key factors help explain this disconnect, but cultural issues were cited by 80% of respondents as the biggest factor keeping them from getting value from their data investments, while only 20% pointed to technology limitations. Based on the experience of experts who have surmounted these roadblocks firsthand, others remain as well.

    Recognizing bad data

    Even the best of analytics strategies can be derailed if the underlying data is bad. But solving data quality problems requires a deep understanding of what the data means and how it’s collected. Resolving duplicate data is one issue, but when the data is just wrong, that’s much harder to fix, says Uyi Stewart, chief data and technology officer at Data.org, a nonprofit backed by the Mastercard Center for Inclusive Growth and the Rockefeller Foundation.

    “The challenge of veracity is much more difficult and takes more time,” he says. “This is where you require domain expertise to allow you to separate fact from fiction.”

    Simple technical skills are not enough. That’s what Lenno Maris found out when he joined FrieslandCampina, a multinational dairy cooperative, in 2017, when the company was embarking on a strategic plan to become a data-driven company. It was a big challenge. The company has over 21,000 employees in 31 countries, and has customers in over 100 countries. It quickly became clear that data quality was going to be a big hurdle.

    For example, inventory was reported based on the number of pallets, but orders were based on unit numbers, says Maris, the company’s senior global director for enterprise data and authorizations. This meant that people had to do manual conversions to ensure the right quantities were delivered at the right price. 

    Or take commodity codes. Each plant put in the commodity code that best fit the product, with different plants using different codes that were then used to reclaim import and export taxes. “But tax reporting is performed at the corporate level, so consistency is needed,” says Maris.

    To fix the data issues, FrieslandCampina had to evolve its data organization. At the start of the project, the team focused mostly on the technical details of data entry. But that changed quickly. “We’ve been able to retrain our team to become process experts, data quality experts, and domain experts,” Maris says. “That allows us to transition to proactive data support and become advisors to our business peers.”

    Similarly, the technology platform chosen to help the company improve its data quality, Syniti, had to adapt as well. “The platform is good but highly technical,” Maris says. “So we had some challenges with our business user adoption. We’ve challenged Syniti to provide a business-relevant user interface.”

    In 2018, the tier-one master data objects were in place: vendors, materials, customers, and finance. The following year, this expanded to tier-two data objects, including contracts, bills of materials, rebates, and pricing. By the end of 2022, the company had finished orchestrating the logical business flows and the project was fully deployed. The result was a 95% improvement in data quality and a 108% improvement in productivity.

    “Prior to implementation of the foundational data platform, we had over 10,000 hours of rework on our master data on an annual basis,” he says. “Today, this has been reduced to almost zero.”

    Data quality was also an issue at Aflac, says Aflac CIO Shelia Anderson. When Aflac began its journey toward becoming a data-driven company, there were different business operations across Aflac’s various books of business, she says.

    “There were multiple systems of data intake, which presented inconsistencies in data quality,” she says. That made it difficult to get useful insights from the data. To solve the problem, Aflac moved to a digital-first, customer-centric approach. This required data consolidation across various ecosystems, and as a result, the customer experience has improved and the company has been able to increase automation in its business processes and reduce error rates. “A significant benefit is that it frees bandwidth for customer service agents, enabling them to focus on higher complexity claims that require a more personal touch,” she says.

    Seeing data consolidation as a technology problem

    One of Randy Sykes’ previous employers spent eight years building a data warehouse without success. “That’s because we tried to apply standard system development techniques without making sure that the business was with you in lockstep,” he says. Today, Sykes is IT director of data services at Hastings Mutual Insurance Co. This time, he took a different approach to consolidating the organization’s data.

    Ten years ago, the company decided to bring everything together into a data warehouse. At the time, reports took 45 days to produce and business users didn’t have the information they needed to make business decisions.

    First, data would be collected in a landing area via nightly batch imports from legacy systems. It would then move into a staging area, where business rules would be applied to consolidate and reconcile data from different systems. This required a deep understanding of how the company operates and what the data means. But this time, the project was successful because there were subject matter experts on the team. “We had a couple of business folks who’d been with the company a long time and had a lot of knowledge of the organization,” he says. “You actually have a cross-functional team to be successful.”

    For example, different insurance policy systems might have different terms, and different coverage areas and risks. In order to consolidate all this information, the data team needs to have a good understanding of the business language and the rules needed to transform the raw data into a universal format. “That’s the biggest challenge that companies run into,” he says. “They try to get the data and technically put it together and forget the business story behind the information. A lot of times, these types of projects fail.”
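
    To illustrate the kind of staging-area business rule described here, the sketch below maps two hypothetical legacy policy systems onto one universal format. The system names, column names, and coverage codes are invented for illustration; the real rules would come from the subject matter experts mentioned above.

    ```python
    # Minimal sketch: business rules translate each legacy system's vocabulary
    # into a single canonical format before the data is consolidated.
    import pandas as pd

    # Rule: each system's coverage codes roll up to one canonical coverage type.
    COVERAGE_MAP = {
        "legacy_a": {"FIRE": "property", "AUTO-LIAB": "auto_liability"},
        "legacy_b": {"PROP01": "property", "AL": "auto_liability"},
    }

    def to_universal(df: pd.DataFrame, system: str) -> pd.DataFrame:
        out = pd.DataFrame()
        out["policy_id"] = df["policy_no"] if system == "legacy_a" else df["pol_id"]
        out["coverage"] = df["cov_code"].map(COVERAGE_MAP[system])
        out["source_system"] = system
        return out

    # Nightly batch imports land here before the rules are applied.
    landing_a = pd.read_csv("landing/legacy_a_policies.csv")
    landing_b = pd.read_csv("landing/legacy_b_policies.csv")

    staged = pd.concat(
        [to_universal(landing_a, "legacy_a"), to_universal(landing_b, "legacy_b")],
        ignore_index=True,
    )
    ```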

    Today, a report that used to take 45 days can be turned around in 24 hours, he says. Then, as databases continue to get modernized and become event-driven, the information will become available in real time.

    No short-term business benefits

    Once Hastings started getting its data together, the data project began producing value for the company within a year, even though the data warehouse project, which began in 2014, wasn’t delivered until 2017. That’s because the landing and staging areas were already providing value in terms of gathering and processing the data. Data projects have to deliver business value throughout the process, Sykes says. “Nobody is going to wait forever.”

    A similar “quick win” helped lead to the success of a major data project for Denise Allec, principal consultant at NTT Americas, back when she was the director of corporate IT at a major corporation. A six-week proof-of-concept project showed that the project had value, she says, and helped overcome challenges such as business units’ unwillingness to give up their silos of data. “Giving up ownership of data represents a loss of control to many,” she says. “Information is power.”

    This kind of data hoarding isn’t limited to senior executives, though. “Employees tend not to trust others’ data,” she says. They want to validate and scrub their own sources, and massage and create their own reporting tools that work for their unique needs. “We’ve all seen the numerous duplicative databases that exist throughout a company and the challenges that arise from such a situation,” she says.

    Choosing data projects that don’t have immediate benefits is a major roadblock to successful data initiatives, confirms Sanjay Srivastava, chief digital strategist at Genpact.  “Until you do this, it’s all a theoretical discussion.”

    The flip side is choosing projects that don’t have any ability to scale—another major barrier. Without the ability to scale, a data project won’t have meaningful long-term impact, instead using up resources for a small or idiosyncratic use case.

    “The key is how you deliver business value in chunks, in a time frame that keeps people’s attention, and that is scalable,” he says.

    Not giving end users the self-service tools they need

    Putting the business users first means giving people the data they need in the form they need it. Sometimes, that means Excel spreadsheets. At Hastings, for example, staff would historically copy-and-paste data into Excel in order to work with it. “Everybody uses Excel,” says Hastings’ Sykes. “Now we say, ‘Why don’t we just give you the data so you don’t have to copy-and-paste it anymore.’”

    But the company has also been creating dashboards. Today, about a quarter of the company’s 420 employees use the dashboards, as do outside agencies. “They can now help agents cross-sell our products,” he says. “We didn’t have that before.”

    But providing people with the self-serve analytics tools they need is a challenge. “We’re still behind the eight ball a little bit,” he says. But with 200 business-focused dashboards already in place, the process is well under way.

    Another organization that recently began the process of democratizing access to data is the Dayton Children’s Hospital in Dayton, Ohio. “We weren’t doing that well five years ago,” says CIO J.D. Whitlock. “There were still a lot of spreadsheets. Now we’re using the Microsoft data stack, like a lot of people are doing. So as long as someone knows a little bit about how to use PowerBI, we’re serving up the appropriate data, in the appropriate format, with appropriate security.”

    In addition, data analysts have also been decentralized, so people don’t have to go to a single team with their data questions. “Say you want to know how many of procedure X doctor Y did last year,” says Whitlock. “It’s a relatively simple query. But if you don’t give people the tools to do that themselves, then you’ve got a thousand requests.” Putting self-serve data tools in place has helped the company move toward being a data-driven organization, he says. “With the caveat that it’s always a journey and you never declare victory.”

    Not including end users in your development process

    Ignoring user needs is nearly always a recipe for disaster. For example, Nick Kramer recently worked with a national restaurant services company. Kramer is the leader of applied solutions at SSA & Company, a global consulting firm. The restaurant services company was growing rapidly but service levels were dropping. “Everybody was pointing fingers at each other,” he says. “But the CIO had no dashboards or reports—just anecdotes and opinions.”

    One of the problems was that the central installation system was widely ignored. Employees updated records, but after the fact. The system had been imposed on them and was hard to use. “People in the order department, in sales, legal, and on the installation side—every office had their own spreadsheets they ran their schedules on,” Kramer says. “None of the communication was happening and the data wasn’t flowing. So you had to go office by office to find out who was doing what and how well, and which delays were unsolvable and which ones could be addressed.”

    The solution was to get close to the business users, to understand how the data was used. Joshua Swartz, partner at Kearney, had a similar experience recently when he was working on a consulting project with a US food company with several billion in annual revenues.

    The company wanted to enable production managers to make better decisions about what to produce based on real data. “For example, there’s a production line in a certain production site and it can make either tortilla chips or pita bread,” says Swartz. “If there’s a switchover, you have to stop and clean and change the ingredients.”

    Say the old way was to run four hours of tortilla chips and four hours of pita bread, but the data showed you should do only two hours of tortilla chips today, and tomorrow it may be the opposite. And since food products are perishable, getting production wrong means that some product would have to be thrown away. But when the company first designed its solution, the production workers weren’t involved, says Swartz. “They were too busy producing food and didn’t have time to stop and attend meetings.”

    This wasn’t expected to be a problem because the company’s culture was hierarchical. “When the CEO says something and pounds their fist on the table, everyone has to follow suit,” he says. But the new system was used for only a couple of weeks in the pilot site and then the employees found that the system didn’t really work for them and went back to doing things the old way. Also, it didn’t help that the company’s data czar was located a couple of layers down in the company’s technology organization, rather than closer to top management or to the business units.

    Fixing the problem required bringing the actual employees to the design suite, even though it required adding capacity to the production lines to free up workers. “Food companies with very thin margins weren’t comfortable making that investment,” Swartz says. But when they became part of the process, they were able to contribute to the solution, and today a third to a half of the facilities are using the new technology.

    Swartz also recommends that the chief data officer be located closer to the company’s most valuable data. “If data is a strategic asset of the business, I would place the CDO closer to the part of the business that has ownership of the data,” he says. “If the organization is focused on using data for operational efficiency, then under the COO might be the right place.”

    A sales-driven company might want to put the CDO under the sales officer, however, and a product company, under the marketing officer, he says. One consumer packaged goods company he worked with actually had the CDO report directly to the CEO.

    “If you think of data as a technology problem, you’re going to keep running into challenges of how much value you are actually getting from data and analytics,” says Swartz.

    A lack of trust

    The responsible use of data is important for the success of data initiatives, and nowhere more so than in finance. “Trust is of utmost importance in the banking sector,” says Sameer Gupta, chief analytics officer at DBS Bank. “It’s crucial to use data and models responsibly, and ethical considerations must be upheld while using data.” Data use should be purposeful, he says, respectful, and explainable, and should never come as a surprise. “Data use should be expected by individuals and corporates,” he says.

    By focusing on trust, he adds, the bank has been able to deploy AI and data use cases across the enterprise—260 at the last count—ranging from customer-facing businesses like consumer and small and medium enterprise banking, to support functions like compliance, marketing, and HR. “In 2022, the revenue uplift from our AI and machine learning initiatives was about SGD 150 million [US $112 million], more than double that from the previous year,” he says. “We aspire to achieve SGD 1 billion in the next five years.”

    Earning trust takes time and commitment. Becoming a data-driven company is all but impossible without it. But once trust is gained, it begins a virtuous cycle. According to a CapGemini change management study released in January, in organizations with strong data analytics, employees are 18% more likely to trust the company. And when those companies need to evolve further, the probability of successful change is 23 to 27% higher than at other organizations.

    “Many people, including data experts, think most issues while transitioning toward becoming a data-driven company are technology-related,” says Eugenio Zuccarelli, a data scientist at a global retailer and former AI research scientist at MIT. But the real barriers are personal, he says, as people have to learn to understand the value of making data-based decisions.

    “While doing research at MIT, I often saw experts and leaders of organizations struggle with their transition toward becoming a more data-driven organization,” he says. “The main issues were usually cultural, such as a belief that technology would have overtaken their decision-making, rather than empowering them, and a general tendency to take decisions based on experience and gut feelings.” People need to understand that their expertise is still vital, he adds, and that the data is there to provide additional input. 

    Companies need to stop thinking about becoming a data-driven company as a technology problem. “All our clients are talking about becoming more data driven, and none of them know what it means,” says Donncha Carroll, partner in the revenue growth practice and head of the data science team at Lotis Blue Consulting. They focus on their technology capabilities, he says, not what people will be able to do with the data they get.

    “They don’t put the user of the solution in the frame,” he says. “Lots of data analytics teams provide data dashboards that provide information that is neither useful nor actionable. And it dies on the vine.”

    Author: Maria Korolov

    Source: CIO

    Date: May 25, 2023

  • The Growing Influence of Ethical AI in Data Science

    The Growing Influence of Ethical AI in Data Science

    Industries such as insurance that handle personal information are paying more attention to customers’ desire for responsible, transparent AI.

    AI (artificial intelligence) is a tremendous asset to companies that use predictive modeling and have automated tasks. However, AI is still facing problems with data bias. After all, AI gets its marching orders from human-generated data -- which by its nature is prone to bias, no matter how evolved we humans like to think we are.

    With the wide adoption of AI, many industries are starting to pay attention to a new form of governance called responsible or ethical AI. These are governance practices associated with regulated data. For most organizations, this involves removing any unintentional bias or discrimination from their customer data and cross-checking any unexpected algorithmic activity once the data moves into production mode.

    This is an especially important transformation for the insurance industry because consumers today are becoming far more attuned to their personal end-to-end experience in any industry that relies on the use of personal data. By advancing responsible, ethical AI, insurers can confidently map to the way consumers want to search for insurance and find insurance policies, and they can align with the values and ethics that govern this kind of personal search.

    What Does Inherent Bias Look Like in AI Algorithms Today?

    One of the more noticeable examples of human-learned, albeit unintentional, data bias today is around gender. This happens when the AI system does not behave the same way for a man versus a woman, even when the data provided to the system is identical except for the gender information. One example outcome is that individuals who should be in the same insurance risk category are offered unequal policy advice.
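
    One simple way to test for the behavior described above is to score the same applicants twice, once with the gender field flipped, and measure how much the model’s output changes. The sketch below is a generic illustration; the model object and column values are hypothetical, and a real review would use a fuller set of fairness metrics.

    ```python
    # Minimal sketch: flip only the gender field and compare predicted risk scores.
    import pandas as pd

    def gender_flip_gap(model, applicants: pd.DataFrame) -> float:
        """Mean absolute change in predicted risk when only gender changes."""
        flipped = applicants.copy()
        flipped["gender"] = flipped["gender"].map({"M": "F", "F": "M"})
        original = model.predict_proba(applicants)[:, 1]
        altered = model.predict_proba(flipped)[:, 1]
        return float(abs(original - altered).mean())

    # A gap close to zero means gender alone is not moving the risk assessment;
    # a large gap flags the kind of unequal treatment described above.
    ```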

    Another example is something called the survivor bias, which is optimizing an AI model using only available, visible data -- i.e., “surviving” data. This approach inadvertently overlooks information due to the lack of visibility, and the results are skewed to one vantage point. To move past this weakness, for example in the insurance industry, AI must be trained not to favor the known customer data over prospective customer data that is not yet known.

    More enterprises are becoming aware of how these data determinants can expose them to unnecessary risk. A case in point: in their State of AI in 2021 report, McKinsey reviewed industry regulatory compliance through the filter of a company’s allegiance to equity and fairness data practices -- and reported that two of companies’ top three global concerns are the ability to establish ethical AI and to explain their practices well to customers.

    How Can Companies Proactively Eliminate Data Bias Company-wide?

    Most companies should already have a diversity, equity, and inclusion (DEI) program to set a strong foundation before exploring practices in technology, processes, and people. At a minimum, companies can set a goal to remove ingrained data biases. Fortunately, there are a host of best-practice options to do this.

    • Adopt an open source strategy. First, enterprises need to know that biases are not necessarily where they imagine them to be. There can be a bias in the sales training data or in the data at the later inference or prediction time, or both. At Zelros, for example, we recommend that companies use an open source strategy to be more open and transparent in their AI initiatives. This is becoming an essential baseline anti-bias step that is being practiced at companies of all sizes. 

    • Utilize vendor partnerships. Companies that want to put a bigger stake in the ground when it comes to regulatory compliance and ethical AI standards can collaborate with organizations such as isahit, dedicated to helping organizations across industries become competent in their use and implementation of ethical AI. As a best practice, we recommend that companies work toward adopting responsible AI at every level, not just with their technical R&D or research teams, then communicate this governance proliferation to their customers and partners. 

    • Initiate bias bounties. Another method for eliminating data bias was identified by Forrester as a significant trend in their North American “Predictions 2022” guide. It is an initiative called bias bounties. Forrester stated that, “At least five large companies will introduce bias bounties in 2022.”
      Bias bounties are like bug bounties, but instead of rewarding users based on the issues they detect in software, users are rewarded for identifying bias in AI systems. The bias happens because of incomplete data or existing data that can lead to discriminatory outcomes from AI systems. According to Forrester, in 2022, major tech companies such as Google and Microsoft will implement bias bounties, and so will non-technology organizations such as banks and healthcare companies. With trust high on stakeholders’ agenda, basing decisions on accountability and integrity is more critical than ever.

    • Get certified. Finally, another method for establishing an ethical AI approach -- one that is gaining momentum -- is getting AI system certification. Being able to provide proof of the built-in governance through an external audit goes a long way. In Europe, the AI Act is a resource for institutions to assess their AI systems from a process or operational standpoint. In the U.S., the NAIC is a reference organization providing guiding principles for insurers to follow. Another option is for companies to align to a third-party organization for best practices.

    Can an AI System Be Self-criticizing and Self-sustaining?

    Creating an AI system that is both self-criticizing and self-sustaining is the goal. Through the design itself, the AI must adapt and learn, with the support of human common sense, which the machine cannot emulate.

    Companies that want to have a fair prediction outcome may analyze different metrics at various subgroup levels within a specific model feature (for example gender) because that can help identify and prevent biases before they go to market with consumer-facing capabilities. With any AI, making sure that it doesn’t fall into a trap called a Simpson’s Paradox is key. Simpson's Paradox, which also goes by several other names, is a phenomenon in probability and statistics where a trend appears in several groups of data but disappears or reverses when the groups are combined. Successfully preventing this from happening ensures that personal data does not penalize the client or consumer who it is supposed to benefit.
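
    For readers unfamiliar with Simpson’s Paradox, the toy numbers below show the reversal: within each subgroup, approach A has the higher rate, yet the combined totals favor B because the two approaches are applied to very different mixes of cases. The figures are invented purely to illustrate the effect.

    ```python
    # Toy illustration of Simpson's Paradox: A wins in every subgroup,
    # but B wins once the subgroups are combined.
    groups = {
        "group_1": {"A": (8, 10),   "B": (70, 100)},   # (positive outcomes, cases)
        "group_2": {"A": (30, 100), "B": (2, 10)},
    }

    totals = {"A": [0, 0], "B": [0, 0]}
    for name, approaches in groups.items():
        for approach, (hits, cases) in approaches.items():
            print(f"{name} {approach}: {hits / cases:.0%}")
            totals[approach][0] += hits
            totals[approach][1] += cases

    for approach, (hits, cases) in totals.items():
        print(f"combined {approach}: {hits / cases:.0%}")
    # group_1: A 80% > B 70%; group_2: A 30% > B 20%; combined: A 35% < B 65%
    ```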

    Responsible Use of AI Can Be a Powerful Advantage

    Companies are starting to pay attention to how responsible AI has the power to nurture a virtuous, profitable circle of customer retention through more reliable and robust data collection. There will be challenges in the ongoing refinement of ethical AI for many applications, but the strategic advantages and opportunities are clear. In insurance, the ability to monitor, control, and balance human bias can keep policy recommendations meant for certain races and genders fairly focused on the needs of those intended audiences. Responsible AI leads to stronger customer attraction and retention, and ultimately increased profitability.

    Conclusion

    Companies globally are revving up their focus on data equity and fairness as a relevant risk to mitigate. Fortunately, they have options to choose from to protect themselves. AI offers an opportunity to accelerate more diverse, equitable interactions between humans and machines. Solutions can help large enterprises globally provide hyper-personalized, unbiased recommendations across channels. Respected trend analysts have called out data bias as a top business concern of 2022. Simultaneously, they identify responsible, ethical AI as a forward-thinking solution companies can deploy to increase customer and partner trust and boost profitability.

    How are you moving toward an ethical use of AI today?

    Author: Damien Philippon

    Source: TDWI

  • The key challenges in translating high quality data to value

    The key challenges in translating high quality data to value

    Most organizations consider their data quality to be either 'good' or 'very good', but there’s a disconnect around understanding and trust in the data and how it informs business decisions, according to new research from software company Syncsort.

    The company surveyed 175 data management professionals earlier this year, and found that 38% rated their data quality as good while 27% said it was very good.

    A majority of the respondents (69%) said their leadership trusts data insights enough to inform business decisions. Yet they also said only 14% of stakeholders had a very good understanding of the data. Of the 27% who reported sub-optimal data quality, 72% said it negatively affected business decisions.

    The top three challenges companies face when ensuring high quality data are multiple sources of data (70%), applying data governance processes (50%) and volume of data (48%).

    Approximately three quarters (78%) have challenges profiling or applying data quality to large data sets, and 29% said they have a partial understanding of the data that exists across their organization. About half (48%) said they have a good understanding.

    Fewer than 50% of the respondents said they take advantage of data profiling tools or data catalogs. Instead, they rely on other methods to gain an understanding of data. More than half use SQL queries and about 40% use business intelligence tools.

    Author: Bob Violino

    Source: Information-management

  • The state of BI adoption: usage, drivers and recommendations

    The state of BI adoption: usage, drivers and recommendations

    Although adoption rates for BI/analytics tools remain stuck in the 20% range, usage is increasing. Usage growth is primarily fueled by “off-license” usage from front-line workers using BI/analytics output embedded in operational applications as well as external users (e.g., customers and suppliers) using external-facing reports and dashboards. These new usage trends are most prevalent among leading adopters of data & analytics (e.g., best-in-class companies) as well as North American companies, which are traditionally more aggressive in adopting new technologies and approaches than their European counterparts.

    In addition, new self-service tools, such as GUI-based authoring and data preparation tools, are making it easier for businesspeople to service their own data needs without IT assistance. Also, data catalogs make it easier for these business users to discover useful data, and new ad hoc query capabilities – namely BI search and augmented analytics – are starting to propel higher levels of BI/analytics adoption and usage.

    At the same time, organizations are applying 30 years of hard-won knowledge about how to overcome barriers to adoption and usage. Specifically, organizations are implementing data governance programs and data quality workflows to improve data accuracy, completeness, and consistency. They are launching data literacy programs with coaching and support networks to improve knowledge and skills required to use BI/analytics tools effectively. Most importantly, executives are becoming more data-driven, providing leadership, funding, and personal examples to foster a robust culture of data and analytics usage.

    Global survey

    To investigate BI/analytics adoption, BARC and Eckerson Group conducted a global survey of 214 data & analytics leaders in November and December of 2021, drawing respondents from organizations around the globe of all sizes and in many different industries. More than a third (36%) had more than 5,000 employees, 29% had between 500 and 4,999 employees, and 36% had less than 500 employees. More than two-thirds of respondents (70%) were from Europe, while 19% were from North America and the rest from South America, Asia Pacific and Africa.

    More than half of respondents (51%) were executives, VP/directors, or managers. The following is a list of the percentage of respondents by role in descending order: manager of BI, Analytics, ML/AI, or Data Management (30%); VP/Director of BI, analytics, ML/AI, or data management (13%); architect of BI, analytics, ML/AI, or data management (13%); consultant or vendor on behalf of a current client (13%); analyst of BI, analytics, or ML/AI (12%); executive (CXO) (8%); engineer of data, analytics, ML/AI (8%); and other (5%).

    Key takeaways

      • Adoption
        The percentage of employees actively using BI/analytics tools is currently 25% on average, reflecting minimal growth in the past seven years we’ve been tracking this metric.
      • Usage
        At the same time, 50% of data & analytics leaders say BI/analytics usage has “increased a lot.”
      • Technical drivers
        The primary technical drivers of increased usage are “self-service authoring tools” (73%), data preparation tools (48%), and “embedded BI/analytics” (38%).
      • Business drivers
        The primary business drivers of increased usage are “change in data culture” (51%), “new data-driven executives” (50%), and “digital transformation or other strategic initiatives” (50%).
      • Regional and other variations
        These drivers are more prominent among North American companies and leading adopters of data & analytics (best-in-class companies) by a significant margin.
      • Barriers
        The primary barriers to adoption and usage are “lack of proper training” (50%), “lack of quality data” (41%), “budget issues” (36%), and “ease of use” (33%).
      • Adoption killers
        There are certain things that almost instantaneously kill BI/analytics adoption and usage:
        1. the data needed is not available or accessible
        2. the data isn’t trustworthy
        3. the tools aren’t flexible or easy to use
        4. query performance is slow, and
        5. there aren’t enough people to coach or support business users
      • Adoption drivers
        On the other hand, BI/analytics usage is bolstered by:
        1. data-driven executives
        2. comprehensive training and support programs
        3. tailored self-service tooling
        4. embedded analytics
        5. comprehensive data governance
        6. analytics centers of excellence, and
        7. agile delivery of high-value solutions.

    Ten recommendations

    Consider these 10 recommendations for improving adoption, usage, and value of BI/analytics tools and creating a successful data & analytics program:

    • Tailor self-service
      Know your users and deliver what they need, even if you have to build it centrally. For 60% of business users, tailored parameterized dashboards are the epitome of self-service.
    • Govern self-service
      Self-service BI/analytics implemented without governance or knowledge of user requirements will strangle a data & analytics program.
    • Power users first
      Focus on meeting the needs of power users first to develop useful data models and structures that other users can leverage. But don’t let power users dictate choice of tools, reports or dashboards provided to regular business users.
    • Tear down the data silos
      Data that is available but not accessible, or a general lack of data, is a major reason tools go unused. Understand what data is needed and consider how it can be captured or how to overcome the organizational barriers of closed data silos.
    • Data quality at all costs
      Move mountains to deliver data that users trust. Certify reports, implement data governance, build data quality rules and workflows, report on data quality, and partner closely with source system owners to improve data entry and systems notifications.
    • Embed analytics
      Turn operational workers into just-in-time analysts by embedding charts, tables and dashboards into ERP/CRM applications, portals and other run-the-business applications.
    • Look externally
      Service users in your organizationʼs ecosystem. For example, you can improve customer loyalty by providing them with data and insights about their activity with your company. These data products for customers, suppliers and others can help improve the bottom line too.
    • Create an analytics center of excellence
      Whether you centralize or embed data analysts, teach these individuals enterprise standards for using data and how to communicate with business managers. Align analysts with business units and rotate them periodically.
    • Go beyond training
      Training is critical, but coaching and support create a culture of analytics. Build peer communities for both power and casual users to spread knowledge and excitement about how to use data to achieve business goals. Improving data and analytics competence should not be the sole purview of power users; aim to lift the data literacy of everyone in the organization.
    • Work top down and bottom up
      Find or cultivate data-driven executives who lead by word and example. At the same time, organize departmental managers who feel the pain of poor quality data and insights into an Analytics Council that sets standards and pushes for change.
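
    The example below is only meant to make the “data quality at all costs” recommendation a little more concrete: a minimal Python/pandas sketch of a few automated quality rules feeding a simple pass/fail report. The DataFrame, the column names (customer_id, email, signup_date), and the rules themselves are hypothetical illustrations and are not part of the BARC study.

      # Minimal sketch of rule-based data quality checks with a pass/fail report.
      # The table and column names are hypothetical; adapt the rules to your own data.
      import pandas as pd

      def run_quality_checks(df: pd.DataFrame) -> pd.DataFrame:
          """Evaluate a few simple quality rules and return a pass/fail report."""
          rules = {
              "customer_id is unique": df["customer_id"].is_unique,
              "customer_id has no nulls": df["customer_id"].notna().all(),
              "email contains an @ sign": df["email"].str.contains("@", na=False).all(),
              "signup_date is not in the future": (
                  pd.to_datetime(df["signup_date"]) <= pd.Timestamp.today()
              ).all(),
          }
          return pd.DataFrame(
              {"rule": list(rules), "passed": [bool(ok) for ok in rules.values()]}
          )

      if __name__ == "__main__":
          sample = pd.DataFrame({
              "customer_id": [1, 2, 2],
              "email": ["a@example.com", "bad-email", None],
              "signup_date": ["2023-01-15", "2023-02-20", "2099-01-01"],
          })
          print(run_quality_checks(sample))  # prints each rule and whether it passed

    In practice, checks like these would run in a scheduled workflow and feed a data quality report or dashboard, which is the spirit of the recommendation above.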

    Source: BARC

  • Understanding and taking advantage of smart data distancing

    Understanding and taking advantage of smart data distancing

    The ongoing COVID-19 pandemic has made the term 'social distancing' a cynosure of our daily conversations. There have been guidelines issued, media campaigns run on prime time, hashtags created, and memes shared to highlight how social distancing can save lives. When you have young children talking about it, you know the message has cut across the cacophony! This might give data scientists a clue of what they can do to garner enterprise attention towards the importance of better data management.

    While many enterprises kickstart their data management projects with much fanfare, egregious data quality practices can hamper the effectiveness of these projects, leading to disastrous results. In a 2016 research study, IBM estimated that bad quality data costs the U.S. economy around $3.1 trillion every year.

    And bad quality data affects the entire ecosystem; salespeople chase the wrong prospects, marketing campaigns do not reach the target segment, and delivery teams are busy cleaning up flawed projects. The good news is that it doesn’t have to be this way. The solution is 'smart data distancing'.

    What is smart data distancing?

    Smart data distancing is a crucial aspect of data management, and more specifically of data governance: the practice through which businesses identify, create, maintain, and authenticate data assets to ensure they are free of corruption or mishandling.

    The recent pandemic has forced governments and health experts to issue explicit guidelines on basic health etiquette; washing hands, using hand sanitizer, keeping social distance, etc. At times, even the most rudimentary facts need to be recapped multiple times so that they become accepted practices.

    Enterprises, too, should strongly emphasize the need for their data assets to be accountable, accurate, and consistent to reap the true benefits of data governance.

    The 7 do’s and don’ts of smart data distancing:

    1. Establish clear guidelines based on global best data management practices for the internal or external data lifecycle process. When accompanied by a good metadata management solution that includes profiling, classifying, managing, and organizing diverse enterprise data, this can vastly improve target marketing campaigns, customer service, and even new product development.

    2. Set up quarantine units for regular data cleansing or scrubbing, matching, and standardization of all inbound and outbound data (a rough sketch of such a quarantine step appears after this list).

    3. Build centralized data asset management to optimize, refresh, and overcome data duplication issues for overall accuracy and consistency of data quality.

    4. Create data integrity standards using stringent constraint and trigger techniques. These techniques will impose restrictions against accidental damage to your data.

    5. Create periodic training programs for all data stakeholders on the right practices for gathering and handling data assets and on the need to maintain data accuracy and consistency. A data-driven culture will clarify the who, what, when, and where of your organization’s data and help bring transparency to complex processes.

    6. Don’t focus only on existing data that is readily available; also focus on the process of creating or capturing new and useful data. Responsive businesses create a successful data-driven culture that encompasses people, process, and technology.

    7. Don’t take your customer for granted. Always choose ethical data partners.
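
    To make items 2 through 4 a little more concrete, here is a minimal Python/pandas sketch of a routine quarantine step: standardizing, scrubbing, matching/deduplicating, and sanity-checking inbound records before they touch central data assets. The column names (email, phone, country) and the rules are purely hypothetical and would need to be adapted to your own sources.

      # Minimal sketch of an inbound-data quarantine step: standardize, scrub,
      # deduplicate, and apply a basic integrity rule. Column names are hypothetical.
      import pandas as pd

      def quarantine_cleanse(df: pd.DataFrame) -> pd.DataFrame:
          out = df.copy()
          # Standardization: trim whitespace and normalize casing.
          out["email"] = out["email"].str.strip().str.lower()
          out["country"] = out["country"].str.strip().str.upper()
          # Scrubbing: keep only the digits in phone numbers.
          out["phone"] = out["phone"].str.replace(r"\D", "", regex=True)
          # Matching / deduplication: treat rows sharing an email as duplicates.
          out = out.drop_duplicates(subset=["email"], keep="first")
          # Integrity rule: hold back obviously malformed rows instead of loading them.
          return out[out["email"].str.contains("@", na=False)].reset_index(drop=True)

      if __name__ == "__main__":
          inbound = pd.DataFrame({
              "email": [" Ana@Example.com ", "ana@example.com", "not-an-email"],
              "phone": ["+1 (555) 010-0199", "555-010-0199", None],
              "country": ["us", "US ", "us"],
          })
          print(quarantine_cleanse(inbound))  # one standardized, deduplicated row remains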

    How to navigate your way around third-party data

    The COVID-19 crisis has clearly highlighted that prevention is better than a cure, which is why maintaining safe, minimal human contact has been stressed so heavily. By the same logic, when enterprises rely on third-party data, the risks multiply: an enterprise cannot guarantee that a third-party data partner or vendor follows proper data quality processes and procedures.

    The questions that should keep you up at night are:

    • Will my third-party data partner disclose their data assessment and audit processes?
    • What are the risks involved, and how can they be best assessed, addressed, mitigated, and monitored?
    • Does my data partner have an adequate security response plan in case of a data breach?
    • Will a vendor agreement suffice in protecting my business interests?
    • Can an enterprise hold a third-party vendor accountable for data quality and data integrity lapses?  

    Smart data distancing for managing third-party data

    The third-party data risk landscape is complex. If the third-party’s data integrity is compromised, your organization stands to lose vital business data. However, here are a few steps you can take to protect your business:

    • Create a thorough information-sharing policy for protection against data leakage.
    • Streamline data dictionaries and metadata repositories to formulate a single cohesive data management policy that furthers the organization’s objectives.
    • Maintain quality of enterprise metadata to ensure its consistency across all organizational units to increase its trust value.
    • Integrate the linkage between business goals and the enterprise information running across the organization with the help of a robust metadata management system.
    • Schedule periodic training programs that emphasize the value of data integrity and its role in decision-making.

    The functional importance of a data steward in the data management and governance framework is often overlooked. The hallmark of a good data governance framework lies in how well the role of the data steward has been etched and fashioned within an organization. The data steward (or custodian) determines the fitness of your data elements, establishes controls, evaluates vulnerabilities, and remains on the front line in managing any data breach. As a conduit between IT and end users, a data steward offers a transparent overview of an organization’s critical data assets that can help you have nuanced conversations with your customers.

    Unlock the benefits of smart data distancing

    Smart and unadulterated data is instrumental to the success of data governance. However, many enterprises are content to just meet the bare minimum standards of compliance and regulation and do not give data quality the priority it deserves. Smart data means cleaner, high-quality data, which in turn means sharper analytics that translate directly into better decisions and better outcomes.

    Gartner says corporate data is valued at 20-25% of enterprise value, and organizations should learn to monetize and use it wisely. Organizations can reap the benefits of the historical and current data they have amassed over the years by harnessing it and linking it to new business initiatives and projects. Data governance based on smart enterprise data will offer you the strategic competence to gain a competitive edge and improve operational efficiency.

    Conclusion 

    It is an accepted fact that an enterprise with poor data management will suffer an impact on its bottom line. Not having a properly defined data management framework can create regulatory compliance issues and impact business revenue.

    Enterprises are beginning to see the value of data in driving better outcomes and are therefore accelerating their efforts to set up robust data governance initiatives. Plenty of technology solutions and platforms are available, but the first step for an enterprise is to develop a data-driven mindset and be receptive to a transformative culture.

    The objective is to ensure that enterprise data serves cross-functional business initiatives with insightful information, and for that to happen, the data needs to be accurate, meaningful, and trustworthy. Becoming a successful data-driven enterprise can be a daunting objective with a long transformational journey. Take a step in the right direction today with smart data distancing!

    Author: Sowmya Kandregula

    Source: Dataversity

  • What makes your data healthy data?

    What makes your data healthy data?

    If someone asked you what makes data “healthy”, what would you say? What IS data health? Healthy data just means data that is high quality, accessible, trusted, and secure, right? Wrong.

    • Healthy data is data that provides business value. 
    • Data health depends on how well an organization's data supports its business objectives. 
    • Your data is unhealthy if it does not provide business value. 

    Let's dissect. Data health really has nothing to do with the data itself, if you think about it. It has everything to do with the state of your organization as a whole - whether you’re a university, a government entity, or a commercial business - and how well your data supports your current and long-term business objectives.  

    It’s so easy to think data quality = data health. Think instead: what is the biggest problem we have in the world of data today?   

    It’s not moving data, connecting to data sources, or moving from on-premise to the cloud. It’s not even data quality, integration, or management! In today’s world with SO many solutions to choose from, we have access to more tools than ever that let us connect, store, and move our data. 

    You probably already spend an awful lot of time and money getting your data loaded, managed, movable, and usable. But the question remains: are you getting any REAL value out of that data you spend so much time and money on?   

    The biggest problem businesses face is not getting value from their data 

    According to our 2021 Data Health survey, 64% of executives surveyed work with data every day while 44% of finance executives make the majority of their decisions without data. If you're in the majority, you're already a step ahead - but just working with data isn't enough. It has to be about delivering an outcome.

    Granted, that outcome can look different depending on who you talk to in your organization.

    Data health can mean different things to different roles 

    If you speak to the CEO vs the CMO vs the VP of Sales vs the Head of Compliance or IT, data health is going to mean something different to each of them. This is because every one of these business leaders has a different data health problem – but there's a common thread. 

    None of them are achieving their objectives, because their data isn't enabling them to. To create a data health strategy with real business value, you have to start from the bottom: what are you trying to achieve?

    What’s your most important business objective?  

    Managing data alone does not deliver value. Focus on value first: what are you trying to do? Often, companies have business objectives such as creating an intuitive marketing strategy, improving sales, or meeting regulatory compliance.

    Once you outline your objectives, recognizing that they may differ by role, you can move to:

    What data supports that objective? 

    Say you have a massive amount of data in a CRM, with intent data coming from multiple different systems. You want to bring that marketing data together to find your target audiences and tap into their needs, right?  

    Or maybe your marketing efforts for intent data are inefficient because your data is siloed and not being used to deliver the insights you want, as fast as you want them. 

    Consider what data you would need to achieve your business objectives, and then finally: 

    What’s stopping you from achieving that today? 

    You understand your goals, you know what data you need to get there, and you are relying on your data to deliver business outcomes – but that's not enough. You need the platform and the technology to be able to do it.

    You need a platform that combines the concepts of data quality, trust, and accessibility, and that keeps you focused on achieving business initiatives – not just managing and moving your data around. In a data-driven world with endless options, you need a solution entirely focused on making your business outcomes a reality with (truly) healthy data.

    Author: Stu Garrow

    Source: Talend
