3 items tagged "data integrity"

  • Staying on the right track in the era of big data

    Volume dominates the multidimensional big data world. The challenge many organizations face today is harnessing the potential of their data and applying the usual methods and technologies at scale. After all, data growth is only accelerating, with roughly 2.5 quintillion bytes produced every day. Unfortunately, a large portion of this data is unstructured, making it even harder to categorize.

    Compounding the problem, most businesses expect that decisions made based on data will be more effective and successful in the long run. However, with big data often comes big noise. After all, the more information you have, the more chance that some of that information might be incorrect, duplicated, outdated, or otherwise flawed. This is a challenge that most data analysts are prepared for, but one that IT teams need to consider and factor into their downstream processing and decision making to ensure that any bad data does not skew the resulting insights.

    This is why overarching big data analytics solutions alone are not enough to ensure data integrity in the era of big data. While new technologies like AI and machine learning can help make sense of data en masse, they often rely on a certain amount of cleaning and condensing behind the scenes to be effective and to run at scale. Accounting for some errors in the data is fine, but errors left unchecked can have a catastrophic effect, derailing effective analysis and delaying time to value, particularly if a configuration error or a problem with a single data source creates a stream of bad data. Being able to find and eliminate such mistakes is therefore a valuable capability. Without the right tools, these kinds of errors can create unexpected results and leave data professionals with an unwieldy mass of data to sort through to find the culprit.

    This problem is compounded when data is ingested from multiple sources and systems, each of which may have treated the data in a different way. The sheer complexity of big data architecture can turn the challenge from finding a single needle in a haystack into one more akin to finding a single needle in a whole barn.

    Meanwhile, this problem no longer affects only the IT function and business decision making; overcoming it is becoming a legal requirement. Legislation like the European Union’s General Data Protection Regulation (GDPR) mandates that businesses find a way to manage and track all of their personal data, no matter how complicated the infrastructure or how unstructured the information. In addition, upon receiving a valid request, organizations must be able to delete information pertaining to an individual, or collect and share it as part of an individual’s right to data portability.

    So, what’s the solution? One of the best approaches to managing the beast of big data is one that builds in data integrity from the start: establishing full data lineage by automating data ingestion. This creates a clear record of where data originated and how it has been used over time. Because the process is automatic, it is also easier and more reliable than manual tracking. It is important, however, to ensure that lineage is captured at a fine-grained level.

    With the right data lineage tools, ensuring data integrity in a big data environment becomes far easier. Data scientists can trace data back through the pipeline to explain what data was used, where it came from, and why. Meanwhile, businesses can track down the data of a single individual, sorting through all the noise to fulfill subject access requests without disrupting the big data pipeline as a whole or diverting significant business resources. As a result, analysis of big data can deliver more insight, and thus more value, faster, despite its multidimensional complexity.
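    Conceptually, fine-grained lineage of this kind can be sketched as a registry that stamps each record with its origin at ingestion time and logs every transformation applied to it. The sketch below is a minimal in-memory illustration; the names (`LineageRegistry`, `ingest`, `trace`, `records_for_subject`) are hypothetical and not taken from any particular lineage product:

    ```python
    import uuid
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class LineageRecord:
        """Provenance metadata for one ingested record (illustrative only)."""
        record_id: str
        source: str                      # originating system
        subject: str                     # individual the data pertains to
        ingested_at: str                 # UTC timestamp of ingestion
        transformations: list = field(default_factory=list)

    class LineageRegistry:
        """Tracks origin and processing history for every ingested record."""

        def __init__(self):
            self._records = {}

        def ingest(self, source, subject, payload):
            # Stamp the record with its origin the moment it enters the pipeline.
            rid = str(uuid.uuid4())
            self._records[rid] = LineageRecord(
                record_id=rid,
                source=source,
                subject=subject,
                ingested_at=datetime.now(timezone.utc).isoformat(),
            )
            return rid, payload

        def log_transformation(self, rid, step):
            # Append each processing step, building the record's history.
            self._records[rid].transformations.append(step)

        def trace(self, rid):
            """Explain where a record came from and how it has been used."""
            return self._records[rid]

        def records_for_subject(self, subject):
            """Locate every record about one individual, e.g. for a GDPR
            subject access request, without scanning the data itself."""
            return [r for r in self._records.values() if r.subject == subject]

    # Usage: ingest from two sources, then answer a subject access request.
    reg = LineageRegistry()
    rid, _ = reg.ingest("crm", "alice@example.com", {"plan": "pro"})
    reg.log_transformation(rid, "normalized email")
    reg.ingest("web_logs", "bob@example.com", {"page": "/pricing"})
    print([r.source for r in reg.records_for_subject("alice@example.com")])  # ['crm']
    ```

    The point of the sketch is that once provenance is captured automatically at ingestion, both questions raised above (explaining an analysis, and isolating one individual’s data) become simple lookups against the lineage metadata rather than searches through the data itself.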

    Author: Neil Barton

    Source: Dataversity

  • The persuasive power of data and the importance of data integrity

    Data is like statistics: a matter of interpretation. The process may look scientific, but that does not mean the result is credible or reliable.

    • How can we trust what a person says if we deny the legitimacy of what he believes?
    • How can we know a theory is right if its rationale is wrong?
    • How can we prove an assertion is sound if its basis is not only unsound but unjust?

    To ask questions like these is to remember that data is neutral: it is an abstraction whose application is more vulnerable to nefarious ends than noble deeds; that human nature is replete with examples of discrimination, tribalism, bias, and groupthink; that it is not unnatural for confirmation bias to prevail at the expense of logic; that all humanity is subject to instances of pride, envy, fear, and illogic.

    What we should fear is not data, but ourselves. We should fear the misuse of data to damn a person or ruin a group of people. We should fear our failure to heed Richard Feynman’s first principle about not fooling ourselves. We should fear, in short, the corruption of data; the contemptible abuse of data by all manner of people, who give pseudoscience the veneer of respectability.

    Nowhere is the possibility of abuse more destructive, nowhere is the potential for abuse more deadly, nowhere is the possible, deliberate misreading of data more probable than in our judicial system.

    I write these words from experience, as both a scientist by training and an expert witness by way of my testimony in civil trials.

    What I know is this: Data has the power to persuade.

    People who use data, namely lawyers, have the power to persuade; they have the power to enter data into the record, arguing that what is on the record, what a stenographer records in a transcript, what jurors read from the record, is dispositive.

    According to Wayne R. Cohen, a professor at The George Washington University School of Law and a Washington, DC injury claims attorney, data depends on context.

    Which is to say data is the product of the way people gather, interpret, and apply it.

    Unless a witness volunteers information, or divulges it during cross-examination, a jury may not know what that witness’s data excludes: exculpatory evidence and acts of omission that reveal the accused is not guilty, that the case against the accused lacks sufficient proof, that the case sows doubt instead of stamping it out.

    That scenario should compel us to be more scrupulous about data.

    That scenario should compel us to check (and double-check) data, not because we should refuse to accept data, but because we must not accept what we refuse to check.

    That scenario summons us to learn more about data, so we may not have to risk everything, so we may not have to jeopardize our judgment, by speculating about what may be in lieu of what is.

    That scenario is why we must be vigilant about the integrity of data, making it unimpeachable and unassailable.

    May that scenario influence our actions.

    Author: Michael Shaw

    Source: Dataversity

  • Why it is key to teach your organization about data integrity

    Are you prepared for an attack on your data environment? A data integrity drill can determine the readiness of your enterprise to respond and recover.

    A lot has changed in the world of IT over the past decade. We have seen digital services move from being an important aspect of an organization’s operations to being fundamental to its business success. The scalability, flexibility, and other capabilities of cloud services have made these digital services (and the digital economy they have created) possible. We have also witnessed a massive rise in the number of ransomware and other types of cyberattacks -- attacks that exploit the growing value of data in this digital economy.

    These changes have made it more complex, and more important than ever, for IT to make data environments resilient. In the past, a data environment was considered resilient if existing processes and technologies were sufficient to restore the enterprise’s on-premises infrastructure after a cyberattack.

    However, today IT needs to ensure that their sprawling, business-critical, hybrid-cloud data environments -- that now include dozens of SaaS applications and multiple cloud services as well as on-premises infrastructure and employee endpoints -- are protected against these threats as well as a growing number of increasingly sophisticated attacks.

    Practice to Play

    The technologies organizations need to ensure their IT is resilient -- strong perimeter security systems, high-availability cloud services, and robust data backup and recovery solutions -- are available. However, despite intuitive interfaces and automation features, using these technologies can be complicated and takes practice. IT teams that have not practiced using these technologies in response to simulated disasters are likely to find that when a real-world disaster does occur, it takes longer than expected to restore their data environment -- assuming they can restore it at all.

    Today, many organizations still practice for disasters as if all their applications were on site and a natural disaster were the greatest possible threat to their data environment. Given the changes to their data environments and the types of disasters that threaten them, organizations need to rethink their preparations. One way is to implement regularly scheduled “data integrity drills.”

    During a data integrity drill, an organization simulates how it would use its data security, data protection, and other technologies to restore the integrity of its data environment after a data disaster. To properly implement such drills, organizations need to:

    • Build a data integrity team that includes everyone involved in addressing data disasters
    • Surprise these teams with a variety of data integrity drills
    • Create a culture that values data integrity so both the data integrity team and larger organization understand why they are investing significant time and other resources into these drills

    Data integrity drills enable organizations to confirm they have the skills, processes, and technologies in place to prevent or recover from the data threats or attacks they face today and gain the “muscle memory” they need to efficiently and effectively respond when a data disaster does occur.

    Building Your Data Integrity Team

    When a data disaster hits, your IT team members are not the only people called on to address it.

    For example, if the disaster is a cyberattack, your legal team will need to inform customers quickly if their data has been exposed by the breach or your business risks stiff regulatory fines. Human resources will need to communicate the implications of the disaster to your employees (and possibly your partners). Your IT team’s security and data protection professionals will need support from those on your IT team responsible for SaaS applications, cloud services, on-premises infrastructure, and other aspects of the data environment affected by the disaster to bring that environment back online.

    Before implementing data integrity drills, create a data integrity team that includes the IT, legal, HR, and operations staff, as well as any other professionals who have responsibilities during an actual disaster. At the same time, specify the responsibilities of each team member. In other words, recruit your data integrity team and assign them their positions before you start practicing for the “big game.”

    Surprise Your Data Integrity Team with a Variety of Disasters

    When a real-world data disaster occurs, your data integrity team is not likely to be aware of the timing or nature of the disaster beforehand. Given this, although you might not want to schedule a data integrity drill for an extremely busy day or time for the company (such as the end of a quarter), the timing of the drill should remain a surprise to most of the data integrity team.

    Such drills should also vary so team members can practice responding to different kinds of disasters involving different aspects of their organization’s data environments -- everything from a natural disaster damaging a data center or a ransomware attack to a disgruntled employee destroying files on the way out. By mixing up the types of drills and making them a surprise, the drills will stress the organization’s existing disaster remediation and recovery processes and technologies as they would in a real disaster.

    This “surprise approach” will challenge your team’s skills, sharpening them and revealing where additional skills are needed. Such drills will also reveal if growing data sprawl has created weak spots or other cracks in your organization’s data integrity strategy, where certain applications, infrastructure, or other parts of the data environment are more vulnerable than others.

    Create a Culture That Values Data Integrity

    Your organization is likely to see pushback on the implementation of data integrity drills. Preparing to be on the data integrity team and conducting data integrity drills takes people away from their day-to-day responsibilities and reduces the time they can spend on other strategic projects.

    This is precisely why your enterprise needs to create a culture that sees data integrity as a core strategy, fundamental to the success of its business. This will require communicating to employees that the time they spend preparing for and conducting data integrity drills pales in comparison to the time they are likely to spend remediating a cyberattack or other disaster if they are unprepared.

    Take the Data Integrity Challenge

    Data environments today do not just serve as the nervous system for most companies’ daily operations. These environments also provide the data needed to predict customer behavior, improve operational efficiency, set corporate strategy, and improve business outcomes.

    This is why I would encourage all organizations to challenge themselves by testing their IT resiliency with at least one data integrity drill. Maybe your drill will reveal that you already have in place all the skills, processes, and technologies needed to protect your data crown jewels from any threat. More likely, the drill will expose skills you need, processes that can be improved, and technologies that need to be upgraded -- so you can fix these problems before a real disaster strikes.


    Source: TDWI
