data science integrity

The persuasive power of data and the importance of data integrity

Data is like statistics: a matter of interpretation. The process may look scientific, but that does not mean the result is credible or reliable.

  • How can we trust what a person says if we deny the legitimacy of what he believes?
  • How can we know a theory is right if its rationale is wrong?
  • How can we prove an assertion is sound if its basis is not only unsound but unjust?

To ask questions like these is to remember that data is neutral; that it is an abstraction whose application is more vulnerable to nefarious ends than to noble deeds; that human nature is replete with examples of discrimination, tribalism, bias, and groupthink; that it is not unnatural for confirmation bias to prevail at the expense of logic; that all humanity is subject to instances of pride, envy, fear, and illogic.

What we should fear is not data, but ourselves. We should fear the misuse of data to damn a person or ruin a group of people. We should fear our failure to heed Richard Feynman’s first principle about not fooling ourselves. We should fear, in short, the corruption of data: the contemptible abuse of data by all manner of people who give pseudoscience the veneer of respectability.

Nowhere is the possibility of abuse more destructive, nowhere is the potential for abuse more deadly, nowhere is the deliberate misreading of data more probable than in our judicial system.

I write these words from experience, as both a scientist by training and an expert witness by way of my testimony in civil trials.

What I know is this: Data has the power to persuade.

People who use data, namely lawyers, have the power to persuade; they have the power to enter data into the record, arguing that what is on the record, that what a stenographer records in a transcript, that what jurors read from the record is dispositive.

According to Wayne R. Cohen, a professor at The George Washington University School of Law and a Washington, DC injury claims attorney, data depends on context.

Which is to say data is the product of the way people gather, interpret, and apply it.

Unless a witness volunteers information, or divulges it during cross-examination, a jury may not know what that witness’s data excludes: exculpatory evidence, left out through acts of omission, which may reveal that the accused is not guilty, that the case against the accused lacks sufficient proof, or that the case sows doubt instead of stamping it out.

That scenario should compel us to be more scrupulous about data.

That scenario should compel us to check (and double-check) data, not because we should refuse to accept data, but because we must not accept what we refuse to check.

That scenario summons us to learn more about data, so we may not have to risk everything, so we may not have to jeopardize our judgment, by speculating about what may be in lieu of what is.

That scenario is why we must be vigilant about the integrity of data, making it unimpeachable and unassailable.

May that scenario influence our actions.

Author: Michael Shaw

Source: Dataversity