Censored Data Analytics

Why caution is recommended when using analytics for censorship

Historically, concerns about over-zealous censorship have focused on repressive governments. In the United States (and many other countries), free speech has been a pillar of society since the country's founding, and government attempts at censorship or speech restrictions have, for the most part, received swift and successful pushback. In recent times, however, a new path to censorship has arisen in the form of search engine and social media companies that are building analytics-based censorship algorithms.

These organizations are using analytics to censor speech more aggressively than any past governmental effort and are somehow convincing a sizable portion of the population that it is a good thing. This post will outline why the use of analytics for centralized censorship is a steep and slippery slope and also lay out an alternative that will enable those same censorship analytics to provide people with a choice rather than a dictate.

Where is the line?

Let’s assume, for the sake of argument, that we all agreed that censorship is ethical and desired (of course, we don’t all agree on that, but just assume we do). Under those terms, we still have to agree on exactly where to draw the line that delineates what should be censored from what should not. Reaching such an agreement would be as impossible as agreeing to censor in the first place. But, for the sake of argument, let’s assume we could all magically agree on the exact same lines in the sand. Does that mean we’re ready to be effective at implementing our censorship plan? No!

Even after agreeing that we should censor information and agreeing on what to censor, we still have to build the analytical processes to flag the 'bad' content. As we all know, no algorithm will be perfect. So, do we err on the side of censoring too much 'legitimate' content to ensure we filter out all the 'illegitimate' content? Or do we make sure we allow all 'legitimate' content, even though that will also let some 'illegitimate' content sneak past? Once again, we’ll find it almost impossible to reach agreement.
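The tradeoff above is the classic one faced by any content classifier: a single score threshold decides both how much legitimate content gets censored and how much illegitimate content slips through, and moving the threshold reduces one error only by increasing the other. A minimal sketch, using made-up classifier scores (the numbers and the scenario are illustrative assumptions, not data from any real system):

```python
# Hypothetical illustration: one score threshold trades one error type
# for the other. Scores and labels below are invented for the example.

# (score the model assigned, whether the post is actually 'illegitimate')
posts = [
    (0.95, True), (0.80, True), (0.65, False), (0.55, True),
    (0.40, False), (0.30, False), (0.20, True), (0.10, False),
]

def error_counts(threshold):
    """Count legitimate posts censored vs. illegitimate posts allowed."""
    over_censored = sum(1 for score, bad in posts
                        if score >= threshold and not bad)
    missed = sum(1 for score, bad in posts
                 if score < threshold and bad)
    return over_censored, missed

for t in (0.25, 0.50, 0.75):
    over, missed = error_counts(t)
    print(f"threshold={t:.2f}: {over} legitimate censored, "
          f"{missed} illegitimate allowed")
```

A strict threshold (0.25) censors three legitimate posts to miss only one bad one; a lenient threshold (0.75) censors no legitimate posts but lets two bad ones through. No threshold eliminates both errors at once, which is exactly why agreeing on where to set it is so hard.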

No matter what analytics we agree to, the models will still make errors. Our censorship will never perfectly match our intentions, even if we agreed to those intentions. Inherently, therefore, using algorithms to censor information will lead to disparities between intent and outcome. Is this an effective or ethical use of analytics?

The reality today

The concern today is that we have data science teams making up their own rules about what to censor and forcing us to accept it. The people drawing the lines in the sand are not representative of the general population and the people building the models won’t be any more successful than anyone else at effectively targeting the arbitrary lines drawn. This is a dangerous situation where unelected, anonymous people are deciding what information we see and who can speak.

This isn’t just an ideological issue, as some would suggest. Sure, some people will agree or disagree more with the current censorship being applied. But just remember that, even if you are comfortable with the decisions being made today because they fall in line with your world view, totally different decisions might be made tomorrow when someone else is in charge. Once you accept the right of these organizations to censor, the tables can be turned on you at some point, even if that is not the case today.

Just think of the sticky situations we’ll get into based on the standards of today. If I post an April Fool’s article, do I risk being banned for spreading fake news? At what point is my view simply unpopular or contrarian and at what point is it 'dangerous and illegitimate' and worthy of being censored, along with me also being completely banished? These are not decisions to be made lightly.

An alternative option to centralized censorship

Personally, I don’t believe in censorship. However, some people do. Why not give us all a choice to view information as we prefer? The same algorithms being built to censor information by force can be made available as options we can turn on or off, much like we do with privacy settings. Let’s allow individuals to make the choice with regard to what they read, watch, or hear and what they don’t.

There can be various filters aimed at hate speech that differ based on how the user chooses to define hate speech and how strict the user desires the filter to be. There can also be filters that knock out any political content of any type, for instance, if we just want a break from politics. When we want to catch up on politics, we can always turn the filter off. We can also have positive filters that elevate a topic we’re interested in. Perhaps a big sports event is coming up, so I turn on a filter that requests more content than usual about the event; someone who isn’t into sports can filter it out instead.
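The opt-in model described above can be sketched as filters the user enables rather than rules a platform applies centrally. In this hypothetical sketch, the filter names, the keyword matching, and the feed-building logic are all illustrative assumptions, not any real platform's API:

```python
# Hypothetical sketch: content filters as opt-in predicates that each
# user toggles, instead of one centrally enforced rule set.
# Keyword matching here is a deliberately crude stand-in for the
# analytics a real filter would use.

def politics_filter(post):
    """User-enabled filter: hide political content."""
    return "election" in post.lower()

def sports_boost(post):
    """User-enabled filter: elevate sports content."""
    return "championship" in post.lower()

def build_feed(posts, hide=(), boost=()):
    """Drop posts matching any enabled 'hide' filter, and move posts
    matching an enabled 'boost' filter to the front of the feed."""
    kept = [p for p in posts if not any(f(p) for f in hide)]
    boosted = [p for p in kept if any(f(p) for f in boost)]
    rest = [p for p in kept if p not in boosted]
    return boosted + rest

posts = [
    "Local election results announced",
    "Championship game this weekend",
    "New restaurant opens downtown",
]

# One user wants a break from politics; another wants more sports.
print(build_feed(posts, hide=[politics_filter]))
print(build_feed(posts, boost=[sports_boost]))
```

The point of the design is that the same two users see different feeds from the same content pool because each chose their own filters, and either can turn a filter off at any time.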

Analytics can be used to filter any type of information in or out. We can make those analytics available for people to choose from instead of having faceless workers in Silicon Valley forcing their choices and their models on us all.

If we aren’t careful, we’ll soon slip into an Orwellian world of extreme censorship and suppression of information. Of note is that the greatest risk today isn’t from the government, but from private corporations who control the flow of information in today’s world. This is one example where analytics are being used in ways that could lead to disaster if we don’t have a broader conversation as a society about how we should proceed.

As outlined above, I’d love to see individuals enabled to make their own choices. Give us the ability to censor (or not) as we each see fit. There is no reason that the analytics of censorship can’t be steered in this direction of choice and away from the current dictatorial trajectory.

Source: Datafloq