Yes, “algorithms” can be biased. Here’s why

Seriously, it’s enough to make researchers cry.
Dr. Steve Bellovin is professor of computer science at Columbia University, where he researches “networks, security, and why the two don’t get along.” He is the author of Thinking Security and the co-author of Firewalls and Internet Security: Repelling the Wily Hacker. The opinions expressed in this piece do not necessarily represent those of Ars Technica.

Newly elected Rep. Alexandria Ocasio-Cortez (D-NY) recently stated that facial recognition “algorithms” (and by extension all “algorithms”) “always have these racial inequities that get translated” and that “those algorithms are still pegged to basic human assumptions. They’re just automated assumptions. And if you don’t fix the bias, then you are just automating the bias.”

She was mocked for this claim on the grounds that “algorithms” are “driven by math” and thus can’t be biased—but she’s basically right. Let’s take a look at why.

First, some notes on terminology—and in particular an explanation of why I keep putting scare quotes around the word “algorithm.” As anyone who has ever taken an introductory programming class knows, algorithms are at the heart of computer programming and computer science. (No, those two are not the same, but I won’t go into that today.) In popular discourse, however, the word is widely misused.

Let’s start with Merriam-Webster, which defines “algorithm” as:

[a] procedure for solving a mathematical problem (as of finding the greatest common divisor) in a finite number of steps that frequently involves repetition of an operation; broadly: a step-by-step procedure for solving a problem or accomplishing some end

It is a step-by-step procedure. The word has been used in that sense for a long time, and extends back well before computers. Merriam-Webster says it goes back to 1926, though the Oxford English Dictionary gives this quote from 1811:

It [sc. the calculus of variations] wants a new algorithm, a compendious method by which the theorems may be established without ambiguity and circumlocution.

Today, though, the word “algorithm” often means the mysterious step-by-step procedures used by the biggest tech companies (especially Google, Facebook, and Twitter) to select what we see. These companies do indeed use algorithms—but ones having very special properties. It’s these properties that make such algorithms useful, including for facial recognition. But they are also at the root of the controversy over “bias.”

The algorithms in question are a form of “artificial intelligence” (AI). Again, from Merriam-Webster, AI is:

[a] branch of computer science dealing with the simulation of intelligent behavior in computers.

The technical field under the name “AI” goes back to the mid-1950s, and many AI techniques have been tried in the intervening decades. By far the most successful is known as machine learning (ML), which is “the process by which a computer is able to improve its own performance (as in analyzing image files) by continuously incorporating new data into an existing statistical model.” And one of the most popular ML techniques today is the use of neural networks, which are “a computer architecture in which a number of processors are interconnected in a manner suggestive of the connections between neurons in a human brain and which is able to learn by a process of trial and error.” (I won’t try to explain what neural networks are, but here is an excellent explanation of how they’re used for image recognition.)
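To make “learning by trial and error” a little more concrete, here is a minimal sketch in Python: a single artificial neuron repeatedly adjusts its weights until it reproduces a simple pattern in toy data. It is an illustration of the idea only (real image-recognition networks have millions of such units and train on enormous datasets), and every name and number in it is invented for the example.

```python
# A toy "neural network" with a single neuron: it learns, by trial and
# error, to separate points above the line y = x from points below it.
# Everything here is invented for illustration; real image recognizers
# are vastly larger and train on enormous labeled datasets.
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: label is 1 when the point lies above the line y = x.
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 1] > X[:, 0]).astype(float)

w = np.zeros(2)   # weights, adjusted as the system "learns"
b = 0.0           # bias term
lr = 0.1          # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = 1.0 if xi @ w + b > 0 else 0.0
        err = target - pred      # the "error" in trial and error
        w += lr * err * xi       # nudge the weights toward the right answer
        b += lr * err

# The learned rule now reflects whatever pattern was in the training data.
acc = np.mean([(1.0 if xi @ w + b > 0 else 0.0) == t for xi, t in zip(X, y)])
print(f"training accuracy: {acc:.2f}")
```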

So: algorithm, artificial intelligence, machine learning, neural networks. In popular cant, these refer to the same thing: computer programs that learn. And they learn by being fed training data, which we’ll get into more below.

There’s another important thing to know about these systems: they are not, in any way, shape, or form, “intelligent.” They can do many things that intelligent beings can do, but they aren’t creative. Fundamentally, ML systems are about finding patterns, whether obvious or subtle.

These patterns come from the training data, which is the information initially fed into ML systems. Modern systems continuously update themselves. Every time you click on a link from, say, a Google search, the system updates itself: this link, for this person, is a more likely response to some particular query than is some other link.
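As a deliberately simplified sketch of that kind of continuous updating, the Python below keeps a running click-through estimate for each query-and-link pair and folds in every new observation as it arrives. The function names and the smoothing are my own invention, not any search engine’s real internals.

```python
# A simplified sketch of continuous updating: each observed click (or
# non-click) nudges a per-(query, link) estimate of how likely that link
# is to be chosen. Invented for illustration; not any real system.
from collections import defaultdict

impressions = defaultdict(int)   # how often a link was shown for a query
clicks = defaultdict(int)        # how often it was clicked

def record(query, link, clicked):
    """Fold one new observation into the running statistics."""
    impressions[(query, link)] += 1
    if clicked:
        clicks[(query, link)] += 1

def click_rate(query, link):
    """Smoothed estimate of P(click | query, link)."""
    shown = impressions[(query, link)]
    return (clicks[(query, link)] + 1) / (shown + 2)   # add-one smoothing

record("best laptop", "site-a.example", clicked=True)
record("best laptop", "site-b.example", clicked=False)
print(click_rate("best laptop", "site-a.example"))   # higher: ~0.67
print(click_rate("best laptop", "site-b.example"))   # lower:  ~0.33
```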

These patterns don’t have to make sense to a human! Consider this selection of products below that Amazon once offered me. For the life of me, I can’t see any correlation between a phone charger, creatine, and a salt-and-pepper grinder set—but the Amazon algorithm found one.

A few of Amazon’s suggestions for me.
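How does a system land on correlations no human would articulate? One common and very simple approach, sketched below with made-up orders, is to count which items appear together in past purchases and suggest the most frequent companions; the counts, not any human-legible reason, drive the suggestion. This is an illustration of the general technique, not Amazon’s actual recommender.

```python
# A sketch of co-occurrence-based recommendation: count which items were
# bought together and suggest frequent companions. The orders are made up;
# this is not Amazon's actual system.
from collections import Counter
from itertools import combinations

orders = [
    {"phone charger", "usb cable"},
    {"phone charger", "creatine", "salt-and-pepper grinder"},
    {"creatine", "protein powder"},
    {"phone charger", "creatine"},
]

co_counts = Counter()
for order in orders:
    for a, b in combinations(sorted(order), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def recommend(item, k=3):
    """Items most often bought alongside `item`, by raw count."""
    pairs = [(other, n) for (i, other), n in co_counts.items() if i == item]
    return [other for other, _ in sorted(pairs, key=lambda p: -p[1])[:k]]

print(recommend("phone charger"))
```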

This is what’s important: machine-learning systems—“algorithms”—produce outputs that reflect the training data over time. If the inputs are biased (in the mathematical sense of the word), the outputs will be, too. Often, this will reflect what I will call “sociological biases” around things like race, gender, and class. This is usually unintentional, though malicious individuals can deliberately feed poisoned training data into ML systems (as happened to Microsoft’s Tay chatbot).

Even with the best intentions, eliminating sociological bias is very hard. Consider mortgage lending. I very much doubt that any major lender in the US is explicitly including race in its credit-scoring algorithms (which would be highly illegal) or using it to “redline” neighborhoods (a practice that has been illegal for more than 50 years). But there are plenty of things besides race itself that correlate with race.
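A small simulation can make that proxy problem concrete. In the sketch below, all of the data is fabricated: a “neighborhood” feature correlates with a protected group because of simulated residential segregation, and the historical approvals were themselves biased. Even with the protected attribute excluded, splitting decisions by neighborhood alone reproduces a large share of the gap.

```python
# Fabricated data illustrating how a proxy feature carries a protected
# attribute's signal even when that attribute is excluded.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

group = rng.integers(0, 2, size=n)  # protected attribute (0 or 1)

# Neighborhood matches group 80% of the time (simulated segregation).
neighborhood = np.where(rng.random(n) < 0.8, group, 1 - group)

# Historical approval decisions that were themselves biased by group.
approved = (rng.random(n) < 0.3 + 0.4 * group).astype(int)

rate = lambda mask: approved[mask].mean()
print("approval gap by group:       ", round(rate(group == 1) - rate(group == 0), 3))
print("approval gap by neighborhood:", round(rate(neighborhood == 1) - rate(neighborhood == 0), 3))
# The second gap is a large fraction of the first, even though the
# protected attribute was never consulted: the proxy did the work.
```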

Amazon learned this the hard way when it tried using an internal ML system to screen resumes and found that the system became, in effect, biased against women:

Amazon’s system taught itself that male candidates were preferable. It penalized resumes that included the word “women’s,” as in “women’s chess club captain.” And it downgraded graduates of two all-women’s colleges, according to people familiar with the matter…

Amazon edited the programs to make them neutral to these particular terms. But that was no guarantee that the machines would not devise other ways of sorting candidates that could prove discriminatory, the people said.

This is the sort of thing that Rep. Ocasio-Cortez was discussing—despite being “just math,” the results of many “algorithmic” systems can be biased. Though this is probably not because of deliberate efforts by developers, it’s a problem nevertheless:

All of this is a remarkably clear-cut illustration of why many tech experts are worried that, rather than remove human biases from important decisions, artificial intelligence will simply automate them. An investigation by ProPublica, for instance, found that algorithms judges use in criminal sentencing may dole out harsher penalties to black defendants than white ones. Google Translate famously introduced gender biases into its translations. The issue is that these programs learn to spot patterns and make decisions by analyzing massive data sets, which themselves are often a reflection of social discrimination. Programmers can try to tweak the AI to avoid those undesirable results, but they may not think to, or be successful even if they try.

What should be done? The obvious answer is to design ML systems that explain the reasoning behind their outputs so that biased results can be found and problems eliminated. It’s a wonderful idea, but it’s beyond the current state of the art. “Accountable machine learning” is a hot research area, but it’s just that: research.

If a real solution to the problem doesn’t exist, what should we do now? One thing is to exercise far more care in the selection of training data. Failure to do that was the likely root cause of Google Photos labeling two African-Americans as gorillas. Sometimes, fixing the training data can help.

Of course, this assumes that developers are even aware of the bias problem. Thus, another thing to do is to test for biased outputs—and, in some sensitive areas such as the criminal justice system, to simply not use these kinds of tools at all.
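What might such a test look like in practice? The sketch below, with invented records, does the minimal bookkeeping: given a system’s predictions and the eventual outcomes, it compares false positive rates across groups, which is the kind of gap the ProPublica investigation reported. A large gap is a signal to scrutinize the system before (or instead of) deploying it.

```python
# A minimal output audit: compare false positive rates across groups.
# The records are invented purely to show the bookkeeping.
from collections import defaultdict

records = [
    # (group, predicted_high_risk, actually_reoffended)
    ("A", True,  False), ("A", True,  False), ("A", False, True),
    ("B", False, False), ("B", True,  False), ("B", False, True),
]

false_pos = defaultdict(int)   # flagged high risk but did not reoffend
negatives = defaultdict(int)   # everyone who did not reoffend

for group, predicted, actual in records:
    if not actual:
        negatives[group] += 1
        if predicted:
            false_pos[group] += 1

for group in sorted(negatives):
    print(group, "false positive rate:", false_pos[group] / negatives[group])
# Here group A gets 1.0 and group B gets 0.5; a gap that size would
# demand scrutiny.
```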

On rare occasions, a simple fix is possible. In 2004, when Google had a problem with neo-Nazi sites showing up as high-ranked outputs for searches on “Jew,” the company internally purchased an “ad” for that keyword (which led to this explanation). In a nutshell, legitimate queries and pages tended to say “Judaism” or “Jewish;” bigots used “Jew” more.

(Why is this no longer the problem it once was? I’m not 100 percent certain, but my guess is that, at least in part, it was fixed in a different way. Google these days puts a lot of effort into trying to “understand” queries. One thing they do is “stemming:” reducing search terms to root words. Thus, searches for “Jew” and “Jewish” today both find the same documents.)
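For readers who haven’t seen stemming before, here is a toy version in Python. The suffix list is invented for the example, and real search engines use far more sophisticated normalization, but it shows how different word forms end up matching the same documents.

```python
# A toy stemmer: strip a few common suffixes so related word forms
# collapse to a shared root. The suffix list is invented for illustration;
# production systems use much more careful linguistic normalization.
SUFFIXES = ("ishness", "ish", "ism", "s")

def stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print(stem("Jewish"), stem("Jews"), stem("Jew"))   # all reduce to "jew"
```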

Given that ML systems (including facial recognition systems) can produce biased output, how should society treat them? Remember that, often, the choice is not between algorithmic output and perfection but between algorithmic decisions and human ones—and humans are demonstrably biased, too. That said, there are several reasons to be wary of the “algorithmic” approach.

GIGO

One reason is that people put too much trust in computer output. Every beginning programmer is taught the acronym “GIGO:” garbage in, garbage out. To end users, though, it’s often “garbage in, gospel out”—if the computer said it, it must be so. (This tendency is exacerbated by bad user interfaces that make overriding the computer’s recommendation difficult or impossible.) We should thus demand less bias from computerized systems precisely to compensate for their perceived greater veracity.

The second reason for caution is that computers are capable of doing things—even bad things—at scale. There is at least the perceived risk that, say, computerized facial recognition will be used for mass surveillance. Imagine the consequences if a biased but automated system differentially misidentified African-Americans as wanted criminals. Humans are biased, too, but they can’t make nearly as many errors per second.

Our test, then, should be one called disparate impact. “Algorithmic” systems should be evaluated for bias, and their deployment should be guided appropriately. Furthermore, the more serious the consequences, the higher the standard should be before use.
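As one concrete way to operationalize that, the sketch below computes a disparate-impact ratio modeled loosely on the “four-fifths rule” from US employment law: compare favorable-outcome rates across groups and flag deployments where the ratio falls below 0.8. The data and the exact threshold are illustrative; the stakes of the application should set how strict the bar actually is.

```python
# A sketch of a disparate-impact check: compare favorable-outcome rates
# across groups. The 0.8 threshold echoes the "four-fifths rule" from US
# employment law; the data here is made up.
def selection_rate(outcomes):
    return sum(outcomes) / len(outcomes)

def disparate_impact_ratio(group_a, group_b):
    """Ratio of the lower favorable-outcome rate to the higher (1.0 = parity)."""
    ra, rb = selection_rate(group_a), selection_rate(group_b)
    return min(ra, rb) / max(ra, rb)

# 1 = favorable decision (hired, approved, not flagged), 0 = unfavorable.
group_a = [1, 1, 1, 0, 1, 1, 0, 1]   # 75% favorable
group_b = [1, 0, 0, 1, 0, 0, 1, 0]   # 37.5% favorable

ratio = disparate_impact_ratio(group_a, group_b)
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:
    print("below the four-fifths threshold: investigate before deployment")
```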

https://arstechnica.com/?p=1445853