Eliminating Bias in AI/ML
Using Bias to Bias Bias!
Welcome to Polymathic Being, a place to explore counterintuitive insights across multiple domains. These essays take common topics and explore them from different perspectives and disciplines and, in doing so, come up with unique insights and solutions. Fundamentally, a Polymath is a type of thinker who spans diverse specialties and weaves together insights that the domain experts often don’t see.
Today's topic engages the idea of bias in artificial intelligence (AI) and looks at how such a seemingly simple term actually isn’t simple, and how that confusion leads to a fundamental misunderstanding of how to achieve the outcomes we desire from AI.
Bias in AI/ML
Over the past few years, there has been a steady conversation about bias in Artificial Intelligence and Machine Learning (AI/ML). The bias in question is more precisely called ethical bias: a perceived advantage or disadvantage of one group over another, especially when the negative impacts fall on groups we consider disadvantaged due to race, socio-economic status, etc. Whether the topic is hiring, facial recognition, sexism, or racism, there is non-stop hand-wringing about the need to act to eliminate it. In fact, the tone of the conversation has escalated so far that a recent Forbes article characterizes it thus:
Efforts to fight back against AI For Bad are actively underway. Besides vociferous legal pursuits of reining in the wrongdoing, there is also a substantive push toward embracing AI Ethics to righten the AI vileness. [emphasis theirs]
AI vileness and AI For Bad? These words are designed to otherize and trigger disgust. It sounds as if either the coders or the AI itself has nefarious intent. The article is designed to elicit a reaction to ethical bias. But this entire conversation in AI/ML is missing a critical point: AI/ML is, in fact, intentionally encoded bias!
Before we adopt language like Forbes’, let’s recognize that AI/ML is nothing more than coded algorithms fed with datasets so that they can, quite literally, discriminate patterns and output a reduced, targeted dataset that provides insights. If your AI/ML’s bias against the data did not return the results you wanted, was that an ethical bias, or a data-accuracy or algorithm-accuracy problem?
What is Bias?
This essay is going to be a bit of a brain twister because there are three different definitions, applications, and therefore solutions for the term bias. Because of this, a paradox emerges: to avoid perceived ethical bias, the solution is likely not to eliminate bias but to bias the bias to achieve the desired outcomes. If this just gave you a brain cramp, that’s OK. We’ll start to unwind the paradox by teasing out the three definitions, looking at each in context, and then showing how they work together, so that eliminating bias in AI/ML might just involve adding bias. To start, here are the three main definitions of bias:
Prejudice in favor of or against one thing, person, or group compared with another, usually in a way considered to be unfair.
Deviation of the expected value of a statistical estimate from the quantity it estimates, and/or systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others.
An intentional injection into a system to adjust or align it, such as a voltage applied to a device (such as a transistor control electrode) or a high-frequency voltage combined with an audio signal to reduce distortion in tape recording.
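To make the second and third definitions concrete, here is a minimal Python sketch; it is an illustration, not something from the original essay. For the second definition it uses a textbook case: estimating variance by dividing by n is biased low on average, while dividing by n - 1 is not. The helper names, sample size, and seed are arbitrary choices. For the third, a hypothetical `linear_unit` shows a deliberately injected bias term shifting a decision, much as a bias voltage shifts a transistor’s operating point.

```python
import random

# Definition 2 (statistical bias): Bias = E[estimate] - true value.
# Dividing the sum of squared deviations by n underestimates the variance on
# average; dividing by n - 1 (Bessel's correction) removes that bias.

def variance_divide_by_n(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def variance_divide_by_n_minus_1(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def estimate_bias(estimator, true_value, sample_size=5, trials=20000, seed=0):
    """Monte Carlo estimate of E[estimator] - true_value, drawing samples from N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        total += estimator(sample)
    return total / trials - true_value

# Definition 3 (injected bias): a deliberately chosen offset added to a
# linear unit's weighted sum, shifting the point where its decision flips.
def linear_unit(weights, inputs, bias):
    return sum(w * x for w, x in zip(weights, inputs)) + bias

if __name__ == "__main__":
    # Theory says about -sigma^2 / n = -0.2 for the first and about 0 for the second.
    print("bias, divide by n:    ", round(estimate_bias(variance_divide_by_n, 1.0), 3))
    print("bias, divide by n - 1:", round(estimate_bias(variance_divide_by_n_minus_1, 1.0), 3))
    # Same weights and inputs; only the injected bias term changes the decision.
    for b in (0.0, -1.0):
        score = linear_unit([0.5, 0.5], [1.0, 0.6], b)
        print(f"bias={b}: score={score:.2f}, decision={'yes' if score > 0 else 'no'}")
```

With five draws per sample, the divide-by-n estimator should land near -0.2 and the corrected one near 0; the exact digits depend on the seed. The linear unit answers "yes" with no injected bias and "no" once a bias of -1.0 is added, even though nothing about the inputs changed.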
The issue with discussing bias in AI/ML is that it actually covers all three layers! The first definition is what I call ethical bias, the second is measurement/data bias, and the last is mathematical bias. Since we clearly want to avoid ethical bias while coding our data and our mathematically biased algorithms, we will investigate the implications of all three in the context of accuracy, starting with ethical bias.
I proffer a simple observation: the very fact that we are having ‘dilemmas’ about ‘ethical AI’ means that we don’t like the results. Google’s image-recognition software that mislabeled humans as gorillas, for example, was very quickly reversed. Rather than accusing AI/ML of ethical bias, we should look at what such results mean from an accuracy perspective. We have to recognize that, fundamentally, AI/ML is stupid: it has no cognitive capacity to be biased in the sense of the first definition, and any attempt to attribute that kind of bias to it anthropomorphizes algorithms. A human would have to deliberately encode AI/ML to be ethically biased. If you haven’t done this, then what we are seeing is a reflection of our data or of how we prioritize measurements.
Here our AI is merely holding up a mirror. In hiring, for example, it highlights the value you place on certain terms, roles, education, etc. Evidence from Amazon’s recruiting tool shows that it downgraded terms like ‘woman.’ But was this a biased AI/ML, or an uncomfortable reflection that Amazon’s actual measures of success, as opposed to its stated intentions, were based on traditionally male-dominated attributes? Given data on past hiring patterns, the algorithm merely noted that hiring managers had routinely downgraded resumes from candidates who could be coded as ‘woman’. The algorithm didn’t choose to downgrade the term itself; it merely identified an attribute that captured the general trend of hiring.
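The mirror effect can be sketched in a few lines of Python. This is not a reconstruction of Amazon’s tool; it is a hypothetical Bernoulli naive-Bayes-style scorer trained on invented records, assuming only that past hiring decisions serve as the labels. No field records gender, yet a token that happened to co-occur with past rejections ends up with a negative weight.

```python
import math
from collections import Counter

# Hypothetical, invented historical records: (tokens on a resume, hired?).
# No field records gender; "womens_club" simply co-occurs with past rejections.
HISTORY = [
    ({"python", "captain_chess"}, True),
    ({"python", "womens_club"}, False),
    ({"java", "captain_chess"}, True),
    ({"java", "womens_club"}, False),
    ({"python", "womens_club"}, True),
    ({"java"}, True),
    ({"python"}, False),
    ({"java", "womens_club"}, False),
]

def token_log_odds(history, smoothing=1.0):
    """Per-token log P(token | hired) - log P(token | rejected), Laplace-smoothed.

    The model only counts how often each token appeared in past hires versus
    past rejections; whatever pattern the historical labels contain, it learns.
    """
    hired, rejected = Counter(), Counter()
    n_hired = n_rejected = 0
    for tokens, was_hired in history:
        if was_hired:
            hired.update(tokens)
            n_hired += 1
        else:
            rejected.update(tokens)
            n_rejected += 1
    vocab = set(hired) | set(rejected)
    return {
        t: math.log((hired[t] + smoothing) / (n_hired + 2 * smoothing))
        - math.log((rejected[t] + smoothing) / (n_rejected + 2 * smoothing))
        for t in vocab
    }

if __name__ == "__main__":
    weights = token_log_odds(HISTORY)
    for token, weight in sorted(weights.items(), key=lambda kv: kv[1]):
        print(f"{token:>14}: {weight:+.2f}")
```

On this toy data, `womens_club` comes out negative (about -0.69), `captain_chess` positive (about +1.10), and the skill tokens `python` and `java` neutral, purely because of how the historical labels were distributed.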
This is exactly what James Damore attempted to highlight in his infamous 2017 ‘Google Memo’, in which he observed that Google was designed by and for male software engineers and went on to suggest that women might not want to code under those measures of success. Rather than the organization being examined, Damore was pilloried. Yet if we don’t challenge those underlying assumptions about what is valued, and we train our AI/ML on those values, we will weight the algorithms toward that bias. Fundamentally, when an algorithm picks up the patterns of ‘success’ based on historical precedent, that isn’t bias in the algorithm; it’s a mirror held up to the bias in the organization itself.
To summarize the implications for ethical bias, I’ll restate that algorithms cannot be independently ethical or unethical, since they are noncognitive agents. If they are not intentionally coded for unethical bias, and we make every effort to eliminate ethical bias from AI/ML, then what you see is a reflection of the values placed on success and/or the accuracy of the data.
Measurement / Data Bias
Moving beyond ethical bias and measures of success, we can look at data accuracy. Here the term bias needs to be divorced from ethical considerations and focused entirely on accuracy, because we enter the second definition of bias: how we collect, measure, and identify data. This revolves around three prime examples:
Sampling bias: The data we collect does not reflect the intended environment. This is the classic garbage-in-garbage-out of data processing.
Exclusion bias: The action of excluding or removing data points. This doesn’t have to be intentional, especially if we hadn’t considered the value of other inputs.
Labeling bias: In the world of big data, we group and cluster information under loose descriptors merely because we can’t logically process each piece of data as a unique entity.
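The first two of these can be simulated directly. The Python sketch below assumes a hypothetical population of two equal-sized groups with different score distributions, then shows how a group-skewed collection process (sampling bias) and an unexamined outlier cutoff (exclusion bias) each shift the estimated mean. All numbers, names, and cutoffs are invented for illustration; labeling bias is not modeled.

```python
import random
import statistics

def make_population(n_per_group=5000, seed=42):
    """Hypothetical population: two equal-sized groups with different score distributions."""
    rng = random.Random(seed)
    group_a = [("A", rng.gauss(60, 10)) for _ in range(n_per_group)]
    group_b = [("B", rng.gauss(70, 10)) for _ in range(n_per_group)]
    return group_a + group_b

def skewed_sample(records, keep_prob, seed=7):
    """Sampling bias: keep each record with a group-dependent probability."""
    rng = random.Random(seed)
    return [(g, v) for g, v in records if rng.random() < keep_prob[g]]

def exclude_above(records, cutoff):
    """Exclusion bias: silently drop every value above a cutoff treated as an 'outlier'."""
    return [(g, v) for g, v in records if v <= cutoff]

def mean_score(records):
    return statistics.fmean(v for _, v in records)

if __name__ == "__main__":
    population = make_population()
    sampled = skewed_sample(population, {"A": 0.9, "B": 0.1})
    trimmed = exclude_above(population, 80)
    print("population mean:   ", round(mean_score(population), 1))
    print("skewed-sample mean:", round(mean_score(sampled), 1))
    print("after exclusion:   ", round(mean_score(trimmed), 1))
```

The population mean sits near 65; keeping 90% of group A but only 10% of group B pulls the estimate toward 61, and the cutoff at 80 drags it down as well, without any line of code doing anything ‘unethical’.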
Data accuracy boils down to our expectations about the value of data features and the actual accuracy, or completeness, of the dataset. If your outcome isn’t matching your intent, you should look at the accuracy of your data. We make decisions all the time about what data to use. We don’t have perfect or complete data, and even if we did, we couldn’t process that much information to achieve meaningful outcomes. Therefore, truncated data is chosen, based on biases, to produce the most efficient outcome.
Some things to consider are: What data did you train on? Is the data reflective of the outcome you want, or the outcome you've had? Do you need to create synthetic data to teach what to look for? Do you need to clean your dataset, or clean it differently?
We make decisions all the time on these topics that have substantial accuracy implications. Applying any heuristic or assumption means applying a biasing shortcut to compensate for imperfect data. Clearly, if the authors of the algorithm and the sponsoring organization didn’t like the result, the algorithm isn’t ethically biased; it just didn’t accurately drive toward the outcome they desired, possibly because of gaps in data accuracy and completeness.
Mathematical Bias
The third definition of bias, intentionally skewing weights toward a different outcome, ironically both provides a solution to the problem and raises its own ethical concerns. As seen in the recent lawsuits over affirmative action in university admissions, putting a finger on the scale of selection can quickly create its own ethical bias concerns even as it attempts to correct a perceived bias.
Here’s where this topic gets crazy: the solution to the ‘bias’ of Amazon’s hiring engine wouldn’t be to eliminate the bias but to bias the bias, by understanding both the values you have placed on attributes and what your data sampling provides to train the AI/ML on. This might mean refactoring the measures of success, biasing your data samples to train the AI/ML differently, and even using synthetic data to achieve the outcome you want.
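One established way to bias your data samples is a preprocessing technique usually called reweighing: each (group, label) combination in the training data gets the weight P(group) × P(label) / P(group, label), so that in the weighted data the historical label no longer depends on group membership. The Python sketch below is a minimal version that assumes equalizing historical hire rates is the outcome you want, which is exactly the kind of value judgment discussed above; the data and function names are hypothetical. It is one concrete form of ‘biasing the bias’, not the only one, and it does nothing about the measures of success themselves.

```python
from collections import Counter

def reweigh(records):
    """Reweighing: one weight per (group, label) pair, w = P(g) * P(y) / P(g, y),
    so that group and label are statistically independent in the weighted data."""
    n = len(records)
    group_counts = Counter(g for g, _ in records)
    label_counts = Counter(y for _, y in records)
    joint_counts = Counter(records)
    return {
        (g, y): (group_counts[g] / n) * (label_counts[y] / n) / (joint_counts[(g, y)] / n)
        for (g, y) in joint_counts
    }

def weighted_positive_rate(records, weights, group):
    """Weighted share of a group's records that carry label 1."""
    in_group = [r for r in records if r[0] == group]
    positive = sum(weights[r] for r in in_group if r[1] == 1)
    return positive / sum(weights[r] for r in in_group)

# Hypothetical historical decisions as (group, hired 1/0):
# group A was hired 6 of 8 times, group B 2 of 8 times.
HISTORY = [("A", 1)] * 6 + [("A", 0)] * 2 + [("B", 1)] * 2 + [("B", 0)] * 6

if __name__ == "__main__":
    weights = reweigh(HISTORY)
    for key in sorted(weights):
        print(key, round(weights[key], 3))
    for group in ("A", "B"):
        rate = weighted_positive_rate(HISTORY, weights, group)
        print(group, "weighted hire rate:", round(rate, 3))
```

Group A’s raw hire rate is 0.75 and group B’s is 0.25; after reweighing, both are 0.5. A model trained with these as sample weights sees a deliberately skewed history, which is itself a finger on the scale, with the ethical tradeoffs that come with it.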
There is another element here: often the hardest part of the conversation is facing the realities we’d rather not admit. These include the similarity of human facial features to those of our primate relatives (Google simply stopped tagging anything as a primate), the fact that women have different styles, proclivities, and inclinations in how they write their resumes, how certain socio-economic groups have different lending risks in aggregate, and how racial groups progress through our educational systems and traditional employment paths differently. None of this is an accuracy or a bias problem per se, but it cannot be ignored, because ignoring it would create its own accuracy and bias problem. Recognizing these differences helps you apply better mathematical bias to achieve the desired outcome.
Mathematical bias can only be applied accurately if we’ve fully understood the first two layers of bias, in our organizations and in how we select our data. Only when these are approached openly and honestly can we begin to accurately apply countering bias to direct the outcomes. We also need to be cognizant of sensitive and insensitive measures in our models and not over-index on perceived ethical bias without first understanding from which layer, and from which value, that perception emanates.
Using Bias to Bias Bias
How do we eliminate bias in AI/ML? Simple: we apply bias. Now that we’ve unraveled this paradox, it should be clear that we have to reconceptualize what AI/ML is actually attempting to do. These algorithms are not cognitive beings and so cannot act as ethical agents. They simply process mathematical equations. As such, the main goal of this essay is to show why we need to stop throwing accusations of ethical bias at algorithms.
Instead, we should understand the three definitions of bias and what they imply about what we are attempting to achieve. We may have to look at our organizational structures, the values we accept by rote, and the uncomfortable patterns we blindly ignore. We may have to consider our datasets and whether they accurately represent both the inputs and the outputs we desire. We also need to carefully consider how we apply mathematical bias to filter answers out of large volumes of data. To restate: AI/ML is nothing more than human-coded mathematical bias on top of dirty data. Problems with the output are almost always accuracy problems, not ethical ones, and by keeping these ideas straight, we can actually analyze how to achieve the outcomes we are going for!