20 Comments
Jun 23, 2023 · edited Jun 23, 2023 · Liked by Michael Woudenberg

The final sentence in this piece doesn't follow. Consider the use of an algorithm to make an interview/don't interview, hire/don't hire, accept/reject loan application, etc. decision - that decision is the output. Suppose that the training set is based on historical outcomes rather than on samples of an idealized latent set or other synthetic data. If there is actual ethical bias in the training data, then the *better* the data accuracy, the more consistently the ethical bias will be exhibited in the output.

The piece also rests on a too-literal interpretation of what people usually mean when they say things like "The algorithm is biased against [societal group]." The term 'algorithm' in the latter statement is usually a synecdoche for the entire process of using AI/ML to make economically and socially significant decisions. Whether the problem is in the training data or in the algorithmic programming per se isn't material in most non-expert colloquial contexts.

Finally, the suggestion that "the term bias needs to be divorced from ethical considerations and fully focused on accuracy" isn't going to be realized: at least in the US, its ethical meaning is deeply embedded in laws and legal culture.

Author · Jun 23, 2023 · edited Jun 23, 2023

I'm honestly confused by this in many ways.

Your second point I address in the section on ethical bias where we look at the other structural implications.

Your third point extracts a fractional sentence which sets the stage as if it were THE suggestion vs. the prelude to establishing context. (Remember, I already talked about Ethical Bias and am now introducing Data and then Mathematical Layers.)

Here's the entire context so it doesn't get lost:

"Measurement / Data Bias

As we move beyond ethical bias and measures of success, we can then look at data accuracy. This is where the term bias needs to be divorced from ethical considerations and fully focused on accuracy because here, we enter the second definition of bias in how we collect, measure, and identify data which revolve around three prime examples."

From an English language structure perspective, "This" refers to "Measurement/Data Bias" because we already covered ethical bias. I'm acknowledging that bias is tied to ethics... that's why I wrote it. I'm trying to tease it apart to add more context: if we only look at one aspect, we miss how to actually fix it.

The entire point is to step back and contextualize the whole thing and yet it feels like your critique zoomed back in on a tiny piece and used that to compare? Maybe I'm missing something?

Jun 23, 2023 · edited Jun 23, 2023 · Liked by Michael Woudenberg

Thanks for your comment. I think you're right about my point 3, that I inadvertently took your statement out of context. Thanks for pointing it out, and my apologies for the error.

On my second point, please see the corresponding section of my reply to Aanya Dawkins, above. In short: Your section on ethical bias seems to be saying that algorithms aren't to blame. This may be technically correct in most cases -- but, as I tried to point out, this is sort of orthogonal to what most people mean when they complain that they were victims of injustice caused by an "algorithm." On the one hand, what interests you most is the precise cause of the problem; on the other, what interests the victims most is less the technical cause than the unjust outcome at the metaphorical hands of a soulless agent, synecdochically dubbed "the algorithm." My point is that both senses of "algorithm," not just the technical and more literal one, can work in their respective contexts.

Thanks again for the comment, and for the post.

Jun 23, 2023 · Liked by Michael Woudenberg

I think this essay was written for you.

Your three points contradict. Point 2 says there's a "too-literal interpretation" of what people mean. Yet that's what people DO mean (unless they are being figurative?). I thought the essay did a great job of opening up the aperture to actually do something about it.

But what doesn't follow is that your first and third points require a 'literal interpretation' that drills down into only a single definition of bias (the ethical one).

So what is it? Is bias ONLY ethical or is it greater? Is it reliant on a literal definition or does it challenge a literal definition and open up more perspective?

I loved this article because it broke down the synecdoche into something actionable and nuanced.

Jun 23, 2023 · Liked by Michael Woudenberg

Thanks for your comment. Actually, I agree with you: I found this an informative and helpful essay in many ways.

1. My point 1 was about how the *last sentence* seemed a non sequitur -- it wasn't a critique of the entire essay. Let me illustrate with the example of a bad actor whose agenda is to stay that way.

So suppose, for example, I run a bank that practices redlining: I almost never grant a loan to a creditworthy individual who lives in any neighborhood in my town where a majority of the population is persons of color. I am ethically biased up the wazoo, and intend to keep things that way.

I train my model on the entire portfolio of loans made since the branch opened in, say, the 1950s (having coded and digitized all those older loans so they can be used as training material for my model). The outputs of my model continue the practice of redlining, just as I (hypothetically) intend -- i.e. they are ethically biased, too.

As I understood from earlier in the essay (and I'm not a data scientist, so perhaps I'm misconstruing a term of art here), a data accuracy problem means especially a problem with sampling bias, exclusion bias, and/or labeling bias. Where is the data accuracy problem in my scenario?

In such a situation, it's not correct to say "Problems with the output are almost always accuracy problems, not ethical ones": my bank's ethical bias is definitely the cause of the problem with the outputs, while my data accuracy is impeccable.

Now you may say that I'm one of the exceptions anticipated by the "almost always" in that sentence, but how can one be so sure that quantification is correct? Even accepting that data accuracy issues do often lead to a false appearance of ethical bias in outcomes, how can one be sure they are "almost always" the cause of apparently ethically biased outcomes, to the exclusion of conscious or unconscious true ethical bias? 'Data accuracy issues usually cause the appearance of ethical bias' doesn't imply 'apparent ethical bias is usually caused by data accuracy problems.'
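
To make my scenario concrete, here is a minimal sketch of how a model trained on perfectly accurate records of biased decisions reproduces the bias. (Toy data and hypothetical feature names, not real lending records.)

```python
# Minimal sketch: accurate records of biased decisions -> biased model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
creditworthy = rng.integers(0, 2, n)   # 1 = objectively creditworthy
redlined_zip = rng.integers(0, 2, n)   # 1 = applicant lives in a redlined area

# Historical labels: the biased bank denied every redlined applicant,
# creditworthy or not. The records are *accurate* -- they faithfully
# record what actually happened.
approved = creditworthy & (1 - redlined_zip)

X = np.column_stack([creditworthy, redlined_zip])
model = LogisticRegression().fit(X, approved)

# A creditworthy applicant from a redlined neighborhood is denied:
print(model.predict([[1, 1]]))   # [0] -- the model learned the redlining
```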

The concern I have with the last sentence is not only the shaky inference it seems to embody, but also that it might be downplaying the amount of actual ethical bias there is out there (including implicit or unconscious bias).

2. The intention of my point 2 was that while this essay parses bias in a useful way for a professional reader, the distinctions it draws do not negate the criticisms less knowledgeable people may have about the deployment of AI/ML to make certain socially important decisions, when they accuse "algorithms" of effecting (ethically) biased outcomes.

By analogy, suppose my car breaks down, and after it's towed back to the shop the mechanic asks me, "What's the problem?" and I say, "The engine broke down." The mechanic pops the hood, takes a look, and tells me, "No, that's not correct. It's the generator pulley that's broken down." Well: the generator pulley is part of the engine assembly. The mechanic's detailed point of view is more useful for fixing the problem. But her observation about the pulley doesn't negate my more broadly phrased diagnosis (also a synecdoche, of whole for part): she is wrong to tell me my explanation is incorrect.

In this analogy, data scientists (or perhaps more narrowly, Michael) would be like the mechanic, the general public would be like me, and the synecdoche would be of part for whole (so that "algorithm" would have the broader meaning discussed in my original comment, analogous to "engine" in the car example).

Data scientists have a very narrow understanding of "algorithm" in the context of an AI/ML model, and from their perspective it's meaningful to say that data accuracy, rather than the algorithm, is the problem when trying to eliminate the unwanted appearance of ethical bias. OTOH, from the perspective of someone who's been denied a loan, flagged as a threat due to the color of their skin, etc., the unjust outcome of the application of the AI/ML model is more salient than the particular feature or component of the model that is to blame for their predicament. When they say "the algorithm" brought about this unjust result, they are speaking in a synecdochical way; it would be just as pedantic as the mechanic in my parable to say "No, it's not the algorithm to blame."

So yes, from a professional perspective the article provides some actionable tips. At the same time, maybe we can also cut the writers in Forbes and elsewhere a little bit of slack when they attribute certain bad social outcomes to "algorithms."

3. Concerning my point 3, I'm inclined to agree with Michael that I inadvertently took his statement out of context, and misread it as a more sweeping statement than it was intended to be. That said, I'm not sure I see the inconsistency with my points 1 or 2.

Thanks again for your comment.

Author · Jun 23, 2023 · edited Jun 23, 2023

This explains a lot more of your thinking, and it makes a lot more sense.

Your 1950s example is what I tried to capture with the "Google Memo" example.

If you train your model on historic precedent to achieve a future outcome, it becomes a data accuracy problem: not because the data doesn't accurately reflect what happened, but because it doesn't accurately reflect what you WANT to happen. You are literally feeding your model training data that misrepresents what you want.

What I tried to tease out is that it's not a current ethical issue but a historical one (though back then it was probably considered totally legit). That's why I summarized the section on ethical bias as follows before moving into measurement bias:

"if we don’t challenge those underlying assumptions about what is valued and we train our AI/ML on those values, we will weight the algorithms toward that bias. Fundamentally, when an algorithm picks up the patterns of ‘success’ based on historic precedent, that isn’t bias in the algorithm, that’s a mirror toward the bias in the organization itself."


Bias is the human condition, and the conscious identification and acknowledgment of bias everywhere lets you get a clearer, more accurate picture.


We have to apply bias. We can't compute all appropriate permutations.

Nov 11, 2023 · Liked by Michael Woudenberg

Excellent article with quality comments! Yes, the bias is three feet deep, some of it not easy to fathom from inside the matrix we are in.

Author

Thanks for that feedback. It might be three feet thick, but as long as we can properly define it, we can take action.

Nov 11, 2023 · Liked by Michael Woudenberg

As the towering intellectual of our times, Donald Rumsfeld, said about Iraq, "We have known unknowns and also unknown unknowns". Just joking.

Jun 27, 2023 · Liked by Michael Woudenberg

Applying bias to bias is simply the mathematical formulation of affirmative action. That is the reason quotas were created in universities and elsewhere. I enjoyed the way you formulated the problems and possible solutions.

May 15 · Liked by Michael Woudenberg

Interesting... So in order to get an accurate reading on not offending a group, one must feed the machine all the data that is considered highly offensive in any culture. I wonder if the "offensive" data would have to be categorized in a way that trains the machine to avoid displaying it as an outcome, as per the essay. I wonder how precise the information must be. What if people found ways to get around the bias of the bias? I guess you just update the bias to cover how it got around the bias, perhaps?
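
A toy sketch of that update loop, assuming scikit-learn and made-up example strings (hypothetical stand-ins for real moderation data):

```python
# Tag examples as offensive, train a filter, retrain when someone
# finds a way around it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["vile insult", "another slur", "nice weather today", "hello friend"]
labels = [1, 1, 0, 0]   # 1 = tagged offensive

content_filter = make_pipeline(TfidfVectorizer(), LogisticRegression())
content_filter.fit(texts, labels)

# Someone slips past the filter with a creative misspelling, so we tag
# the new example and retrain -- "updating the bias of how it got
# around the bias."
texts.append("v1le 1nsult")
labels.append(1)
content_filter.fit(texts, labels)
```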

Then, ultimately, the programmers who build bad A.I./M.L. are being ignorant when they feed any machine bad or inaccurate data, knowing it exists.

Interesting read.

Author

I hadn't considered intentionally feeding the offensive information and then tagging it as such, but that's a great point. However, there is also bias in what's tagged. The whole 1990s PC culture retagged a lot of information and just switched it to new words. Handicapped became Disabled. But then 15 years later Disabled became "Otherabled" or some such.

Your last comment hits on the crux. Google's Gemini tried so hard to push one bias toward diverse people that it generated Black and Asian Nazis, among other foibles, showing how hard it is to balance bias with bias, specifically if you are only fixated on one piece of offensive information or another.

May 1 · Liked by Michael Woudenberg

I agree that people indeed put too much meaning into the outputs produced by the current "AI" (one that is hardly thinking at all). If it were thinking, it might perceive the bias itself and perhaps correct it, unless it had a reason to do otherwise. We are fortunate that the current "AI" we have isn't able to do so; otherwise, we might be dealing with a far more pressing matter than the issue of bias itself. The algorithm in itself is blameless here, unless tweaked to produce a deliberately biased result, which isn't the usual case, I should think.

Also, regarding the data used to train such learning models: that would entail that the individuals who feed the data curate it for the purpose of pruning biases, which is a very cumbersome if not nearly impossible task. It might be easier for them to curate the output instead, if they want a desired outcome, but that would mean accepting that the output will definitely be inaccurate if not downright misleading. Now, if we wish to preserve accuracy and take in the data as it's fed, but wish to change the outcome, it will take more than messing with the algorithm, because the data itself comes with flaws. I do think it almost seems fruitless to try and sugarcoat it.

Curated outcomes can only be achieved through filtering, which would then limit your data pool; instead of a bigger analysis, you would be left with a mere subset.
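
As a trivial sketch of that trade-off (toy numbers and a hypothetical "acceptable" rule):

```python
# Curating by filtering leaves you analyzing a subset of the pool.
outputs = [f"sample {i}" for i in range(100_000)]   # the full pool

def acceptable(text: str) -> bool:
    # Stand-in for a real curation rule or content filter.
    return text.endswith("0")

curated = [t for t in outputs if acceptable(t)]
print(len(outputs), len(curated))   # 100000 10000 -- a mere subset
```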

Metaphorically, I see it like the contrasting relationship of farming and wilderness. Farming is curated input and output, with constant adjustment to achieve the desired result. The wilderness is random but, if not meddled with, manages itself into balance.

The data they feed in is like taking the wilderness and hoping to produce the goods of a farm; I just don't see it happening.

There must be a point where one simply accepts what is glaringly true even if it is inconvenient in the context of "AI".

But take my opinion with a grain of salt. I don't have any expertise in the matter; I've just learned a few bits here and there from people I know who do.

Author

You hit on a lot of great points and observations. I think it's best summarized as: "It's a mess; we need to think hard about it."


The real problem elucidated here is not technological - it is human.

An Algorithm is innocuous technology that humanity has used as long as humans have been 'computing' data. An Abacus uses Algorithms. Sumerian Cuneiform uses Algorithms.

Moreover, neither Algorithms nor Machine Learning define AI.

Fortunately, Language operates upon immutable axioms of Propositional/Predicate Logic - regardless of opinion. Opinion is most often the problem where Logic is concerned, and this is certainly the case with so-called 'Artificial Intelligence'. After all, the term Artificial Intelligence = 'fake smart'.

'Opinion is not Logic' and 'Logic is not opinion' are axioms of Logic.

An Algorithm is a tool. Tools aren't intelligent. The user of the tool provides the 'intelligence' upon which the tool operates. Coding is no different.

Just as there is no way to eliminate bias from human cognition, there is no way to "eliminate bias in AI/ML". There are only methods to mitigate bias, but those aren't new either. I recall plenty of classes about mitigating bias from college - that was before the Internet was publicly accessible.

Moreover, coding is not confined to just Algorithms. That may be the narrow focus of this discourse, but that's demonstrably biased. This discourse is demonstrative of bias, pedantry, and Logical fallacy (non sequitur), in all candor.

It's the preconceived notion that bias is inherently bad or an undesired outcome that is axiomatic of the premise. However, bias is not inherently 'bad'. Is a bias for Logical certainty bad? Is a bias for joy and happiness characteristically bad? Do comedians not operate with a bias to make people laugh?

Author

Great points, a nice summation of this essay, and a good follow-on.

As I like to say, "everyone is biased, the key is knowing what yours are."

When once told that I have bias, I agreed and pointed out that I was wearing one pair of underwear from my selection of Star Wars boxers because I was biased in selecting which one would gird my loins.

Another interesting take is this essay I wrote on stereotypes where we apply them (bias) to ourselves.

https://www.polymathicbeing.com/p/stereotyping-properly

Oct 1, 2023 · Liked by Michael Woudenberg

So the question remains: do you think we can reduce or eliminate bias in ML/AI?

Author

Yes. By fully understanding it and biasing toward what you want.
