I.
If you were to ask me for my favorite book of all time, I’d give you a list of a dozen or so that changes daily. There are a few books, however, that will always end up on that list. One of those books is Blindsight by Peter Watts. (All of Watts’ stuff is excellent, by the way.)
Blindsight is the story of a highly enhanced human crew sent to check out a radio signal coming from a strange comet. It’s not your average first-contact story, though. Not only is Watts’ style viscerally cynical and dripping with moments of nihilism; the book is also stuffed to the gills with brilliant scientific and philosophical ideas. For me, the book is one of the best and most rigorous treatments of consciousness — and I include academic nonfiction in that assessment. (It’s also one of the few novels with a thorough scientific bibliography.)
I can only agree with author Elizabeth Bear’s review:
… rigorous, unsentimental, and full of the sort of brilliant little moments of synthesis that make a nerd’s brain light up like a pinball machine. But he’s also a poet—a damned fine writer on a sentence level, who can make you feel the blank Lovecraftian indifference of the sea floor or of interplanetary space with the same easy facility with which he can pen an absolutely breathtaking passage of description. His characters have personalities and depth, and if most of them aren’t very nice people, well, that’s appropriate to the dystopian hellholes they inhabit.
Anyway, I’m done raving about the book now, I promise.
I mentioned it here for its title. Blindsight is a term coined by psychologist Lawrence Weiskrantz in the 1970s to describe a very rare condition observed in some individuals who have suffered damage to their primary visual cortex, the region of the brain responsible for the initial processing of visual information.
As a result, patients with blindsight experience a condition known as cortical blindness, rendering them unable to consciously see objects in (sometimes specific parts of) their visual field. Herein lies the rub: despite their visual impairment, people with blindsight respond accurately to visual stimuli at rates far above chance. Think Daredevil, but without the other senses being enhanced.
There’s probably some brain rewiring behind this capability, which allows people with blindsight to pick up on some visual cues (their eyes and retinas function perfectly well). The crux is that whatever visual cue makes it to their brain fails to reach their conscious awareness. (Not everyone agrees: some researchers argue that blindsight is still conscious vision, just so degraded that people are not aware of it.) Whatever the case, people with blindsight react to visual stimuli without being aware that they’re ‘seeing’ them.
II.
Remember AlphaGo? In 2016, DeepMind’s AlphaGo beat 18-time Go world champion Lee Sedol 4 to 1. At the time, Go was thought to be out of reach for machine learning systems. Oopsie.
A lot has changed since 2016. Human players have been learning from AlphaGo’s unusual moves and adopting them. Machine learning, though, moves faster (in very narrow domains) than humans do. KataGo, the currently dominant Go AI, outperforms human players and the now-ancient AlphaGo alike.
And now, a preprint shows how a good human amateur can kick KataGo’s virtual behind. Here’s a piece of the authors’ summary:
Our attack achieves a 100% win rate over 1000 games when KataGo uses no tree-search, and a >97% win rate when KataGo uses enough search to be superhuman. Notably, our adversaries do not win by learning to play Go better than KataGo — in fact, our adversaries are easily beaten by a human amateur. Instead, our adversaries win by tricking KataGo into making serious blunders.
One of the preprint’s authors is a good amateur Go player. He learned a few tricks and managed to beat KataGo 14 out of 15 times. The tactic involves slowly building a loop of stones that encircles a group of the AI’s stones while playing distracting moves in other corners of the board.
A human opponent would spot this tactic easily.
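The recipe behind the attack is, at least in outline, disarmingly simple: freeze the victim and train an attacker against it. Here’s a very loose Python sketch of that loop; the `env`, `adversary`, and `victim` objects are hypothetical stand-ins of my own, and the preprint’s actual setup (wrapping KataGo, with a full reinforcement-learning curriculum) is far more involved:

```python
# A very loose sketch of the attack recipe: the victim is frozen and only the
# adversary learns. `env`, `adversary`, and `victim` are placeholders of my
# own, not objects from the preprint's codebase.

def play_game(env, adversary, victim):
    """Alternate moves until the game ends; return True if the adversary won."""
    state = env.reset()
    players = (adversary, victim)
    turn = 0
    while not env.is_over(state):
        state = env.apply(state, players[turn % 2].choose_move(state))
        turn += 1
    return env.adversary_won(state)

def train_adversary(env, adversary, victim, games=10_000):
    for _ in range(games):
        won = play_game(env, adversary, victim)
        # Only the adversary updates. It is rewarded for winning by any means,
        # so it gravitates toward the victim's blind spots instead of learning
        # to play objectively strong Go.
        adversary.update(reward=1.0 if won else -1.0)
    return adversary
```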
From the preprint:
This result suggests that even highly capable agents can harbor serious vulnerabilities.
KataGo suffers from blindsight. It responds to cues on the Go board without conscious awareness, and if we meatbags can figure out which cues lead to an outcome we prefer, the machine is helpless. (Of course, it isn’t quite that simple. Adversarial training, which pits machine learning systems against each other during training, much as generative adversarial networks or GANs do, tries to prevent exactly this kind of exploit.)
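For a flavor of what ‘pitting systems against each other’ looks like in practice, here is a minimal adversarial-training sketch in PyTorch. It uses the classic fast gradient sign method on an image classifier, which is the textbook illustration rather than anything KataGo-specific; `model`, `optimizer`, and the `(x, y)` batch are assumed to exist:

```python
# Minimal adversarial-training sketch (PyTorch, fast gradient sign method).
# The textbook image-classifier illustration, not KataGo's actual defenses.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.03):
    """Nudge inputs in the direction that most increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Assumes inputs live in [0, 1], as images typically do.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_train_step(model, optimizer, x, y):
    # Train on the perturbed batch so the model learns to withstand
    # exactly the inputs crafted to fool it.
    x_adv = fgsm_perturb(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```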
Another way to look at this, plucked from the study’s conclusion:
Our results underscore that improvements in capabilities do not always translate into adequate robustness. Failures in Go AI systems are entertaining, but similar failures in safety-critical systems like automated financial trading or autonomous vehicles could have dire consequences.
III.
There is an important difference between machine blindsight and human blindsight, though: people can be aware that they have it.
At this point, the blindsight analogy begins to feel forced, so let me suggest two important insights:
1. Any system that can perform complex cognitive/abstract reasoning tasks has bugs that can be exploited. (Complexity often leads to fragility rather than robustness, unless you divert a lot of resources to building in redundancy, but I need to think more about this. Do leave a comment if you are well-versed in complex systems stuff.) Humans are certainly not exempt from this.
2. Humans have meta-cognition. We can think about the way we think. I am not yet convinced that any AI system has this.
Yet the potential importance of meta-cognition for an AI worthy of the ‘I’ in its name is not a new idea. In 2019, researchers from IBM and elsewhere suggested that a ‘meta-cognitive module’ might be required to let an artificial system switch between system 1 thinking (fast and reflexive) and system 2 thinking (slow and deliberate), a dichotomy popularized by Daniel Kahneman’s Thinking, Fast and Slow. This meta-cognitive ability might even be crucial for engineering safe AI.
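To make the idea concrete, here is a toy sketch of what such a switch could look like: a gate that falls back from a cheap reflexive policy to an expensive deliberate solver whenever the fast path isn’t confident. The confidence threshold is my own simplification; the 2019 proposal is considerably richer than this:

```python
# Toy sketch of a 'meta-cognitive' gate. The fast_policy and slow_solver
# callables and the confidence threshold are my own placeholders, not the
# module design from the 2019 paper.
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class MetaCognitiveAgent:
    fast_policy: Callable[[object], Tuple[object, float]]  # (action, confidence)
    slow_solver: Callable[[object], object]                # expensive deliberate search
    confidence_threshold: float = 0.9

    def act(self, observation):
        action, confidence = self.fast_policy(observation)
        if confidence >= self.confidence_threshold:
            return action                     # system 1: cheap and reflexive
        return self.slow_solver(observation)  # system 2: slow and deliberate

# Example: reflexive arithmetic with a fallback to 'deliberate' computation.
agent = MetaCognitiveAgent(
    fast_policy=lambda x: (x * 2, 0.95 if x < 100 else 0.5),
    slow_solver=lambda x: sum(x for _ in range(2)),  # laboriously computes x * 2
)
print(agent.act(7))    # confident: answered by the fast path
print(agent.act(500))  # unsure: deferred to the slow path
```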
Of course, 2019 is centuries ago in terms of AI development.
If we look at GPT-3 and its friends, things get murky. On the OpenAI developer forum, there are suggestions for tweaking prompts and models to get flickers of something that resembles a very rudimentary form of meta-cognition. Keyword: resembles. A parrot that tells you “I’m human” is still a parrot. There are blog posts by web developers Luke Plant and Simon Willison on ChatGPT and Bard, respectively, that argue there’s no introspection or meta-cognition going on beneath the hood.
Let me give my own much simpler example. When you ask ChatGPT how it feels, you get the stock reply that, as an AI language model, it doesn’t have feelings or emotions.
That kind of sounds like it’s aware of its limitations. Here’s the kicker: it’s a programmed response, much like the non-answers you get when you ask it about ‘forbidden’ topics like porn, abuse, and so on.
Nothing going on beneath the hood. All system 1.
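To see how little machinery a canned reply needs, consider this deliberately dumb sketch. Real chatbots rely on trained guardrails and moderation models rather than a keyword list, but the user-facing effect is much the same:

```python
# A deliberately dumb guardrail: certain prompts short-circuit to canned
# replies before any 'thinking' happens. Real chatbots use trained models
# and moderation endpoints rather than a keyword list, but the point stands:
# a scripted reply is not introspection.
CANNED_REPLIES = {
    "how do you feel": "As an AI language model, I don't have feelings or emotions.",
    "porn": "I can't help with that request.",
}

def respond(prompt: str) -> str:
    lowered = prompt.lower()
    for trigger, reply in CANNED_REPLIES.items():
        if trigger in lowered:
            return reply         # pure reflex: no model call at all
    return generate(prompt)      # hypothetical call to the actual model

def generate(prompt: str) -> str:
    return "(model output goes here)"  # placeholder for the real LLM

print(respond("How do you feel today?"))
```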
Of course, we have to be careful not to get swept up in human exceptionalism. We’re not that special. Our own thinking systems have bugs and loopholes too.
I think.
Recent thoughts
One advantage we have is the time delay between a stimulus and our response to it.