Can We Engineer Wisdom Into Our Virtual Assistants?
Dusty foot philosophy, metacognition, and building moral foundations
Warning: this one is longer and slightly more technical than the usual posts, so grab the popcorn and get ready to chew. Regular programming will resume soon.
In today’s essay:
the two-sided foundation of wisdom
thinking about thinking (fast and slow)
grounding morals in the real world
Wisdom’s two feet
In 2005, Somali-Canadian hip-hop artist K’Naan released his debut album, The Dusty Foot Philosopher, which has been academically recognized as a modern epic. The album details his childhood in a war-torn Somalia, his move to Canada, and his journey to establish himself as an artist.
An interlude reveals that the album title refers to a friend of K’Naan who did not survive Somalia’s civil war; a friend he calls the dusty foot philosopher.
He was like the dusty foot philosopher
It means the one that's poor
Lives in poverty but lives in a dignified manner
And philosophizes about the universe
This idea of the dusty foot philosopher is a reminder that philosophy is not only for the privileged few and the paywalled journals, but the province of everyone who enjoys thinking — philosophy literally translates to ‘love of wisdom’, after all.
What is this thing called wisdom, though?
While wisdom is challenging to measure (there is no single consensus framework for doing so), a 2020 study surveyed an international group of wisdom researchers to look for points of agreement among the many definitions. The researchers zero in on a common wisdom model built on two elements: metacognition (thinking about thinking) and moral aspirations (trying to be ‘good’).
This common wisdom model is further refined by adding sensitivity to culture and context, and by specifying that the metacognition we’re after is perspectival metacognition, characterized by epistemic humility, the ability to consider and balance multiple perspectives, and adaptability to context.
Since major software companies are betting on the coming age of AI1 assistants, can we use this wisdom model to ensure our virtual assistant will be wise? Who wouldn’t want an Arete or a Socrates in their pocket?
Let’s consider wisdom’s two dusty feet: metacognition and moral foundations2.
Metacognition
Metacognition, or thinking about thinking, is not (yet?) something current large language models (LLMs) excel in.
Like wisdom itself, metacognition is a composite of different traits, traditionally subdivided into metacognitive knowledge and metacognitive experiences. The latter refers to the feelings and judgments one experiences during a problem-solving task. Even in humans, the problem of other minds is tricky, let alone in machine learning models, so we’ll return to the obscure lands of subjectivity in the next section (‘Moral foundations’).
For now, let’s stick to knowledge. Metacognitive knowledge splits into declarative knowledge (knowing what you know and don’t know), procedural knowledge (knowing how to do things), and conditional knowledge (knowing when and how to use what you know).
We might think that LLMs have a lot of declarative knowledge. After all, they have substantial chunks of the internet in their training data to consult. But there is a (big) problem: LLMs don’t (can’t?3) care about the truth value of their statements, and they often confabulate the reasoning behind their responses. For example, a preprint from a few weeks ago used a riddle to test the ‘common sense’ of state-of-the-art LLMs. Not only did the models often get the answer wrong, but they showed:
…strong overconfidence in their wrong solutions, while providing often non-sensical 'reasoning'-like explanations akin to confabulations to justify and backup the validity of their clearly failed responses, making them sound plausible.
You could argue that people are often wrong and fabricate explanations as well, but I’d reply that, except in the case of deliberate misinformation, people providing wrong information believe that they are being truthful4. For current LLMs, the truth value of their statements is not one or zero, but null. They select the most suitable next token, regardless of whether it adds up to a true statement (though you could argue that it’s statistically more likely to be true than not, given good training data).
Procedural knowledge is more hands-on; it extends into the world. A 2008 paper on metacognition in natural and artificial systems suggests that affordances (how an object can be used within the constraints of the environment) and physical manipulation are important for procedural knowledge. Gary Marcus has already pointed out Sora’s surreal physics, and whether or not LLMs have ‘world models’ (an internal representation of the external world) is up for debate. There are efforts to build LLM-based multimodal world models, but so far, any procedural knowledge LLMs can derive from next-token prediction appears to be highly domain-specific, based on simple, ‘encodable’ building blocks, and possibly computationally expensive.
In contrast, a few months ago, a study trained a machine learning model on the experience of a child (via head camera recordings of a toddler from the age of 6 months to just over two years) and the model:
… acquires many word-referent mappings present in the child’s everyday experience, enables zero-shot generalization to new visual referents, and aligns its visual and linguistic conceptual systems. These results show how critical aspects of grounded word meaning are learnable through joint representation and associative learning from one child’s input.
There seems to be something about being in the world and interacting with it (even if only by proxy) that facilitates the construction of the world models required for procedural knowledge. (See also the recent suggestion to bring together robotics and generative AI to break through plateaus in both fields via spatial intelligence.)
Finally, conditional knowledge is knowing when and why to use the other two types of knowledge. It helps answer ‘what if’ questions. In other words, conditional knowledge is (partly) about how fluently you can reason about counterfactuals. In this case too, we have reason to think that LLMs are not quite there yet. A recent preprint uses counterfactuals dissimilar from any pre-training data to test the generalizability of human and LLM reasoning abilities. In short:
… while the performance of humans remains high for all the problems, the GPT models' performance declines sharply on the counterfactual set.
None of this implies that LLMs or other machine learning models are fundamentally incapable of attaining robust metacognition (but see footnote 3). So far, they are just not good at it. Can we make them better?
One route ahead borrows from Daniel Kahneman’s work on system 1 and system 2 thinking. Metacognition activates a reflective, introspective, (relatively) slow type of thinking. Recent work on artificial metacognition explicitly instrumentalizes this distinction between fast, intuitive system 1 thinking and slower, reflective system 2 thinking to add a metacognitive component or module to AI architectures.
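As a concrete (and entirely hypothetical) illustration of what such a metacognitive module could look like, here is a minimal sketch: a fast ‘system 1’ call drafts an answer, a slower ‘system 2’ call critiques it and estimates confidence, and a low score triggers a revision. The `call_model` stub and the confidence parsing are placeholders of my own invention, not any particular framework’s API.

```python
import re
from dataclasses import dataclass


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; replace with a real client."""
    return f"[model response to: {prompt[:40]}...]"


def parse_confidence(critique: str) -> float:
    """Hypothetical helper: pull a 0-1 confidence score out of the critique text."""
    match = re.search(r"\b(0(?:\.\d+)?|1(?:\.0+)?)\b", critique)
    return float(match.group(1)) if match else 0.5


@dataclass
class Answer:
    text: str
    confidence: float  # 0.0-1.0, as self-reported by the reflective pass
    revised: bool


def answer_with_reflection(question: str, threshold: float = 0.7) -> Answer:
    # 'System 1': quick, intuitive draft.
    draft = call_model(f"Answer concisely: {question}")

    # 'System 2': reflect on the draft; what could be wrong, and how confident are we?
    critique = call_model(
        "List possible errors or missing assumptions in this answer, "
        f"then rate your confidence between 0 and 1.\nQ: {question}\nA: {draft}"
    )
    confidence = parse_confidence(critique)

    if confidence >= threshold:
        return Answer(draft, confidence, revised=False)

    # Low confidence: answer again, this time conditioned on the critique.
    improved = call_model(
        f"Q: {question}\nDraft: {draft}\nCritique: {critique}\n"
        "Write an improved answer, or say 'I don't know' if unsure."
    )
    return Answer(improved, confidence, revised=True)


print(answer_with_reflection("What weighs more, a kilo of feathers or a kilo of lead?"))
```

Whether this kind of wrapper amounts to metacognition or merely imitates its outward form is, of course, exactly the question this essay circles around.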
Status of artificial metacognition?
It’s very early days, and generalizable artificial metacognition will likely require either advanced and flexible fine-tuning, the addition of ‘introspective’ modules, or something beyond the common AI architectures.
Why I might be wrong
In a recent paper with the evocative title ‘Shadows of Wisdom’, researchers fine-tuned LLMs to classify metacognitive and morally grounded elements in human narratives about workplace conflict. This classification was robust and congruent with human classification, even for few-shot models.
My (very brief) response
Recognizing and classifying metacognitive elements in a provided narrative is not the same as having access to those traits5. Seeing the shadow does not mean holding the object. Just ask (artificial?) Plato.
Moral foundations
In 1960, American mathematician and cybernetics pioneer Norbert Wiener wrote:
If we use, to achieve our purposes, a mechanical agency with whose operation we cannot efficiently interfere once we have started it, because the action is so fast and irrevocable that we have not the data to intervene before the action is complete, then we had better be quite sure that the purpose put into the machine is the purpose which we really desire and not merely a colorful imitation of it.
This is an early expression of the (in)famous alignment problem: if we ever manage to build superintelligent AI, how do we ensure that it does what we want it to do? The only truthful answer we can give is, “We don’t know.” Scenarios range from Hollywoodian robot uprisings and paperclip dystopias to the assumption that superintelligence comes with morality baked into it.
But… whose morality?
The racism and sexism in LLMs are well-documented, and they are at least partially due to the nature of the training data — relying on internet-derived datasets means overrepresenting the trolls and underrepresenting (groups of) people who are less visible online. For now, programmed guardrails are the often hastily assembled guardians of artificial morality. However, 1) it’s not difficult to jailbreak those guardrails, and 2) the people programming them are mostly WEIRD (Western, Educated, Industrialized, Rich, and Democratic). By extension, LLMs themselves are pretty WEIRD.
This suggests that top-down programming of morality is fragile and biased. What if we flip it around and look at it from the ground up?
Moral foundations theory, popularized by psychologist Jonathan Haidt in the book The Righteous Mind, tries to ‘ground’ human morality in five (later expanded to six) innate, cross-cultural moral dimensions: care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and sanctity/degradation (possible sixth: liberty/oppression). If those dimensions are genuinely cross-cultural, why aren’t all human moral systems the same? Because a human moral system is determined by the point in each dimension where actions become immoral. In other words, a moral system is defined by the ‘weights’ we give across and within the dimensions to different behaviors or expressions.
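To make the ‘weights’ idea a bit more tangible, here is a toy sketch (my own simplification, not a formalization from moral foundations theory): two hypothetical moral systems share the same six foundations but weight violations differently, so the same action can end up on different sides of the ‘immoral’ threshold.

```python
# Toy illustration: a moral system as a set of weights over shared foundations.
FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity", "liberty"]

# How strongly a hypothetical action violates each foundation (0 = no violation).
action = {"care": 0.1, "fairness": 0.2, "loyalty": 0.8,
          "authority": 0.7, "sanctity": 0.0, "liberty": 0.3}

# Two made-up moral systems: same dimensions, different weights.
system_a = {"care": 0.9, "fairness": 0.9, "loyalty": 0.3,
            "authority": 0.2, "sanctity": 0.1, "liberty": 0.8}
system_b = {"care": 0.5, "fairness": 0.5, "loyalty": 0.9,
            "authority": 0.8, "sanctity": 0.7, "liberty": 0.4}

THRESHOLD = 1.2  # arbitrary cut-off beyond which an action counts as 'immoral'


def violation_score(action_scores, weights):
    """Weighted sum of foundation violations."""
    return sum(weights[f] * action_scores[f] for f in FOUNDATIONS)


for name, weights in [("system A", system_a), ("system B", system_b)]:
    score = violation_score(action, weights)
    verdict = "immoral" if score >= THRESHOLD else "acceptable"
    print(f"{name}: score {score:.2f} -> {verdict}")  # A: 0.89 acceptable, B: 1.55 immoral
```

The point is not the arithmetic but the structure: the foundations are shared, while the weights (and thresholds) are not.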
That sounds like something LLMs might be able to do.
In responding to prompts, LLMs appear to exhibit moral foundations along these dimensions, but these responses show signs of political bias and are easily changed via adversarial prompts. Variations in the prompt format “can greatly influence the response distribution”, and different LLMs have different moral preferences. Even the language used in prompts can shift moral evaluations. Moreover, LLM-based moral judgments show:
… value misalignment for non-WEIRD nations from various clusters of the WVS [World Values Survey] cultural map, as well as age misalignment across nations.
Like the guardrail idea, relying on textual training data for establishing moral foundations results in a fragile and biased moral system.
Do we need something more for a contextually robust, human-aligned artificial morality? In one of the most influential papers on moral foundations theory, Haidt & Joseph write:
The hallmark of human morality is third-party concern: person A can get angry at person B for what she did to person C.
In other words, morality with human-like ‘weights’ for the foundations requires a combination of theory of mind and empathy, both of which provide the perspectival element in the common wisdom model we started with (see also footnote 2).
Theory of mind, briefly, is understanding that others can have different mental states than your own.
Imagine person A walks into a room, picks up a ball, and puts it in a green box. Then, person A leaves the room. Person B enters, takes the ball out of the green box, and puts it in a red box. Person C observed the whole ordeal through a one-way mirror. Person A enters again. For person C to infer that A will open the green box to look for the ball requires a theory of mind — C needs to attribute the belief ‘ball is in green box’ to A, even though C knows that this belief is not true.
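As a minimal sketch of the bookkeeping involved (a toy model, not a claim about how humans or machines actually implement theory of mind): each observer only updates their belief about the ball for events they witness, so C can predict A’s behavior from A’s now-false belief rather than from the true state of the world.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    belief: str = "unknown"  # where this agent thinks the ball is


def move_ball(new_location: str, world: dict, witnesses: list):
    """Update the true state of the world and the beliefs of only those who saw it."""
    world["ball"] = new_location
    for agent in witnesses:
        agent.belief = new_location


a, b, c = Agent("A"), Agent("B"), Agent("C")
world = {"ball": "nowhere"}

# A puts the ball in the green box; A and C (behind the one-way mirror) see it.
move_ball("green box", world, witnesses=[a, c])

# A leaves. B moves the ball to the red box; only B and C see it.
move_ball("red box", world, witnesses=[b, c])

# C's prediction of where A will look relies on A's belief, not on the world itself.
print(f"The ball is actually in the {world['ball']}")  # red box
print(f"C predicts A will look in the {a.belief}")     # green box
```

The bookkeeping itself is trivial; what is hard is learning when and how to do it from messy, open-ended interaction.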
As it stands, LLMs (probably) don’t have a robust theory of mind. For example, a preprint that assesses some claimed success cases finds that trivial alterations to the theory of mind task result in failure. Brute-forcing theory of mind by feeding the model ever more data is unlikely to lead to the desired result because raw input (text, video) doesn’t always translate into the same action. Complex biological organisms are not consistently rational actors. If I’m angry, I’ll act differently than when I’m calm, even in the same situation. There are no hard-and-fast rules for theory of mind; it develops over time, through interaction with others. A machine theory of mind is a multi-agent problem, which suggests one path to a potential solution is iterative interaction with (virtual or physical) others, coupled with an internal state simulator.
Internal states bring us to empathy. Even if we are not as pessimistic as the last paragraph and we grant LLMs the ability to attribute mental states to others, recent work argues that:
… LLMs are getting better and better at simulating empathy, making us feel seen and heard even if there is no one doing the seeing and hearing… Human judgments of character adopt the method of taking up another person’s perspective. This is not something that can be achieved by LLMs.
A 2023 study puts it bluntly:
Current approaches to artificial empathy focus on its cognitive or performative processes, overlooking affect, and thus promote sociopathic behaviors.
At the moment, affective computing focuses on recognizing emotions through assessing facial expressions or body language. But, like in the shadowplay from earlier, recognizing emotions does not equate to understanding emotions. Where theory of mind concerns knowing what others think, empathy is about relating to what others feel. Whether or not this is achievable via machine learning remains an open question. We scarcely know how qualia (subjective, conscious experiences) work in humans.
Both theory of mind and empathy share two crucial requirements, though: first, the ability to distinguish between self and others; second, the ability to form an internal representation (which need not be perfectly accurate) of those others’ beliefs and emotions. As I’ve written before, meeting those requirements may require a physical or virtual form of embodiment (a delineation of self) and iterative interaction with others (social learning).
One proposal for putting this into practice is a mindset shift from programming to raising. As with the camera-wearing toddler we met in the ‘Metacognition’ section, some researchers argue that teaching AI as if it were a child, rather than trying to program everything top-down, is the key to endowing it with common sense. A brilliant take on this is Ted Chiang’s short story ‘The Lifecycle of Software Objects’, in which digital entities or ‘digients’ learn, grow, and develop their own unique personalities through interactions with humans and each other.
Status of artificial morality?
Uncertain. Current approaches appear limited and prone to error. Artificial morality likely requires internal representations of self and others as well as repeated interactions with other agents. Even then, we simply do not yet know if this will suffice.
Why I might be wrong6
Moral relativism: there is no such thing as a universal set of morals everyone agrees with. Moral values are context-dependent and they change over time. Therefore, trying to define moral foundations is a fool’s errand.
Feelings don’t matter: all this talk about feelings and subjectivity has no place in artificial intelligence. It merely muddies the waters of what we’re trying to achieve. Emotions are not all that important to intelligence or wisdom.
My (very brief) response
I am sympathetic to both these arguments, but…
If we take a strong view of moral relativism (which, I admit, is often a straw man argument), the alignment problem is fundamentally unsolvable. We could, on the other hand, recognize that, even though morality may be relative to a debatable extent, humans have learned to live together at very high densities without absolute mayhem. This suggests that it is possible to establish certain moral foundations to encourage peaceful coexistence.
There is no true empathy without inter-subjective feeling. There is no context-sensitive morality without empathy. There is no wisdom without context-sensitive morality. Beyond wisdom, I’d argue that emotions provide motivational salience. So, to achieve any kind of robust, independent intelligence, they may be crucial (though I admit my bio-chauvinist bias here).
Conclusion: more than (more) data?
The recently launched ARC Prize of $1,000,000 was inspired by the idea that the focus on LLMs might have stalled progress toward artificial general intelligence (AGI). Those LLMs themselves are seemingly hitting a wall too.
Is the solution throwing more data at the machine? Or is there something inherently lacking in current methods of data acquisition and internal representation?
No human being has access to the sheer number of bare-bones facts that a state-of-the-art machine learning model can draw on. But, to use another cliché, quantity does not equal quality. Every second, a human being acquires multimodal data that is represented in a flexible internal world model. You’re reading a book, the wind kisses your skin, and your toddler sneezes. Machine learning has been limited to input in relatively few formats, (understandably) acquired in clearly defined ways. But what if something hides in the complexity and messiness of life that is crucial for human-like intelligence and wisdom?
Human wisdom is a combination of life experiences and the cognitive and emotional resources to contextualize those experiences. Keeping my biologist’s bias in mind, my conclusion would be that wisdom can’t be (fully) engineered; it needs to grow.
However, am I being too ungenerous in assuming that ‘human-like’ intelligence and wisdom are the goal? Artificial intelligence need not be human intelligence; artificial wisdom need not be human wisdom.
Perhaps the wisdom we seek from our near-future virtual assistants doesn’t have to include all the above. Perhaps artificial metacognition, theory of mind, or empathy need not be human-equivalent. Perhaps we will never know ‘what it’s like’ to be an artificial intelligence.
Perhaps it’s all about perspective.
You tell me.
This research field is moving so fast that I will have inevitably missed relevant papers and other work. Everything above is tentative, and if you spot problems or gaps, please let me know.
1. To flag my bias: I am critical of certain aspects of the current AI hype and I wonder/worry that we’re using the term ‘AI’ too freely. However, I’ll use the term here myself in the way it’s commonly used for ease of writing and reading.
2. I deviate slightly from using perspectival metacognition and moral aspirations for three reasons: first, metacognition and moral foundations are more ‘basic’ building blocks; second, the perspective-taking requirement is an implicitly moral claim (so I’ll address it in the ‘Moral foundations’ section); and third, a moral aspiration needs a foundation to start from (as in, what is worth aspiring to?).
3. This is a much larger philosophical argument. A lot of the debates on artificial cognition (and consciousness) are, implicitly, also about the truth-adjacency of computational functionalism and substrate independence. My thoughts on this are very much in flux, but let’s proceed here with the assumption that it is, at least in theory, possible to achieve an array of cognitive processes in a silicon ‘brain’.
4. This is also an argument against using the word ‘hallucination’ when LLMs go awry. People who have hallucinations assume that their visions are true.
5. To be fair, the researchers do not claim that their models have wisdom-related traits.
6. These are extreme, almost caricatured versions of the arguments, for the sake of brevity and the simple goal of acknowledging them. Treating each one with nuance would require separate essays…