Big Week in AI: Baby Steps of a Giant?
This past week has brought some impressive advances in machine learning
In the past week or so (okay, a little more than a week, I had to find time to write this), a few impressive advances in the field of machine learning* have been made public. Every time this happens, we are greeted by opinion pieces proclaiming that true AI is not too far off. Fortunately, there are more balanced assessments as well. Artificial general intelligence (AGI) always seems to be a decade away.
*AI and machine learning are not the same thing, even though they often get conflated. Paging Wikipedia:
ML learns and predicts based on passive observations, whereas AI implies an agent interacting with the environment to learn and take actions that maximize its chance of successfully achieving its goals.
Narrow AI, the thing we have today, can be - and often is - based on advanced machine learning systems. As a result, the terms are used interchangeably in a lot of the popular media pieces you’ll read. So when I use the term AI below, feel free to mentally translate that into ‘complicated machine learning system that does funky things with a big heap of data to get an output/prediction.’
Still, AI is a bit of a buzzword these days. It gets clicks. In reality, even though some advances are impressive (especially those below, I’ll get to ‘em soon, promise), it’s basically fancy statistics on a mountain of data. Current AI systems tend to have a relatively narrow focus, require a lot of training and computational resources, and they can be thrown off guard by a wrongly placed pixel. Other big challenges for these systems/models are context-dependency, inferring causation, and learning from limited examples. (None of this, of course, discredits the hard work of people in the field or the necessity of thinking long and hard about the ethical/societal ramifications of machine learning.)
Anyway, on to the cool stuff.
What you say?
Natural language is one of those areas in which people expect machine learning to make big leaps. After all:
Games. One of the first things machine learning captured our collective attention with was games. From chess to Go to StarCraft. Give an AI a bounded world with a set of rules, and hand over the medals. If we go full Wittgenstein, we might consider language a game too, or - more accurately - a family of games. Of course, human natural language is more complicated than the games we intuitively consider ‘games’. There’s context-dependency, ambiguity, combinatorial complexity, and so on. But, to aid our language-parsing AIs, we now have:
Data. It’s not a coincidence that this latest advance in language models comes from Google. Let’s say you have access to the world’s most popular search engine that receives billions of queries daily. Ooh, and let’s add a popular browser in which a lot of people follow up on those queries or visit other websites, including social media. Hello, data overflow.
The result? Here’s a tweet:
Mansplaining? We now have an AI for that.
Okay. I like it. Picasso.
AI leap number 2 is DALL-E 2 by OpenAI. This is an image generator based on natural language input. Or, you describe the image you’re thinking of, and DALL-E 2 makes it for you. The system is not yet open to the public, but the images floating around on the internet are pretty impressive.
Here too, data is the key resource. In this case, image data. Thank goodness we put everything on the internet these days. Add pattern recognition (something machine learning excels at), and, in the words of OpenAI:
DALL-E 2 has learned the relationship between images and the text used to describe them. It uses a process called “diffusion,” which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image.
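To make that “random dots to image” idea a bit more concrete, here’s a minimal sketch of what a diffusion-style generation loop looks like in code. It is not DALL-E 2’s actual implementation: the function toy_denoiser is a made-up placeholder for the trained, text-conditioned network a real system uses, and the only point is the structure of starting from pure noise and gradually refining it.

```python
# Minimal sketch of the "diffusion" idea: start from random noise and
# repeatedly nudge it toward whatever the denoiser "recognizes".
# NOTE: toy_denoiser is a hypothetical stand-in, not the real DALL-E 2 model.
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(noisy_image, prompt):
    # Placeholder: a real diffusion model predicts a cleaner image (or the
    # noise to remove) conditioned on the text prompt. Here we just pull
    # pixel values toward a flat grey "image" so the loop runs end to end.
    return np.full_like(noisy_image, 0.5)

def generate(prompt, steps=50, size=(64, 64)):
    x = rng.normal(size=size)          # "a pattern of random dots"
    for t in range(steps):
        predicted_clean = toy_denoiser(x, prompt)
        alpha = (t + 1) / steps        # trust the denoiser more as we go
        x = (1 - alpha) * x + alpha * predicted_clean
    return x

image = generate("a corgi playing a flaming trombone")
print(image.shape, image.min(), image.max())
```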
Another tweet for you. (Can you tell I like the tweet embedding functionality of Substack?)
But what do you really mean?
Guess where advance 3 of the big AI week comes from? Yep, Google. Being the biggest boat in the data ocean has its perks.
The philosophical voice in my head really likes this one: Socratic machine learning models. (You can find the paper and the code here.) Take a model trained on visual data, another one trained on audio, another on text, and make them talk to each other. Or:
…we show that this model diversity is symbiotic, and can be leveraged to build AI systems with structured Socratic dialogue -- in which new multimodal tasks are formulated as a guided language-based exchange between different pre-existing foundation models, without additional finetuning.
For example, you vlog a day in your life. (Who doesn’t want to be a YouTube celebrity, right? Also, so-called egocentric video was one of the model’s test cases.) In the evening you suddenly realize you can’t remember where you’ve put your keys. Instead of combing through hours of video footage, you simply ask AI Socrates (SocrAItes?) where you left your keys. Artificial Socrates then combines input from a visual model (what do keys look like?) with input from a language model (what does the question mean?) and an audio model (who’s talking to me?). Interestingly - and perhaps logically? - the exchange between these models is language-based. Wittgenstein would be pleased.
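If you want a feel for how that language-based exchange might be wired up, here’s a rough sketch - not the paper’s actual code. A hypothetical captioning model turns video frames into text, and a hypothetical language model answers the question over that text log; both calls are dummy placeholders so the example runs on its own.

```python
# Rough sketch of the Socratic Models idea: pre-existing models for different
# modalities "talk" to each other purely through text, with no finetuning.
# caption_frame and answer_with_llm are hypothetical stand-ins, not real APIs.
from typing import List

def caption_frame(frame_id: int) -> str:
    # Placeholder for a vision-language model describing one video frame.
    demo_captions = {
        0: "a hand places keys on the kitchen counter",
        1: "a person opens the fridge",
        2: "a person sits on the couch with a laptop",
    }
    return demo_captions.get(frame_id, "nothing notable happens")

def answer_with_llm(prompt: str) -> str:
    # Placeholder for a large language model call.
    if "keys" in prompt and "kitchen counter" in prompt:
        return "You left your keys on the kitchen counter."
    return "I couldn't find that in today's video."

def where_did_i_leave(question: str, frame_ids: List[int]) -> str:
    # The "Socratic dialogue": visual observations are turned into language,
    # then handed to the language model together with the user's question.
    log = "\n".join(f"{i}: {caption_frame(i)}" for i in frame_ids)
    prompt = f"Video log:\n{log}\n\nQuestion: {question}\nAnswer:"
    return answer_with_llm(prompt)

print(where_did_i_leave("Where did I leave my keys?", frame_ids=[0, 1, 2]))
```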
Tweet incoming.
Skynet, are you there?
Now, I’m curious. (On brand, isn’t it?)
Do you consider these models ‘intelligent’? Why (not)?
Do you think true AGI - a ‘thinking’ machine, self-aware, with flexible abstract reasoning skills it can implement in various contexts with minimal explicit training - is possible? Why (not)?
The DALL-E 2 models are very interesting and provide a different path to computer creativity and understanding. But I believe all of these “deep learning” approaches will ultimately fail, in the sense that they will not lead to AGI. Why? Essentially, without a body that interacts with the world through its five senses, these models will not be able to develop common sense, and thus intelligence, because intelligence requires understanding of the world. Melanie Mitchell, Santa Fe Institute scientist, gives the following example to illustrate this: “Consider what it means to understand ‘The sports car passed the mail truck because it was going slower.’ You need to know what sports cars and mail trucks are, that cars can ‘pass’ one another and, at an even more basic level, that vehicles are objects that exist and interact in the world, driven by humans with their own agendas.”

True believers, on the other hand, argue that given enough data, a purely statistical approach (i.e., large language models) can develop a common-sense understanding of the world. As you state, AGI is always just ten years away. Gary Marcus, professor emeritus at NYU, has argued for “a coordinated, multidisciplinary, multinational effort” modeled after the European high-energy physics lab CERN, which has successfully developed billion-dollar science projects like the Large Hadron Collider. “Without such coordinated global action,” Marcus thinks, “A.I. may be destined to remain narrow, disjoint and superficial; with it, A.I. might finally fulfill its promise.”