6: A Few Questions on AGI
Popperian algorithms, evolvability, explanations, and more.
The fun of research is as much in the questions as the answers, so I figured I’d share some of my latest ones!
What are the limits of biological evolution?
What makes explanatory knowledge special?
How can Popperian epistemology improve narrow AI algorithms?
What are the different kinds of conflicts between ideas?
Why is the brain a network of neurons?
How does the low-level behavior of neurons give rise to high-level information processing?
1. What are the limits of biological evolution?
Could evolution ever produce a wheel-and-axle? Perhaps such inventions are “all or nothing”, and are therefore impossible to create in a sequence of tiny, trivial genetic mutations. It’s a counterintuitive fact that evolution can produce stupendous complexity (if you doubt this, then try 3-D printing a hippo…), but it cannot produce even a simple thing if it is too different from what already exists. Evolution is a bit like a mountain climber who can climb as high as the sky but can never jump a gap. (By the way, I suspect today’s neural networks have similar limitations.)
This question about limits is interesting because there is something special about humans - something we can do that evolution can’t. But what? While we know something of the answer, our knowledge is vague. Creating artificial general intelligence may depend on a more precise understanding of the boundary between the capabilities of human minds and biological evolution.
One could call this the question of evolvability - what can be evolved and what can’t be? Incidentally, this question - about what is possible and what is impossible - has the same form as statements in constructor theory, which is about what physical transformations are possible and impossible, and why. Perhaps the limits of evolution can be expressed in constructor theory, or illuminated by ideas from the constructor theories of information, life, and thermodynamics.
2. What makes explanatory knowledge special?
In his 2012 article, Creative Blocks, David Deutsch argues that “the ability to create new explanations is the unique, morally and intellectually significant functionality of people (humans and AGIs)…” He elaborates on this in The Beginning of Infinity and his TED talk on explanation, but we have a long way to go before our understanding is sufficient to create a computer program capable of explaining anything.
In a recent post, I asked:
Explanations are about what is objectively true rather than only what is useful. What are the consequences of this? What makes this sort of knowledge more powerful, and indeed more useful, than other kinds? What is different about its structure? What mechanisms of variation and selection are required to create this sort of knowledge? What is so difficult about creating it? (After all, it’s an extremely recent innovation in the history of life on earth.)
In my last post, True vs. Useful, I tried to explore one feature of explanations that makes them especially powerful:
In the end, the search for truth entails the pursuit of logical consistency among all our ideas, and thus takes advantage of all our knowledge - not just a single, fixed idea. It subjects our ideas to a powerful form of selection - logical contradiction - not found in biological or machine learning systems. Most importantly, it provides a combinatorial explosion of opportunities for conflict - and thus for progress.
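To make the counting concrete, here is a toy sketch that rests on a heavy simplifying assumption: each idea is reduced to a set of yes/no claims, and two ideas conflict when they give opposite answers to the same question. The representation, the helper names `conflicts`, `find_contradictions` and `pairwise_checks`, and the example ideas are all hypothetical. The point is only that every pair of ideas is a potential contradiction, so n ideas offer n(n-1)/2 pairwise checks, and each idea is tested against everything else that is held rather than against one fixed criterion.

```python
from itertools import combinations

# Hypothetical representation: an idea is a name plus a set of claims, where
# each claim is a (question, answer) pair with a True/False answer.
IDEAS = [
    ("heavy things fall faster", {("fall speed depends on mass", True)}),
    ("all bodies fall alike in a vacuum", {("fall speed depends on mass", False)}),
    ("the moon has no air", {("moon has an atmosphere", False)}),
]

def conflicts(claims_a, claims_b):
    """Return the questions that two ideas answer in opposite ways."""
    answers_a = dict(claims_a)
    return {q for q, ans in claims_b if q in answers_a and answers_a[q] != ans}

def find_contradictions(ideas):
    """Check every pair of ideas against each other, not against one fixed standard."""
    found = []
    for (name_a, claims_a), (name_b, claims_b) in combinations(ideas, 2):
        clash = conflicts(claims_a, claims_b)
        if clash:
            found.append((name_a, name_b, clash))
    return found

def pairwise_checks(n):
    """Number of distinct pairs among n ideas: n * (n - 1) / 2."""
    return n * (n - 1) // 2

print(find_contradictions(IDEAS))
# → [('heavy things fall faster', 'all bodies fall alike in a vacuum', {'fall speed depends on mass'})]
print(pairwise_checks(10), pairwise_checks(1000))  # → 45 499500
```

Real ideas are not lists of propositional claims, and spotting a contradiction usually takes inference rather than a lookup, but even this cartoon shows how quickly the opportunities for conflict grow as ideas accumulate.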
On a different note, what makes explanations so hard to create? One idea, as Deutsch argues, is that good explanations are “hard to vary,” meaning that most modifications do not merely make them worse but render them completely non-functional, unable to explain anything at all. This makes them hard to reach, much like “all or nothing” inventions such as the wheel-and-axle.
To visualize this property, imagine a vast cube representing the space of all ideas, where the best - the most true and useful - ideas are bright points of light while the worst ideas are invisible. In this space, good explanations are solitary, bright points. They are rare, sparse, and disconnected from other bright regions. On the other hand, useful rules of thumb exist as fuzzier clouds of points, for similar rules of thumb are about as good as one another, and so the nearest points to a given rule of thumb are about equally bright. Similarly, genetic knowledge forms a fuzzy tree without any gaps - for any two points in the tree, you can get from one to the other by following a path of bright points.
Now imagine you’re trying to navigate this space to improve your ideas, but can only see a short distance. If you find yourself in a fuzzy cloud or tree, you can look around at nearby points to see whether any are better and brighter, and move to the brightest one. By contrast, finding a good explanation is far more difficult, for it is hidden in a vast space, and you might pass within a short distance of it without ever seeing it. Stumbling and looking around isn’t enough. You need something more like high-powered telescopes and ultra-accurate, long-distance teleportation in the space of ideas. How do we do that?
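The navigation problem can be acted out in a toy one-dimensional “space of ideas”. Everything here is an illustrative assumption: the hypothetical `brightness` function contains one broad hill (a fuzzy cloud of rules of thumb, where neighbors are about equally bright) and a single isolated spike (a good explanation). The short-sighted searcher is plain greedy hill climbing, which only compares its current position with points within `sight` steps of it.

```python
SIZE = 1000    # hypothetical: number of points in a 1-D "space of ideas"
SPIKE = 800    # hypothetical: location of one isolated, very bright idea
HILLTOP = 200  # hypothetical: peak of a broad hill of "rules of thumb"

def brightness(x: int) -> float:
    """Toy brightness: a broad hill around HILLTOP plus one isolated spike."""
    if x == SPIKE:
        return 100.0  # the "good explanation": very bright, but with dark neighbors
    # The "fuzzy cloud": nearby points are almost as bright as each other.
    return max(0.0, 10.0 - abs(x - HILLTOP) / 20)

def hill_climb(start: int, sight: int = 1) -> int:
    """Greedy local search: move to the brightest visible point until none is brighter."""
    x = start
    while True:
        visible = range(max(0, x - sight), min(SIZE, x + sight + 1))
        best = max(visible, key=brightness)
        if brightness(best) <= brightness(x):
            return x  # nothing brighter within sight: stop here
        x = best

# Try every possible starting point and see where each search ends up.
ends = [hill_climb(s) for s in range(SIZE)]
found_hill = sum(e == HILLTOP for e in ends)
found_spike = sum(e == SPIKE for e in ends)
print(found_hill, found_spike)  # → 401 3
```

Only the three starting points right next to the spike ever find it. Every start on the hill ends at the hilltop, and every other start is stuck on a dark plateau. Widening `sight` helps only once it is wide enough to reach the spike, which is one way of picturing why stumbling and looking around isn’t enough.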
3. How can Popperian epistemology improve narrow AI algorithms?
In Creative Blocks, Deutsch also argues that Popper’s work on epistemology is key to building artificial general intelligence. I think it may inspire unique advances in narrow artificial intelligence algorithms as well (which, despite their lack of generality, can still be tremendously useful). After all, Popper’s work applies to the creation of knowledge in all its forms, from biological evolution to human minds - and narrow AI.
Also, the whole point of AGI is to write a program that is a mind, so the earlier one can apply and test one’s theoretical ideas (by programming them), the better. Such tests are bound to uncover all manner of subtle theoretical issues that would otherwise go unnoticed. That’s how programming usually goes - getting one’s ideas to work in practice is harder than anticipated, and leads to a far better understanding of things.
One idea is to apply the conclusions of True vs. Useful, and focus on solving constraints rather than maximizing performance. For a visual metaphor, it’s like trying to get puzzle pieces to fit together rather than walking to the top of a hill. While this approach has a long history (e.g. logic programming in Prolog), logic-based approaches to artificial intelligence have been mostly unsuccessful. They are brittle and full of precise statements, while human knowledge is flexible and full of fuzzy statements. So, there are unsolved problems here, and perhaps deep learning and Popperian ideas can help address them.
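The puzzle-piece picture can be illustrated with a classic constraint-satisfaction problem: coloring a small map so that no two neighboring regions share a color. The sketch below uses plain backtracking search, a standard textbook technique rather than anything proposed in this post, and it is exactly the kind of precise, brittle logic described above. There is no score to climb: a partial solution either fits every constraint or it doesn’t, and a clash sends the search back to revise an earlier choice. The map, the color names, and the helpers `fits` and `backtrack` are hypothetical.

```python
# Hypothetical toy map: each region lists the regions it borders.
NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}

def fits(region, color, assignment):
    """A piece fits only if no already-colored neighbor has the same color."""
    return all(assignment.get(n) != color for n in NEIGHBORS[region])

def backtrack(colors, assignment=None):
    """Depth-first search that undoes a choice as soon as it leads to a conflict."""
    assignment = {} if assignment is None else assignment
    unassigned = [r for r in NEIGHBORS if r not in assignment]
    if not unassigned:
        return assignment  # every constraint satisfied: all the pieces fit
    region = unassigned[0]
    for color in colors:
        if fits(region, color, assignment):
            assignment[region] = color
            result = backtrack(colors, assignment)
            if result is not None:
                return result
            del assignment[region]  # dead end further on: undo and try the next color
    return None  # no color fits this region given the earlier choices

print(backtrack(["red", "green", "blue"]))
# → {'A': 'red', 'B': 'green', 'C': 'blue', 'D': 'red'}
print(backtrack(["red", "green"]))  # → None
```

With only two colors the search correctly reports that no solution exists, since A, B, and C all border one another. Notice that there is no such thing as a “nearly right” coloring here, which is one face of the brittleness of purely logic-based approaches.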
For one thing, I think a common mistake in the history (and present) of AGI research is to take a particular cognitive tool like logic, language, or analogy and suppose it is the core of intelligence. As Popper explained, variation and selection are at the core, and other things just provide specific (and often tremendously useful) mechanisms of variation and selection. Perhaps taking this seriously will help solve the problem of how to use logic in artificial intelligence - both narrow and general.
4. What are the different kinds of conflicts between ideas?
A key part of understanding minds is understanding how ideas interact within them. How do these interactions lead to the variation and selection required for the evolution of knowledge? How are they combined and altered to form new ideas? How do they exert selection pressure on each other? What kinds of interactions spark the search for new ideas?
As I argue in True vs. Useful, logical contradiction offers one example of how ideas can interact, but a subtler example might be when you are surprised by something. In this case, it’s unlikely there’s any explicit logical contradiction at work, but there is still a conflict of ideas. If you are surprised upon entering an elevator containing three goats and a glowing block of uranium, you are experiencing a conflict between what you expected to see and what you are actually seeing.
Something similar must be happening when you find something interesting. Here, the conflict is even more subtle, though, and I don’t quite understand it. Presumably an idea appears interesting if it seems both novel and relevant to problems that one cares about. Given the vagueness of such a statement, it can no doubt be improved upon by trying to program it, as I mentioned earlier.
At any rate, the interactions between ideas are fundamentally important. Conflict is one key example, and it exists in many different forms that have evolved for different purposes, such as finding things dangerous, desirable, interesting, or surprising.
5. Why is the brain a network of neurons?
There are many ways to build a universal computer, and the only fundamental requirement is that it be Turing-complete. While brains and modern processors both satisfy this requirement, one does it with a network of neurons and the other with a von Neumann architecture.
Why the difference? Presumably because brains had to be evolvable while modern computers could be designed (see question #1 above). A network of neurons can start small and grow larger in the course of evolution, being useful at each stage. In contrast, modern computers are like the wheel-and-axle: they’re all-or-nothing. If one part of the system breaks, or has yet to be created, the whole system is useless.
Setting aside the question of evolvability, though, should minds be made of networks of neurons (or simulations of them)? While the way a computer is built doesn’t affect what it can do in principle, it does affect what it can do in practice. After all, the integrated circuits in modern computers are millions of times faster than their vacuum tube predecessors. Moreover, there are different algorithms for doing the same thing, and they can be wildly different in their speed and memory usage. Perhaps the network structure of the brain indicates that minds depend for their efficiency on concurrent, distributed, networked computation.
For example, consider how efficiently the brain can search its memory in the course of a conversation. I’ve sometimes wondered how I managed to recall a perfectly apt anecdote despite not having thought of it in years. Evidently, it was stored in such a way that, under the right circumstances, it could quickly and easily be found and shared. That is not a trivial feat, given the vast collection of memories in a mind. It would be far too slow to go through all one’s memories one by one; by the time you’d found a good story, the conversation would be over! Yet the network structure of the brain seems to handle the problem with ease. The general picture (as in deep learning) is that a situation “activates” some neurons, which in turn activate the neurons connected to them, and this cascade of activity can eventually reach a region of the brain associated with some long-dormant anecdote that’s perfectly relevant to the conversation at hand.
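A cascade like this is often called spreading activation. Here is a minimal sketch over a tiny, hypothetical memory graph; the nodes, link weights, `decay`, and `threshold` are all made-up values, and this is a cartoon rather than a model of real neurons. Cues from the current conversation receive activation, activation flows along weighted links and fades as it goes, and the stored anecdote that ends up most active is recalled. Only memories that are reachable from the cues, and strongly enough connected to them, are ever touched, so nothing is scanned one by one.

```python
from collections import defaultdict

# Hypothetical associative memory: weighted links from cues and concepts to memories.
LINKS = {
    "camping": {"tent": 0.9, "rain": 0.5},
    "weather forecast": {"rain": 0.9},
    "rain": {"tent": 0.4, "story: the flooded campsite": 0.8},
    "tent": {"story: the flooded campsite": 0.6, "story: buying a tent": 0.3},
}

def spread(cues, decay=0.5, threshold=0.05, max_steps=5):
    """Push activation outward from the cues along weighted links, fading as it goes."""
    activation = defaultdict(float)
    frontier = {cue: 1.0 for cue in cues}
    for _ in range(max_steps):
        if not frontier:
            break  # the cascade has died out
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            activation[node] += energy
            for neighbor, weight in LINKS.get(node, {}).items():
                passed = energy * weight * decay
                if passed >= threshold:  # signals too weak to matter are dropped
                    next_frontier[neighbor] += passed
        frontier = next_frontier
    return dict(activation)

def recall(cues):
    """Return the most active stored anecdote, or None if no anecdote was reached."""
    activation = spread(cues)
    stories = {node: a for node, a in activation.items() if node.startswith("story:")}
    return max(stories, key=stories.get) if stories else None

print(recall(["camping", "weather forecast"]))  # → story: the flooded campsite
```

With the cues “camping” and “weather forecast”, the flooded-campsite story wins because activation reaches it along more than one path. The story about buying a tent is activated only faintly, and the cue “weather forecast” on its own never reaches it at all.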
So, if that’s one example of the efficiency and practical value of network-based computation, what are others?
6. How does the low-level behavior of neurons give rise to high-level information processing?
Human brains must be Turing-complete (after all, they came up with the idea of Turing-completeness!) but how does one build a universal Turing machine from neurons and their connections? This is an active area of research (here’s one potential explanation).
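A classic starting point, going back to McCulloch and Pitts, is that a simple threshold neuron, which fires when the weighted sum of its inputs reaches a threshold, can act as a logic gate. Since NAND gates alone can build any Boolean circuit, networks of such units can compute any fixed Boolean function; a universal machine additionally needs some form of unbounded memory, which this sketch does not attempt. The weights and thresholds are hand-picked for illustration, and the helpers `neuron`, `XOR`, and `half_adder` are hypothetical.

```python
def neuron(weights, threshold):
    """Build a threshold unit: outputs 1 when the weighted sum of its inputs reaches threshold."""
    def fire(*inputs):
        return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)
    return fire

# Hand-picked weights and thresholds (illustrative, not measured from any brain).
AND = neuron([1, 1], threshold=2)
OR = neuron([1, 1], threshold=1)
NOT = neuron([-1], threshold=0)
NAND = neuron([-1, -1], threshold=-1)

def XOR(a, b):
    """No single threshold unit computes XOR, but a small network of NAND units does."""
    n = NAND(a, b)
    return NAND(NAND(a, n), NAND(b, n))

def half_adder(a, b):
    """Add two one-bit numbers using only neuron-built gates; returns (sum_bit, carry_bit)."""
    return XOR(a, b), AND(a, b)

print([half_adder(a, b) for a in (0, 1) for b in (0, 1)])
# → [(0, 0), (1, 0), (1, 0), (0, 1)]
```

XOR is the interesting case: its true and false inputs cannot be separated by a single weighted sum, so it needs a small network of units, which is a first glimpse of how more complex behavior has to come from connections rather than from any one neuron.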
More generally, how can any given computation be expressed in terms of the behavior of neurons? The same question exists for modern computers, but instead of neurons, one uses the low-level instructions that a computer processor offers. Historically, engineers hand-coded these low-level instructions, then developed a slightly higher-level language to make things easier. They could write code in this language, and it would be translated, or compiled, into the relevant low-level instructions. Later, other languages were built on top of that language. This process has continued, and today programmers can express high-level ideas easily and then compile them into the instructions that a processor can understand and execute. Perhaps a similar layering happened in the evolution of brains, and perhaps it could be used in any network-based AGI computer we wish to build.
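The process of building languages on top of lower-level ones can be shown in miniature. The sketch below compiles a high-level arithmetic expression, parsed with Python’s standard `ast` module, into instructions for a tiny made-up stack machine, and then executes them. The instruction set (`PUSH`, `ADD`, `SUB`, `MUL`) and the helpers `compile_expr` and `run` are hypothetical, far simpler than any real processor’s, and say nothing about how brains might do the equivalent.

```python
import ast
import operator

# Hypothetical low-level instruction set for a toy stack machine.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
EXECUTE = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def compile_expr(source):
    """Translate a high-level expression like '2 * (3 + 4)' into stack instructions."""
    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            # Post-order: compute both operands first, then apply the operator.
            return emit(node.left) + emit(node.right) + [(OPCODES[type(node.op)], None)]
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return emit(ast.parse(source, mode="eval").body)

def run(program):
    """Execute the instructions one at a time, the way a processor runs machine code."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(EXECUTE[op](left, right))
    return stack.pop()

program = compile_expr("2 * (3 + 4) - 5")
print(program)
# → [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('ADD', None), ('MUL', None), ('PUSH', 5), ('SUB', None)]
print(run(program))  # → 9
```

Real toolchains stack many more layers on top of this (assemblers, compilers, interpreters, libraries), but the principle is the same: each layer is defined entirely in terms of the one beneath it.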