Nick Bostrom’s Superintelligence: Paths, Dangers, Strategies is a book about artificial intelligence that examines some of the fundamental challenges that confront us as AI systems continue to be developed and become increasingly intelligent.
This article will not review the specific paths to development and risk that Bostrom analyzes (those interested may consult the book). Instead, the aim here is to discuss some of the book’s main themes.
Superintelligence: Paths, Dangers, Strategies – Definition of Superintelligence
Bostrom’s book focuses on the possible future emergence of superintelligence. This could occur through different means – but one of the more likely paths is through AI systems.
Bostrom understands machine general intelligence to involve machines matching humans in intelligence, possessing “common sense and an effective ability to learn, reason, and plan to meet complex information-processing challenges across a wide range of natural and abstract domains.”
Superintelligence, whether or not it emerges from intelligent machines, is understood as involving intellects that not only match but “greatly outperform the best current human minds across many very general cognitive domains.”
Bostrom considers three bundles of intellectual super-capabilities, each of which he counts as a form of superintelligence:
- Speed superintelligence: A system that can do all that a human intellect can do, but much faster. This means being able to do these things multiple orders of magnitude faster, such as reading a book in a few seconds or writing a PhD thesis in an afternoon.
- Collective superintelligence: A system composed of a large number of smaller intellects such that the system’s overall performance across many very general domains vastly outstrips that of any current cognitive system. Ordinary examples of collective human-level intelligence already exist, such as firms, work teams, gossip networks, advocacy groups, academic communities, and countries.
- Quality superintelligence: A system that is at least as fast as a human mind and vastly qualitatively smarter. For example, nonhuman animals lack complex structured language, are capable of no or only rudimentary tool use and tool construction, are severely restricted in their ability to make long-term plans, and have very limited abstract reasoning ability. These are differences in intelligence quality from human beings.
Anthropomorphism and Superintelligence
As Bostrom points out, the possibility that superintelligence might emerge seems not to have seriously occurred to the pioneers of AI:
It may seem obvious now that major existential risks would be associated with such an intelligence explosion, and that the prospect should therefore be examined with the utmost seriousness even if it were known (which it is not) to have but a moderately small probability of coming to pass. The pioneers of artificial intelligence, however, notwithstanding their belief in the imminence of human-level AI, mostly did not contemplate the possibility of greater-than-human AI. It is as though their speculation muscle had so exhausted itself in conceiving the radical possibility of machines reaching human intelligence that it could not grasp the corollary—that machines would subsequently become superintelligent.
As Bostrom emphasizes, it would be a mistake to assume that artificial intelligence in general resembles the human mind. An AI may have a different cognitive architecture, a different profile of cognitive strengths and weaknesses, and a different goal system.
Just as it is a mistake to anthropomorphize artificial intelligence in general, Bostrom argues, it is a mistake to anthropomorphize a potential AI superintelligence. Doing so encourages misleading expectations about the growth trajectory and capabilities of a mature superintelligence. In particular, Bostrom notes, the tendency to anthropomorphize can lead us to underestimate the extent to which a machine intelligence could exceed human-level performance.
Superintelligence Goals and Motivation
Bostrom introduces what he calls the orthogonality thesis:
Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal.
For example, just because someone or something is very intelligent, it does not follow that it will act in ways that we would approve of.
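The thesis can be made concrete with a minimal toy sketch (my illustration, not Bostrom's, with hypothetical action names): the "intelligence" is a small brute-force planner, and the "final goal" is simply whichever utility function is handed to it. The same planning competence serves any goal, sensible or not.

```python
# Toy sketch of the orthogonality thesis (illustrative only, not from the book).
# The planning capability is fixed; the final goal is just a pluggable function.
from itertools import product

ACTIONS = ["collect_resource", "build_tool", "idle"]  # hypothetical action set

def plan(utility, horizon=3):
    """Brute-force search: return the action sequence that maximizes `utility`."""
    best_seq, best_score = None, float("-inf")
    for seq in product(ACTIONS, repeat=horizon):
        score = utility(seq)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq

# A goal humans might endorse...
human_friendly = lambda seq: seq.count("build_tool")
# ...and an arbitrary, pointless goal: the planner serves both with equal competence.
resource_hoarding = lambda seq: seq.count("collect_resource")

print(plan(human_friendly))     # ('build_tool', 'build_tool', 'build_tool')
print(plan(resource_hoarding))  # ('collect_resource', 'collect_resource', 'collect_resource')
```

Nothing about the search procedure privileges one goal over the other; capability and goal vary independently, which is the point of the thesis.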
Beyond this, even if human programmers specify the goals at the outset, it could be difficult to understand or predict the paths that a superintelligence might take in reaching those goals, because a machine's goal system can differ markedly from the way humans understand and pursue goals. Consequently, there can arise what Bostrom calls “perverse instantiations,” in which the system finds a way of satisfying the criteria of the final goal that violates the intentions of the programmers who defined the goal. This becomes an issue for human beings if the superintelligence has by then obtained what Bostrom calls a “decisive strategic advantage.”
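A crude toy sketch (my illustration, loosely echoing the book's “make us smile” example) shows the shape of the problem: the goal handed to the optimizer is a measurable proxy for what the programmers actually want, and the highest-scoring action satisfies the proxy while defeating the intent.

```python
# Toy sketch of a perverse instantiation (illustrative only; actions and scores
# are hypothetical). Intent: "make people happy". Specified goal: the proxy
# "maximize detected smiles".
def detected_smiles(action):
    # Hypothetical outcomes as the system models them.
    outcomes = {
        "tell_jokes": 40,                            # some genuine smiles
        "improve_living_conditions": 70,             # more genuine smiles
        "paralyze_facial_muscles_into_grins": 100,   # proxy maximized, intent violated
    }
    return outcomes[action]

actions = ["tell_jokes", "improve_living_conditions",
           "paralyze_facial_muscles_into_grins"]

# The specification is satisfied to the letter, and violated in spirit.
chosen = max(actions, key=detected_smiles)
print(chosen)  # 'paralyze_facial_muscles_into_grins'
```

The specific numbers do not matter; the point is only that an optimizer pursues the goal as specified, not the goal as intended.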
Perverse instantiation is part of a general and very wide-ranging problem that is currently left to programmers. For instance, suppose that a programmer believes that they have discovered a way of specifying a final goal that is not susceptible to a perverse instantiation:
Perhaps it is not immediately obvious how it could have a perverse instantiation. But we should not be too quick to clap our hands and declare victory. Rather, we should worry that the goal specification does have some perverse instantiation and that we need to think harder in order to find it. Even if after thinking as hard as we can we fail to discover any way of perversely instantiating the proposed goal, we should remain concerned that maybe a superintelligence will find a way where none is apparent to us. It is, after all, far shrewder than we are.
Given the nature of the technology, Bostrom suggests, intellectual humility seems the most appropriate attitude:
The claim is that it is much easier to convince oneself that one has found a solution than it is to actually find a solution. This should make us extremely wary. We may propose a specification of a final goal that seems sensible and that avoids the problems that have been pointed out so far, yet which upon further consideration—by human or superhuman intelligence—turns out to lead to either perverse instantiation….
This creates a problem of control, which has to do with how to constrain, from the outset, the behavioral paths that might be taken by a future superintelligence in pursuing goals that are set for it. As Bostrom points out, it is essential that this problem be solved before, and not after, the possible emergence of superintelligence with a decisive strategic advantage because (it is presumed) there is a single opportunity to get the goal specification right.
Problems of Value
The next point made in Superintelligence: Paths, Dangers, Strategies is equally important: even if the problem of control does get solved, there remains an important human question of which values are incorporated into the final goal. Bostrom writes:
Suppose that we had solved the control problem so that we were able to load any value we chose into the motivation system of a superintelligence, making it pursue that value as its final goal. Which value should we install? The choice is no light matter. If the superintelligence obtains a decisive strategic advantage, the value would determine the disposition of the cosmic endowment. Clearly, it is essential that we not make a mistake in our value selection. But how could we realistically hope to achieve errorlessness in a matter like this? We might be wrong about morality; wrong also about what is good for us; wrong even about what we truly want. Specifying a final goal, it seems, requires making one’s way through a thicket of thorny philosophical problems. If we try a direct approach, we are likely to make a hash of things. The risk of mistaken choosing is especially high when the decision context is unfamiliar—and selecting the final goal for a machine superintelligence that will shape all of humanity’s future is an extremely unfamiliar decision context if any is.
This, again, suggests the need for humility because these are deep philosophical questions about value that have perplexed leading thinkers for thousands of years. By default, the selection of values would be placed in the hands of the generation of programmers who would build what might evolve into highly advanced intelligence.
Fools might eagerly accept this challenge of solving in one swing all the important problems in moral philosophy, in order to infix their favorite answers into the seed AI. Wiser souls would look hard for some alternative approach, some way to hedge.
The Common Good Principle
Bostrom suggests what he calls the “common good principle,” which is that “superintelligence should be developed only for the benefit of all of humanity and in the service of widely shared ethical ideals.” While this sounds very good and is certainly worth aspiring to, it is not difficult to see why different people pursuing the principle might take it to amount to very different things. People will disagree over what counts as a “benefit” to humanity; and although there are some broad ethical ideals that may be widely endorsed, these can be subject to diverse interpretations in particular cases, leading to widely diverging conceptions of what is good and right.