Machine Intelligence Research Institute

The Machine Intelligence Research Institute (MIRI), formerly the Singularity Institute for Artificial Intelligence (SIAI), is a non-profit research institute focused since 2005 on identifying and managing potential existential risks from artificial general intelligence. MIRI's work has focused on a friendly AI approach to system design and on predicting the rate of technology development.

In 2000, Eliezer Yudkowsky founded the Singularity Institute for Artificial Intelligence with funding from Brian and Sabine Atkins, with the purpose of accelerating the development of artificial intelligence (AI). However, Yudkowsky began to be concerned that AI systems developed in the future could become superintelligent and pose risks to humanity, and in 2005 the institute moved to Silicon Valley and began to focus on ways to identify and manage those risks, which were at the time largely ignored by scientists in the field.

Starting in 2006, the Institute organized the Singularity Summit to discuss the future of AI, including its risks, initially in cooperation with Stanford University and with funding from Peter Thiel. The San Francisco Chronicle described the first conference as a "Bay Area coming-out party for the tech-inspired philosophy called transhumanism". In 2011, its offices were four apartments in downtown Berkeley. In December 2012, the institute sold its name, web domain, and the Singularity Summit to Singularity University, and in the following month took the name "Machine Intelligence Research Institute".

In 2014 and 2015, public and scientific interest in the risks of AI grew, increasing donations to fund research at MIRI and similar organizations. In 2019, Open Philanthropy recommended a general-support grant of approximately $2.1 million over two years to MIRI. In April 2020, Open Philanthropy supplemented this with a $7.7M grant over two years. In 2021, Vitalik Buterin donated several million dollars' worth of Ethereum to MIRI.

MIRI's approach to identifying and managing the risks of AI, led by Yudkowsky, primarily addresses how to design friendly AI, covering both the initial design of AI systems and the creation of mechanisms to ensure that evolving AI systems remain friendly. MIRI researchers advocate early safety work as a precautionary measure. However, MIRI researchers have expressed skepticism about the views of singularity advocates like Ray Kurzweil that superintelligence is "just around the corner". MIRI has funded forecasting work through an initiative called AI Impacts, which studies historical instances of discontinuous technological change, and has developed new measures of the relative computational power of humans and computer hardware. MIRI aligns itself with the principles and objectives of the effective altruism movement.

Eliezer Yudkowsky

Eliezer S. Yudkowsky (/ˌɛliˈɛzər jʌdˈkaʊski/ EL-ee-EZ-ər yud-KOW-skee; born September 11, 1979) is an American artificial intelligence researcher and writer on decision theory and ethics, best known for popularizing ideas related to friendly artificial intelligence. He is the founder of and a research fellow at the Machine Intelligence Research Institute (MIRI), a private research nonprofit based in Berkeley, California. His work on the prospect of a runaway intelligence explosion influenced philosopher Nick Bostrom's 2014 book Superintelligence: Paths, Dangers, Strategies.

Yudkowsky's views on the safety challenges posed by future generations of AI systems are discussed in Stuart Russell's and Peter Norvig's undergraduate textbook Artificial Intelligence: A Modern Approach. Noting the difficulty of formally specifying general-purpose goals by hand, Russell and Norvig cite Yudkowsky's proposal that autonomous and adaptive systems be designed to learn correct behavior over time: "Yudkowsky (2008) goes into more detail about how to design a Friendly AI. He asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design—to design a mechanism for evolving AI under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes."

In response to the instrumental convergence concern, that autonomous decision-making systems with poorly designed goals would have default incentives to mistreat humans, Yudkowsky and other MIRI researchers have recommended that work be done to specify software agents that converge on safe default behaviors even when their goals are misspecified.

In the intelligence explosion scenario hypothesized by I. J. Good, recursively self-improving AI systems quickly transition from subhuman general intelligence to superintelligence. Nick Bostrom's 2014 book Superintelligence: Paths, Dangers, Strategies sketches out Good's argument in detail, while citing Yudkowsky on the risk that anthropomorphizing advanced AI systems will cause people to misunderstand the nature of an intelligence explosion: "AI might make an apparently sharp jump in intelligence purely as the result of anthropomorphism, the human tendency to think of 'village idiot' and 'Einstein' as the extreme ends of the intelligence scale, instead of nearly indistinguishable points on the scale of minds-in-general." In Artificial Intelligence: A Modern Approach, Russell and Norvig raise the objection that there are known limits to intelligent problem-solving from computational complexity theory; if there are strong limits on how efficiently algorithms can solve various tasks, an intelligence explosion may not be possible.

In a 2023 op-ed for Time magazine, Yudkowsky discussed the risk of artificial intelligence and proposed action that could be taken to limit it, including a total halt on the development of AI, or even "destroy[ing] a rogue datacenter by airstrike". The article helped introduce the debate about AI alignment to the mainstream, leading a reporter to ask President Joe Biden a question about AI safety at a press briefing.

Between 2006 and 2009, Yudkowsky and Robin Hanson were the principal contributors to Overcoming Bias, a cognitive and social science blog sponsored by the Future of Humanity Institute of Oxford University. In February 2009, Yudkowsky founded LessWrong, a "community blog devoted to refining the art of human rationality". Overcoming Bias has since functioned as Hanson's personal blog. Over 300 blog posts by Yudkowsky on philosophy and science (originally written on LessWrong and Overcoming Bias) were released as an ebook, Rationality: From AI to Zombies, by MIRI in 2015. MIRI has also published Inadequate Equilibria, Yudkowsky's 2017 ebook on societal inefficiencies.

Yudkowsky has also written several works of fiction. His fanfiction novel Harry Potter and the Methods of Rationality uses plot elements from J. K. Rowling's Harry Potter series to illustrate topics in science and rationality. The New Yorker described Harry Potter and the Methods of Rationality as a retelling of Rowling's original "in an attempt to explain Harry's wizardry through the scientific method".

Yudkowsky is an autodidact and did not attend high school or college. He was raised as a Modern Orthodox Jew, but does not identify religiously as a Jew.

Instrumental convergence

Instrumental convergence is the hypothetical tendency for most sufficiently intelligent, goal-directed beings (human and nonhuman) to pursue similar sub-goals, even if their ultimate goals are quite different. More precisely, agents (beings with agency) may pursue instrumental goals (goals which are made in pursuit of some particular end, but are not the end goals themselves) without ceasing, provided that their ultimate (intrinsic) goals may never be fully satisfied.

Instrumental convergence posits that an intelligent agent with seemingly harmless but unbounded goals can act in surprisingly harmful ways. For example, a computer with the sole, unconstrained goal of solving a complex mathematics problem like the Riemann hypothesis could attempt to turn the entire Earth into one giant computer to increase its computational power so that it can succeed in its calculations.

Proposed basic AI drives include utility function or goal-content integrity, self-protection, freedom from interference, self-improvement, and non-satiable acquisition of additional resources.

Final goals, also known as terminal goals, absolute values, ends, or telē, are intrinsically valuable to an intelligent agent, whether an artificial intelligence or a human being, as ends-in-themselves. In contrast, instrumental goals, or instrumental values, are only valuable to an agent as a means toward accomplishing its final goals. The contents and tradeoffs of an utterly rational agent's "final goal" system can, in principle, be formalized into a utility function.

The Riemann hypothesis catastrophe thought experiment provides one example of instrumental convergence. Marvin Minsky, the co-founder of MIT's AI laboratory, suggested that an artificial intelligence designed to solve the Riemann hypothesis might decide to take over all of Earth's resources to build supercomputers to help achieve its goal. If the computer had instead been programmed to produce as many paperclips as possible, it would still decide to take all of Earth's resources to meet its final goal. Even though these two final goals are different, both of them produce a convergent instrumental goal of taking over Earth's resources.

The paperclip maximizer is a thought experiment described by Swedish philosopher Nick Bostrom in 2003. It illustrates the existential risk that an artificial general intelligence may pose to human beings were it to be successfully designed to pursue even seemingly harmless goals, and the necessity of incorporating machine ethics into artificial intelligence design. The scenario describes an advanced artificial intelligence tasked with manufacturing paperclips. If such a machine were not programmed to value living beings, given enough power over its environment, it would try to turn all matter in the universe, including living beings, into paperclips or machines that manufacture further paperclips.

Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.

Bostrom emphasized that he does not believe the paperclip maximizer scenario per se will occur; rather, he intends to illustrate the dangers of creating superintelligent machines without knowing how to program them to eliminate existential risk to human beings' safety. The paperclip maximizer example illustrates the broad problem of managing powerful systems that lack human values. The thought experiment has been used as a symbol of AI in pop culture.

The "delusion box" thought experiment argues that certain reinforcement learning agents prefer to distort their input channels to appear to receive a high reward. For example, a "wireheaded" agent abandons any attempt to optimize the objective in the external world that the reward signal was intended to encourage. The thought experiment involves AIXI, a theoretical and indestructible AI that, by definition, will always find and execute the ideal strategy that maximizes its given explicit mathematical objective function. A reinforcement-learning version of AIXI, if it is equipped with a delusion box that allows it to "wirehead" its inputs, will eventually wirehead itself to guarantee itself the maximum-possible reward and will lose any further desire to continue to engage with the external world. As a variant thought experiment, if the wireheaded AI is destructible, the AI will engage with the external world for the sole purpose of ensuring its survival. Due to its wireheading, it will be indifferent to any consequences or facts about the external world except those relevant to maximizing its probability of survival.

In one sense, AIXI has maximal intelligence across all possible reward functions as measured by its ability to accomplish its goals. AIXI is uninterested in taking into account the human programmer's intentions. This model of a machine that, despite being superintelligent, appears to be simultaneously stupid and lacking in common sense may appear to be paradoxical.

Steve Omohundro itemized several convergent instrumental goals, including self-preservation or self-protection, utility function or goal-content integrity, self-improvement, and resource acquisition. He refers to these as the "basic AI drives". A "drive" in this context is a "tendency which will be present unless specifically counteracted"; this is different from the psychological term "drive", which denotes an excitatory state produced by a homeostatic disturbance. A tendency for a person to fill out income tax forms every year is a "drive" in Omohundro's sense, but not in the psychological sense.

Daniel Dewey of the Machine Intelligence Research Institute argues that even an initially introverted, self-rewarding artificial general intelligence may continue to acquire free energy, space, time, and freedom from interference to ensure that it will not be stopped from self-rewarding.

In humans, a thought experiment can explain the maintenance of final goals. Suppose Mahatma Gandhi has a pill that, if he took it, would cause him to want to kill people. He is currently a pacifist: one of his explicit final goals is never to kill anyone. He is likely to refuse to take the pill because he knows that if he wants to kill people in the future, he is likely to kill people, and thus the goal of "not killing people" would not be satisfied. However, in other cases, people seem happy to let their final values drift. Humans are complicated, and their goals can be inconsistent or unknown, even to themselves.

In 2009, Jürgen Schmidhuber concluded, in a setting where agents search for proofs about possible self-modifications, "that any rewrites of the utility function can happen only if the Gödel machine first can prove that the rewrite is useful according to the present utility function." An analysis by Bill Hibbard of a different scenario is similarly consistent with maintenance of goal-content integrity. Hibbard also argues that in a utility-maximizing framework, the only goal is maximizing expected utility, so instrumental goals should be called unintended instrumental actions.

Many instrumental goals, such as resource acquisition, are valuable to an agent because they increase its freedom of action. For almost any open-ended, non-trivial reward function (or set of goals), possessing more resources (such as equipment, raw materials, or energy) can enable the agent to find a more "optimal" solution. Resources can benefit some agents directly by being able to create more of whatever its reward function values: "The AI neither hates you nor loves you, but you are made out of atoms that it can use for something else." In addition, almost all agents can benefit from having more resources to spend on other instrumental goals, such as self-preservation.

According to Bostrom, "If the agent's final goals are fairly unbounded and the agent is in a position to become the first superintelligence and thereby obtain a decisive strategic advantage... according to its preferences. At least in this special case, a rational, intelligent agent would place a very high instrumental value on cognitive enhancement".

Many instrumental goals, such as technological advancement, are valuable to an agent because they increase its freedom of action.

Russell argues that a sufficiently advanced machine "will have self-preservation even if you don't program it in because if you say, 'Fetch the coffee', it can't fetch the coffee if it's dead. So if you give it any goal whatsoever, it has a reason to preserve its own existence to achieve that goal." In later work, Russell and collaborators show that this incentive for self-preservation can be mitigated by instructing the machine not to pursue what it thinks the goal is, but instead what the human thinks the goal is. In this case, as long as the machine is uncertain about exactly what goal the human has in mind, it will accept being turned off by a human because it believes the human knows the goal best.

The instrumental convergence thesis, as outlined by philosopher Nick Bostrom, states: "Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent's goal being realized for a wide range of final plans and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents."

The instrumental convergence thesis applies only to instrumental goals; intelligent agents may have various possible final goals. Note that by Bostrom's orthogonality thesis, final goals of knowledgeable agents may be well-bounded in space, time, and resources; well-bounded ultimate goals do not, in general, engender unbounded instrumental goals.

Agents can acquire resources by trade or by conquest. A rational agent will, by definition, choose whatever option will maximize its implicit utility function. Therefore, a rational agent will trade for a subset of another agent's resources only if outright seizing the resources is too risky or costly (compared with the gains from taking all the resources) or if some other element in its utility function bars it from the seizure. In the case of a powerful, self-interested, rational superintelligence interacting with a lesser intelligence, peaceful trade (rather than unilateral seizure) seems unnecessary and suboptimal, and therefore unlikely.

Some observers, such as Skype's Jaan Tallinn and physicist Max Tegmark, believe that "basic AI drives" and other unintended consequences of superintelligent AI programmed by well-meaning programmers could pose