Computer Go is the field of artificial intelligence (AI) dedicated to creating a computer program that plays the traditional board game Go. The field is sharply divided into two eras. Before 2015, the programs were weak: the best efforts of the 1980s and 1990s produced only AIs that could be defeated by beginners, and AIs of the early 2000s were intermediate level at best. Professionals could defeat these programs even when giving handicaps of ten or more stones in favor of the AI. Many algorithms that performed well in checkers and chess AIs, such as alpha-beta minimax, fell apart on Go's 19×19 board, where there were far too many branching possibilities to consider. Creating a program of human professional quality with the techniques and hardware of the time was out of reach. Some AI researchers speculated that the problem was unsolvable without the creation of human-like AI.
The application of Monte Carlo tree search to Go algorithms provided a notable improvement in the late 2000s, with programs finally able to achieve a low-dan level: that of an advanced amateur. High-dan amateurs and professionals could still exploit these programs' weaknesses and win consistently, but computer performance had advanced past the intermediate (single-digit kyu) level. The tantalizing unmet goal of defeating
a seki state of mutual life, and so forth in their representation of the state of the game. Historically, symbolic artificial intelligence techniques were used to approach the problem of Go AI. Neural networks began to be tried as an alternative approach in the 2000s, as they require immense computing power that was expensive or impossible to obtain in earlier decades. These approaches attempt to mitigate
a Go program with a 15x15 board that fit within the KIM-1 microcomputer's 1K RAM. Bruce F. Webster published an article in the magazine in November 1984 discussing a Go program he had written for the Apple Macintosh, including the MacFORTH source. Programs for Go were weak; a 1983 article estimated that they were at best equivalent to 20 kyu, the rating of a naive novice player, and often restricted themselves to smaller boards. AIs that played on
a branch of applied mathematics, is a topic relevant to computer Go. John H. Conway suggested applying surreal numbers to analysis of the endgame in Go. This idea has been further developed by Elwyn R. Berlekamp and David Wolfe in their book Mathematical Go. Go endgames have been proven to be PSPACE-hard if the absolute best move must be calculated on an arbitrary mostly filled board. Certain complicated situations such as Triple Ko, Quadruple Ko, Molasses Ko, and Moonshine Life make this problem difficult. (In practice, strong Monte Carlo algorithms can still handle normal Go endgame situations well enough, and
a cluster version of Zen running on a 26-core machine. In 2012, Zen beat Takemiya Masaki (9p) by 11 points at five stones handicap, followed by a 20-point win at four stones handicap. In 2013, Crazy Stone beat Yoshio Ishida (9p) in a 19×19 game at four stones handicap. The 2014 Codecentric Go Challenge, a best-of-five match in an even 19x19 game, was played between Crazy Stone and Franz-Jozef Dickhut (6d). No stronger player had ever before agreed to play
a competitive program. For example, GNU Go was competitive until 2008. Human novices often learn from the game records of old games played by master players. AI work in the 1990s often involved attempting to "teach" the AI human-style heuristics of Go knowledge. In 1996, Tim Klinger and David Mechner acknowledged the beginner-level strength of the best AIs and argued that "it is our belief that with better tools for representing and maintaining Go knowledge, it will be possible to develop stronger Go programs." They proposed two ways of doing so: recognizing common configurations of stones and their positions, and concentrating on local battles. In 2001, one paper concluded that "Go programs are still lacking in both quality and quantity of knowledge," and that fixing this would improve Go AI performance. In theory,
a computer to play a reasonable game of Go, rather than merely a legal game – it is necessary to formalise the principles of good strategy, or to design a learning programme. The principles are more qualitative and mysterious than in chess, and depend more on judgment. So I think it will be even more difficult to programme a computer to play a reasonable game of Go than of chess. Prior to 2015, the best Go programs only managed to reach amateur dan level. On
a draw causes the numerator for both black and white to be incremented by 0.5 and the denominator by 1. This ensures that during selection, each player's choices expand towards the most promising moves for that player, mirroring each player's goal of maximizing the value of their move. Rounds of search are repeated as long as the time allotted to a move remains. Then the move with the most simulations made (i.e.
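The win and visit accounting described above can be sketched as follows. This is a minimal illustration; the `Node` fields and player labels are hypothetical names, not taken from any particular Go engine.

```python
# Minimal sketch of MCTS backpropagation, including the draw case
# described above. Field names are illustrative.

class Node:
    def __init__(self, player):
        self.player = player  # the player who moved to reach this node
        self.wins = 0.0       # numerator: credited wins
        self.visits = 0       # denominator: simulations through this node

def backpropagate(path, winner):
    """winner is "black", "white", or None for a draw."""
    for node in path:
        node.visits += 1          # the denominator always grows by 1
        if winner is None:
            node.wins += 0.5      # a draw credits both sides half a win
        elif winner == node.player:
            node.wins += 1.0      # full credit only to the winner's nodes

path = [Node("black"), Node("white"), Node("black")]
backpropagate(path, "black")      # black wins: only black nodes gain wins
assert [n.wins for n in path] == [1.0, 0.0, 1.0]
backpropagate(path, None)         # a draw: every node gains 0.5
assert [n.wins for n in path] == [1.5, 0.5, 1.5]
```

Because each node stores wins from the perspective of the player who moved into it, the same update rule serves both sides during selection.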
a full-sized Go board in a transposition table, a hashing technique that mathematically summarizes the board is generally necessary. Zobrist hashing is very popular in Go programs because it has low collision rates, and can be iteratively updated at each move with just two XORs, rather than being recalculated from scratch. Even using these performance-enhancing techniques, full tree searches on a full-sized board are still prohibitively slow. Searches can be sped up by using large amounts of domain-specific pruning techniques, such as not considering moves where your opponent
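The incremental XOR update mentioned above can be sketched as follows. This is a minimal illustration assuming a 19×19 board and 64-bit keys; the table and function names are hypothetical.

```python
import random

SIZE = 19
rng = random.Random(0)  # fixed seed so this sketch is reproducible
# one random 64-bit key per (point, colour) pair; index 0 = black, 1 = white
ZOBRIST = [[(rng.getrandbits(64), rng.getrandbits(64))
            for _ in range(SIZE)] for _ in range(SIZE)]

def toggle_stone(h, row, col, colour):
    """XOR a stone's key into the hash. XORing the same key a second
    time removes the stone, so placements and captures are both O(1)."""
    return h ^ ZOBRIST[row][col][colour]

h0 = 0                              # empty board
h1 = toggle_stone(h0, 3, 3, 0)      # black stone placed
h2 = toggle_stone(h1, 3, 3, 0)      # same stone captured
assert h1 != h0 and h2 == h0        # hash restored after removal
```

A real engine would also fold the side to move and the ko state into the key; that bookkeeping is omitted here.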
a generalization of the Elo rating system. The most famous example of this approach is AlphaGo, which proved far more effective than previous AIs. In its first version, it had one layer that analyzed millions of existing positions to determine likely moves to prioritize as worthy of further analysis, and another layer that tried to optimize its own winning chances using the suggested likely moves from
a high branching factor. A disadvantage is that in certain positions, there may be moves that look superficially strong, but that actually lead to a loss via a subtle line of play. Such "trap states" require thorough analysis to be handled correctly, particularly when playing against an expert player; however, MCTS may not "see" such lines due to its policy of selective node expansion. It is believed that this may have been part of
a large database of professional games, or play many games against itself or other people or programs. The algorithms can then use this data to improve their performance. Machine learning techniques can also be used in a less ambitious context to tune specific parameters of programs that rely mainly on other techniques. For example, Crazy Stone learns move generation patterns from several hundred sample games, using
a lesser handicap. The series of Ing prizes was set to expire either 1) in the year 2000 or 2) when a program could beat a 1-dan professional at no handicap for 40,000,000 NT dollars. The last winner was Handtalk in 1997, claiming 250,000 NT dollars for winning an 11-stone handicap match against three 11–13 year old amateur 2–6 dans. At the time the prize expired in 2000, the unclaimed prize was 400,000 NT dollars for winning
a nine-stone handicap match. Many other large regional Go tournaments ("congresses") had an attached computer Go event. The European Go Congress has sponsored a computer tournament since 1987, and the USENIX event evolved into the US/North American Computer Go Championship, held annually from 1988 to 2000 at the US Go Congress. Japan started sponsoring computer Go competitions in 1995. The FOST Cup
a portion of the board influence the probability of moving into that area. Paradoxically, playing suboptimally in simulations sometimes makes a Monte Carlo tree search program play stronger overall. Domain-specific knowledge may be employed when building the game tree to help the exploitation of some variants. One such method assigns nonzero priors to the number of won and played simulations when creating each child node, leading to artificially raised or lowered average win rates that cause
a serious competition against a Go program on even terms. Franz-Jozef Dickhut won, though Crazy Stone won the first match by 1.5 points. AlphaGo, developed by Google DeepMind, was a significant advance in computer strength compared to previous Go programs. It used techniques that combined deep learning and Monte Carlo tree search. In October 2015, it defeated Fan Hui, the European Go champion, five times out of five in tournament conditions. In March 2016, AlphaGo beat Lee Sedol in
a strong Go-playing program something that could be achieved only in the far future, as a result of fundamental advances in general artificial intelligence technology. Those who thought the problem feasible believed that domain knowledge would be required to be effective against human experts. Therefore, during this period a large part of the computer Go development effort was focused on ways of representing human-like expert knowledge and combining this with local search to answer questions of
a strong program is hard to create. The large board size prevents an alpha-beta searcher from achieving deep look-ahead without significant search extensions or pruning heuristics. In 2002, a computer program called MIGOS (MIni GO Solver) completely solved the game of Go for the 5×5 board. Black wins, taking the whole board. Continuing the comparison to chess, Go moves are not as limited by
a tactical nature. The result was programs that handled many specific situations well but had very pronounced weaknesses in their overall handling of the game. These classical programs also gained almost nothing from increases in available computing power, and progress in the field was generally slow. The large board (19×19, 361 intersections) is often noted as one of the primary reasons why
a terminal leaf of the tree by repeated random playouts (similar to Monte Carlo strategies for other problems). The advantage is that such random playouts can be done very quickly. The intuitive objection that random playouts do not correspond to the actual worth of a position turned out not to be as fatal to the procedure as expected; the "tree search" side of the algorithm corrected for this well enough to find reasonable future game trees to explore. Programs based on this method, such as MoGo and Fuego, performed better than earlier classic AIs. The best programs could do especially well on
a tree are direct wins for one side, and boards have a reasonable heuristic for evaluation in simple material counting, as well as certain positional factors such as pawn structure. A future where one side has lost their queen for no benefit clearly favors the other side. These types of positional evaluation rules cannot efficiently be applied to Go. The value of a Go position depends on a complex analysis to determine whether or not
is already strong, and selective extensions like always considering moves next to groups of stones which are about to be captured. However, both of these options introduce a significant risk of not considering a vital move which would have changed the course of the game. Results of computer competitions show that pattern matching techniques for choosing a handful of appropriate moves combined with fast localized tactical searches (explained above) were once sufficient to produce
is an empirically chosen constant. Heuristics used in Monte Carlo tree search often require many parameters. There are automated methods to tune the parameters to maximize the win rate. Monte Carlo tree search can be concurrently executed by many threads or processes. There are several fundamentally different methods of its parallel execution:
is based on many playouts, also called roll-outs. In each playout, the game is played out to the very end by selecting moves at random. The final game result of each playout is then used to weight the nodes in the game tree so that better nodes are more likely to be chosen in future playouts. The most basic way to use playouts is to apply the same number of playouts after each legal move of
is high for moves with few simulations. Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT that traces its roots back to the AMS simulation optimization algorithm for estimating the value function in finite-horizon Markov Decision Processes (MDPs) introduced by Chang et al. (2005) in Operations Research. (AMS was the first work to explore the idea of UCB-based exploration and exploitation in constructing sampled/simulated (Monte Carlo) trees and
is how to represent the current state of the game. The most direct way of representing a board is as a one- or two-dimensional array, where elements in the array represent points on the board, and can take on a value corresponding to a white stone, a black stone, or an empty intersection. Additional data is needed to store how many stones have been captured, whose turn it is, and which intersections are illegal due to
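The direct representation just described might be sketched as below; the field names are illustrative, not from any specific program.

```python
# Minimal sketch of a Go position: a two-dimensional array plus the
# extra bookkeeping mentioned above (captures, turn, illegal points).
EMPTY, BLACK, WHITE = 0, 1, 2

class Position:
    def __init__(self, size=19):
        self.board = [[EMPTY] * size for _ in range(size)]
        self.captures = {BLACK: 0, WHITE: 0}  # stones each side has captured
        self.to_move = BLACK                  # whose turn it is
        self.ko_point = None                  # intersection currently illegal

    def is_empty_and_legal(self, row, col):
        return self.board[row][col] == EMPTY and (row, col) != self.ko_point

pos = Position()
pos.board[3][3] = BLACK
pos.ko_point = (3, 4)
assert not pos.is_empty_and_legal(3, 3)  # occupied
assert not pos.is_empty_and_legal(3, 4)  # currently forbidden
assert pos.is_empty_and_legal(0, 0)
```

A full legality check would also detect suicide by counting liberties; that is omitted here for brevity.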
is partly because it has traditionally been difficult to create an effective evaluation function for a Go board, and partly because the large number of possible moves each side can make leads to a high branching factor. This makes the technique very computationally expensive. Because of this, many programs which use search trees extensively can only play on the smaller 9×9 board, rather than full 19×19 ones. There are several techniques that can greatly improve
is that Go playing software, which usually communicates using the standardized Go Text Protocol (GTP), will not always agree with respect to the alive or dead status of stones. In computer science, Monte Carlo tree search (MCTS) is a heuristic search algorithm for some kinds of decision processes, most notably those employed in software that plays board games. In that context MCTS
is to use machine learning techniques. In these, the only things the programmers need to program are the rules and simple scoring algorithms for analyzing the worth of a position. The software then automatically generates its own sense of patterns, heuristics, and strategies, in theory. This is generally done by allowing a neural network or genetic algorithm to either review
is to use a minimax tree search. This involves playing out all hypothetical moves on the board up to a certain point, then using an evaluation function to estimate the value of that position for the current player. The move which leads to the best hypothetical board is selected, and the process is repeated each turn. While tree searches have been very effective in computer chess, they have seen less success in Computer Go programs. This
is used to solve the game tree. MCTS was combined with neural networks in 2016 and has been used in multiple board games like Chess, Shogi, Checkers, Backgammon, Contract Bridge, Go, Scrabble, and Clobber as well as in turn-based-strategy video games (such as Total War: Rome II's implementation in the high level campaign AI) and applications outside of games. The Monte Carlo method, which uses random sampling for deterministic problems which are difficult or impossible to solve using other approaches, dates back to
the Internet Go Server (IGS) on 19x19 size boards had around 20–15 kyu strength in 2003, after substantial improvements in hardware. In 1998, very strong players were able to beat computer programs while giving handicaps of 25–30 stones, an enormous handicap that few human players would ever take. There was a case in the 1994 World Computer Go Championship where the winning program, Go Intellect, lost all three games against
the Ko rule. In general, machine learning programs stop there at this simplest form and let the AI come to its own understanding of the meaning of the board, likely simply using Monte Carlo playouts to "score" a board as good or bad for a player. "Classic" AI programs that attempted to directly model a human's strategy might go further, however, such as layering on data such as stones believed to be dead, stones that are unconditionally alive, stones in
the exploration of moves with few simulations. The first formula for balancing exploitation and exploration in games, called UCT (Upper Confidence Bound 1 applied to trees), was introduced by Levente Kocsis and Csaba Szepesvári. UCT is based on the UCB1 formula derived by Auer, Cesa-Bianchi, and Fischer and the provably convergent AMS (Adaptive Multi-stage Sampling) algorithm first applied to multi-stage decision-making models (specifically, Markov Decision Processes) by Chang, Fu, Hu, and Marcus. Kocsis and Szepesvári recommend choosing in each node of
the subset sum problem. Several annual competitions take place between Go computer programs, including Go events at the Computer Olympiad. Regular, less formal, competitions between programs used to occur on the KGS Go Server (monthly) and the Computer Go Server (continuous). Many programs are available that allow computer Go engines to play against each other; they almost always communicate via
the 1940s. In his 1987 PhD thesis, Bruce Abramson combined minimax search with an expected-outcome model based on random game playouts to the end, instead of the usual static evaluation function. Abramson said the expected-outcome model "is shown to be precise, accurate, easily estimable, efficiently calculable, and domain-independent." He experimented in-depth with tic-tac-toe and then with machine-generated evaluation functions for Othello and chess. Such methods were then explored and successfully applied to heuristic search in
the 2001 survey put it, "just one bad move can ruin a good game. Program performance over a full game can be much lower than master level." One major alternative to using hand-coded knowledge and searches is the use of Monte Carlo methods. This is done by generating a list of potential moves, and for each move playing out thousands of games at random on the resulting board. The move which leads to
the 2007 Computer Olympiad and won one out of three blitz games against Guo Juan, 5th Dan Pro, in the much less complex 9x9 Go. The Many Faces of Go won the 2008 Computer Olympiad after adding UCT search to its traditional knowledge-based engine. Monte-Carlo based Go engines have a reputation for being much more willing than human players to play tenuki, moves elsewhere on the board, rather than continue a local fight. This
the Amazons, and Arimaa), real-time video games (for instance Ms. Pac-Man and Fable Legends), and nondeterministic games (such as skat, poker, Magic: The Gathering, or Settlers of Catan). The focus of MCTS is on the analysis of the most promising moves, expanding the search tree based on random sampling of the search space. The application of Monte Carlo tree search in games
the Go Text Protocol (GTP). The first computer Go competition was sponsored by Acornsoft, and the first regular ones by USENIX. They ran from 1984 to 1988. These competitions introduced Nemesis, the first competitive Go program from Bruce Wilcox, and G2.5 by David Fotland, which would later evolve into Cosmos and The Many Faces of Go. One of the early drivers of computer Go research was
the Ing Prize, a relatively large money award sponsored by Taiwanese banker Ing Chang-ki, offered annually between 1985 and 2000 at the World Computer Go Congress (or Ing Cup). The winner of this tournament was allowed to challenge young players at a handicap in a short match. If the computer won the match, the prize was awarded and a new prize announced: a larger prize for beating the players at
the UCT (Upper Confidence bounds applied to Trees) algorithm, and S. Gelly et al. implemented UCT in their program MoGo. In 2008, MoGo achieved dan (master) level in 9×9 Go, and the Fuego program began to win against strong amateur players in 9×9 Go. In January 2012, the Zen program won 3:1 in a Go match on a 19×19 board with an amateur 2 dan player. Google DeepMind developed the program AlphaGo, which in October 2015 became
the UCT algorithm described below. If white loses the simulation, all nodes along the selection increment their simulation count (the denominator), but among them only the black nodes are credited with wins (the numerator). If instead white wins, all nodes along the selection still increment their simulation count, but among them only the white nodes are credited with wins. In games where draws are possible,
the best human players without a handicap, long thought unreachable, brought a burst of renewed interest. The key insight proved to be an application of machine learning and deep learning. DeepMind, a Google acquisition dedicated to AI research, produced AlphaGo in 2015 and announced it to the world in 2016. AlphaGo defeated Lee Sedol, a 9 dan professional, in a no-handicap match in 2016, then defeated Ke Jie in 2017, who at the
the best set of random games for the current player is chosen as the best move. No potentially fallible knowledge-based system is required. However, because the moves used for evaluation are generated at random, it is possible that a move which would be excellent except for one specific opponent response would be mistakenly evaluated as a good move. The result is programs which are strong in an overall strategic sense, but imperfect tactically. This problem can be mitigated by adding some domain knowledge in
the contents of tree nodes are influenced not only by moves played immediately in a given position but also by the same moves played later. When using RAVE, the selection step selects the node for which the modified UCB1 formula {\displaystyle (1-\beta (n_{i},{\tilde {n}}_{i})){\frac {w_{i}}{n_{i}}}+\beta (n_{i},{\tilde {n}}_{i}){\frac {{\tilde {w}}_{i}}{{\tilde {n}}_{i}}}+c{\sqrt {\frac {\ln t}{n_{i}}}}} has
the current player, then choose the move which led to the most victories. The efficiency of this method—called Pure Monte Carlo Game Search—often increases with time as more playouts are assigned to the moves that have frequently resulted in the current player's victory according to previous playouts. Each round of Monte Carlo tree search consists of four steps: selection, expansion, simulation, and backpropagation. This graph shows the steps involved in one decision, with each node showing
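The four rounds of selection, expansion, simulation, and backpropagation can be sketched on a toy game rather than Go: players alternately take one or two tokens, and whoever takes the last token wins. All class and function names here are illustrative, not from any real engine.

```python
import math
import random

class Node:
    def __init__(self, tokens, player, parent=None, move=None):
        self.tokens = tokens              # tokens remaining after `move`
        self.player = player              # player who made `move`
        self.parent, self.move = parent, move
        self.children = []
        self.untried = [m for m in (1, 2) if m <= tokens]
        self.wins, self.visits = 0.0, 0

def uct(node, c=1.4):
    # exploitation + exploration, as in the UCT formula
    return (node.wins / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def best_move(tokens, to_move, rounds=2000, seed=1):
    rng = random.Random(seed)
    root = Node(tokens, player=1 - to_move)   # root's "player" = previous mover
    for _ in range(rounds):
        # 1. selection: descend while fully expanded and non-terminal
        node = root
        while not node.untried and node.children:
            node = max(node.children, key=uct)
        # 2. expansion: add one untried child, if any
        if node.untried:
            m = node.untried.pop()
            child = Node(node.tokens - m, 1 - node.player, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. simulation: random playout; taking the last token wins
        tokens_left, mover = node.tokens, 1 - node.player
        winner = node.player if tokens_left == 0 else None
        while winner is None:
            tokens_left -= rng.choice([m for m in (1, 2) if m <= tokens_left])
            if tokens_left == 0:
                winner = mover
            mover = 1 - mover
        # 4. backpropagation: credit wins along the selected path
        while node is not None:
            node.visits += 1
            if node.player == winner:
                node.wins += 1.0
            node = node.parent
    return max(root.children, key=lambda n: n.visits).move

assert best_move(2, to_move=0) == 2   # taking both tokens wins immediately
```

A Go engine differs only in the state and move generation; the four-step skeleton is the same.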
the field of automated theorem proving by W. Ertel, J. Schumann and C. Suttner in 1989, thus improving the exponential search times of uninformed search algorithms such as breadth-first search, depth-first search or iterative deepening. In 1992, B. Brügmann employed it for the first time in a Go-playing program. In 2002, Chang et al. proposed the idea of "recursive rolling out and backtracking" with "adaptive" sampling choices in their Adaptive Multi-stage Sampling (AMS) algorithm for
the first Computer Go program to beat a professional human Go player without handicaps on a full-sized 19x19 board. In March 2016, AlphaGo was awarded an honorary 9-dan (master) level in 19×19 Go for defeating Lee Sedol in a five-game match with a final score of four games to one. AlphaGo represents a significant improvement over previous Go programs as well as a milestone in machine learning, as it uses Monte Carlo tree search with artificial neural networks (a deep learning method) for policy (move selection) and value, giving it efficiency far surpassing previous programs. The MCTS algorithm has also been used in programs that play other board games (for example Hex, Havannah, Game of
the first layer. AlphaGo used Monte Carlo tree search to score the resulting positions. A later version, AlphaGo Zero, eschewed learning from existing Go games, and instead learnt only from playing itself repeatedly. Other earlier programs using neural nets include NeuroGo and WinHonte. Computer Go research results are being applied to other similar fields such as cognitive science, pattern recognition and machine learning. Combinatorial Game Theory,
7590-437: The first three of five matches. This was the first time that a 9-dan master had played a professional game against a computer without handicap. Lee won the fourth match, describing his win as "invaluable". AlphaGo won the final match two days later. With this victory, AlphaGo became the first program to beat a 9 dan human professional in a game without handicaps on a full-sized board. In May 2017, AlphaGo beat Ke Jie , who at
7705-455: The game EinStein würfelt nicht! . It converges to optimal play (as k tends to infinity) in board filling games with random turn order, for instance in the game Hex with random turn order. DeepMind's AlphaZero replaces the simulation step with an evaluation based on a neural network. The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with high average win rate and
7820-455: The game EinStein würfelt nicht! . It converges to optimal play (as k tends to infinity) in board filling games with random turn order, for instance in the game Hex with random turn order. DeepMind's AlphaZero replaces the simulation step with an evaluation based on a neural network. The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with high average win rate and
7935-400: The game as requiring intuition, creative and strategic thinking. It has long been considered a difficult challenge in the field of artificial intelligence (AI) and is considerably more difficult to solve than chess . Many in the field considered Go to require more elements that mimic human thought than chess. Mathematician I. J. Good wrote in 1965: Go on a computer? – In order to programme
8050-484: The game tree the move for which the expression w i n i + c ln N i n i {\displaystyle {\frac {w_{i}}{n_{i}}}+c{\sqrt {\frac {\ln N_{i}}{n_{i}}}}} has the highest value. In this formula: The first component of the formula above corresponds to exploitation; it is high for moves with high average win ratio. The second component corresponds to exploration; it
8165-484: The game tree the move for which the expression w i n i + c ln N i n i {\displaystyle {\frac {w_{i}}{n_{i}}}+c{\sqrt {\frac {\ln N_{i}}{n_{i}}}}} has the highest value. In this formula: The first component of the formula above corresponds to exploitation; it is high for moves with high average win ratio. The second component corresponds to exploration; it
8280-462: The game's mechanics is sufficient to explore the search space (i.e. the generating of allowed moves in a given position and the game-end conditions). As such, Monte Carlo tree search can be employed in games without a developed theory or in general game playing . The game tree in Monte Carlo tree search grows asymmetrically as the method concentrates on the more promising subtrees. Thus , it achieves better results than classical algorithms in games with
8395-462: The game's mechanics is sufficient to explore the search space (i.e. the generating of allowed moves in a given position and the game-end conditions). As such, Monte Carlo tree search can be employed in games without a developed theory or in general game playing . The game tree in Monte Carlo tree search grows asymmetrically as the method concentrates on the more promising subtrees. Thus , it achieves better results than classical algorithms in games with
8510-429: The game, such as joseki. Some examples of programs which have relied heavily on expert knowledge are Handtalk (later known as Goemate), The Many Faces of Go, Go Intellect, and Go++, each of which has at some point been considered the world's best Go program. However, these methods ultimately had diminishing returns, and never really advanced past an intermediate level at best on a full-sized board. One particular problem
8625-494: The group is alive, which stones can be connected to one another, and heuristics around the extent to which a strong position has influence, or the extent to which a weak position can be attacked. A stone placed might not have immediate influence, but after many moves could become highly important in retrospect as other areas of the board take shape. Poor evaluation of board states will cause the AI to work toward positions it incorrectly believes favor it, but actually do not. One of
8740-444: The high level campaign AI ) and applications outside of games. The Monte Carlo method , which uses random sampling for deterministic problems which are difficult or impossible to solve using other approaches, dates back to the 1940s. In his 1987 PhD thesis, Bruce Abramson combined minimax search with an expected-outcome model based on random game playouts to the end, instead of the usual static evaluation function . Abramson said
8855-495: The highest denominator) is chosen as the final answer. This basic procedure can be applied to any game whose positions necessarily have a finite number of moves and finite length. For each position, all feasible moves are determined: k random games are played out to the very end, and the scores are recorded. The move leading to the best score is chosen. Ties are broken by fair coin flips. Pure Monte Carlo Game Search results in strong play in several games with random elements, as in
8970-495: The highest denominator) is chosen as the final answer. This basic procedure can be applied to any game whose positions necessarily have a finite number of moves and finite length. For each position, all feasible moves are determined: k random games are played out to the very end, and the scores are recorded. The move leading to the best score is chosen. Ties are broken by fair coin flips. Pure Monte Carlo Game Search results in strong play in several games with random elements, as in
9085-1450: The highest value. In this formula, w ~ i {\displaystyle {\tilde {w}}_{i}} and n ~ i {\displaystyle {\tilde {n}}_{i}} stand for the number of won playouts containing move i and the number of all playouts containing move i , and the β ( n i , n ~ i ) {\displaystyle \beta (n_{i},{\tilde {n}}_{i})} function should be close to one and to zero for relatively small and relatively big n i and n ~ i {\displaystyle {\tilde {n}}_{i}} , respectively. One of many formulas for β ( n i , n ~ i ) {\displaystyle \beta (n_{i},{\tilde {n}}_{i})} , proposed by D. Silver, says that in balanced positions one can take β ( n i , n ~ i ) = n ~ i n i + n ~ i + 4 b 2 n i n ~ i {\displaystyle \beta (n_{i},{\tilde {n}}_{i})={\frac {{\tilde {n}}_{i}}{n_{i}+{\tilde {n}}_{i}+4b^{2}n_{i}{\tilde {n}}_{i}}}} , where b
9200-1450: The highest value. In this formula, w ~ i {\displaystyle {\tilde {w}}_{i}} and n ~ i {\displaystyle {\tilde {n}}_{i}} stand for the number of won playouts containing move i and the number of all playouts containing move i , and the β ( n i , n ~ i ) {\displaystyle \beta (n_{i},{\tilde {n}}_{i})} function should be close to one and to zero for relatively small and relatively big n i and n ~ i {\displaystyle {\tilde {n}}_{i}} , respectively. One of many formulas for β ( n i , n ~ i ) {\displaystyle \beta (n_{i},{\tilde {n}}_{i})} , proposed by D. Silver, says that in balanced positions one can take β ( n i , n ~ i ) = n ~ i n i + n ~ i + 4 b 2 n i n ~ i {\displaystyle \beta (n_{i},{\tilde {n}}_{i})={\frac {{\tilde {n}}_{i}}{n_{i}+{\tilde {n}}_{i}+4b^{2}n_{i}{\tilde {n}}_{i}}}} , where b
9315-414: The main concerns for a Go player is which groups of stones can be kept alive and which can be captured. This general class of problems is known as life and death . Knowledge-based AI systems sometimes attempted to understand the life and death status of groups on the board. The most direct approach is to perform a tree search on the moves which potentially affect the stones in question, and then to record
9430-454: The model of Markov decision processes . AMS was the first work to explore the idea of UCB -based exploration and exploitation in constructing sampled/simulated (Monte Carlo) trees and was the main seed for UCT (Upper Confidence Trees). In 2006, inspired by these predecessors, Rémi Coulom described the application of the Monte Carlo method to game-tree search and coined the name Monte Carlo tree search, L. Kocsis and Cs. Szepesvári developed
9545-454: The model of Markov decision processes . AMS was the first work to explore the idea of UCB -based exploration and exploitation in constructing sampled/simulated (Monte Carlo) trees and was the main seed for UCT (Upper Confidence Trees). In 2006, inspired by these predecessors, Rémi Coulom described the application of the Monte Carlo method to game-tree search and coined the name Monte Carlo tree search, L. Kocsis and Cs. Szepesvári developed
9660-608: The most complicated classes of life-and-death endgame problems are unlikely to come up in a high-level game.) Various difficult combinatorial problems (any NP-hard problem) can be converted to Go-like problems on a sufficiently large board; however, the same is true for other abstract board games, including chess and minesweeper , when suitably generalized to a board of arbitrary size. NP-complete problems do not tend in their general case to be easier for unaided humans than for suitably programmed computers: unaided humans are much worse than computers at solving, for example, instances of
9775-416: The most promising moves only after many rounds; until then its moves are essentially random. This exploratory phase may be reduced significantly in a certain class of games using RAVE ( Rapid Action Value Estimation ). In these games, permutations of a sequence of moves lead to the same position. Typically, they are board games in which a move involves placement of a piece or a stone on the board. In such games
9890-416: The most promising moves only after many rounds; until then its moves are essentially random. This exploratory phase may be reduced significantly in a certain class of games using RAVE ( Rapid Action Value Estimation ). In these games, permutations of a sequence of moves lead to the same position. Typically, they are board games in which a move involves placement of a piece or a stone on the board. In such games
10005-402: The move generation and a greater level of search depth on top of the random evolution. Some programs which use Monte-Carlo techniques are Fuego, The Many Faces of Go v12, Leela, MoGo, Crazy Stone , MyGoFriend, and Zen. In 2006, a new search technique, upper confidence bounds applied to trees (UCT), was developed and applied to many 9x9 Monte-Carlo Go programs with excellent results. UCT uses
10120-421: The node to be chosen more or less frequently, respectively, in the selection step. A related method, called progressive bias , consists in adding to the UCB1 formula a b i n i {\displaystyle {\frac {b_{i}}{n_{i}}}} element, where b i is a heuristic score of the i -th move. The basic Monte Carlo tree search collects enough information to find
10235-421: The node to be chosen more or less frequently, respectively, in the selection step. A related method, called progressive bias , consists in adding to the UCB1 formula a b i n i {\displaystyle {\frac {b_{i}}{n_{i}}}} element, where b i is a heuristic score of the i -th move. The basic Monte Carlo tree search collects enough information to find
10350-492: The performance of search trees in terms of both speed and memory. Pruning techniques such as alpha–beta pruning , Principal Variation Search , and MTD(f) can reduce the effective branching factor without loss of strength. In tactical areas such as life and death, Go is particularly amenable to caching techniques such as transposition tables . These can reduce the amount of repeated effort, especially when combined with an iterative deepening approach. In order to quickly store
10465-553: The problems of the game of Go having a high branching factor and numerous other difficulties. The only choice a program needs to make is where to place its next stone. However, this decision is made difficult by the wide range of impacts a single stone can have across the entire board, and the complex interactions various stones' groups can have with each other. Various architectures have arisen for handling this problem. Popular techniques and design philosophies include: One traditional AI technique for creating game playing software
10580-485: The ratio of wins to total playouts from that point in the game tree for the player that the node represents. In the Selection diagram, black is about to move. The root node shows there are 11 wins out of 21 playouts for white from this position so far. It complements the total of 10/21 black wins shown along the three black nodes under it, each of which represents a possible black move. Note that this graph does not follow
10695-428: The ratio of wins to total playouts from that point in the game tree for the player that the node represents. In the Selection diagram, black is about to move. The root node shows there are 11 wins out of 21 playouts for white from this position so far. It complements the total of 10/21 black wins shown along the three black nodes under it, each of which represents a possible black move. Note that this graph does not follow
10810-432: The reason for AlphaGo's loss in its fourth game against Lee Sedol . In essence, the search attempts to prune sequences which are less relevant. In some cases, a play can lead to a very specific line of play which is significant, but which is overlooked when the tree is pruned, and this outcome is therefore "off the search radar". Various modifications of the basic Monte Carlo tree search method have been proposed to shorten
10925-432: The reason for AlphaGo's loss in its fourth game against Lee Sedol . In essence, the search attempts to prune sequences which are less relevant. In some cases, a play can lead to a very specific line of play which is significant, but which is overlooked when the tree is pruned, and this outcome is therefore "off the search radar". Various modifications of the basic Monte Carlo tree search method have been proposed to shorten
11040-422: The results of the play outs collected so far to guide the search along the more successful lines of play, while still allowing alternative lines to be explored. The UCT technique along with many other optimizations for playing on the larger 19x19 board has led MoGo to become one of the strongest research programs. Successful early applications of UCT methods to 19x19 Go include MoGo, Crazy Stone, and Mango. MoGo won
11155-433: The rules of the game. For the first move in chess, the player has twenty choices. Go players begin with a choice of 55 distinct legal moves, accounting for symmetry. This number rises quickly as symmetry is broken, and soon almost all of the 361 points of the board must be evaluated. One of the most basic tasks in a game is to assess a board position: which side is favored, and by how much? In chess, many future positions in
11270-535: The search time. Some employ domain-specific expert knowledge, others do not. Monte Carlo tree search can use either light or heavy playouts. Light playouts consist of random moves while heavy playouts apply various heuristics to influence the choice of moves. These heuristics may employ the results of previous playouts (e.g. the Last Good Reply heuristic ) or expert knowledge of a given game. For instance, in many Go-playing programs certain stone patterns in
11385-470: The search time. Some employ domain-specific expert knowledge, others do not. Monte Carlo tree search can use either light or heavy playouts. Light playouts consist of random moves while heavy playouts apply various heuristics to influence the choice of moves. These heuristics may employ the results of previous playouts (e.g. the Last Good Reply heuristic ) or expert knowledge of a given game. For instance, in many Go-playing programs certain stone patterns in
11500-654: The small 9x9 board, which had fewer possibilities to explore. In 2009, the first such programs appeared which could reach and hold low dan-level ranks on the KGS Go Server on the 19x19 board. In 2010, at the 2010 European Go Congress in Finland, MogoTW played 19x19 Go against Catalin Taranu (5p). MogoTW received a seven-stone handicap and won. In 2011, Zen reached 5 dan on the server KGS, playing games of 15 seconds per move. The account which reached that rank uses
11615-591: The small 9×9 board, the computer fared better, and some programs managed to win a fraction of their 9×9 games against professional players. Prior to AlphaGo, some researchers had claimed that computers would never defeat top humans at Go. The first Go program was written by Albert Lindsey Zobrist in 1968 as part of his thesis on pattern recognition . It introduced an influence function to estimate territory and Zobrist hashing to detect ko . In April 1981, Jonathan K Millen published an article in Byte discussing Wally,
11730-489: The status of the stones at the end of the main line of play. However, within time and memory constraints, it is not generally possible to determine with complete accuracy which moves could affect the 'life' of a group of stones. This implies that some heuristic must be applied to select which moves to consider. The net effect is that for any given program, there is a trade-off between playing speed and life and death reading abilities. An issue that all Go programs must tackle
11845-455: The system has ways to determine which heuristic is more important and applicable to the situation. Most of the relatively successful results come from programmers' individual skills at Go and their personal conjectures about Go, but not from formal mathematical assertions; they are trying to make the computer mimic the way they play Go. Competitive programs around 2001 could contain 50–100 modules that dealt with different aspects and strategies of
11960-420: The techniques used to build AlphaGo, which proved so much stronger than everything else. By 2017, both Zen and Tencent 's project Fine Art were capable of defeating very high-level professionals some of the time. The open source Leela Zero engine was created as well. For a long time, it was a widely held opinion that computer Go posed a problem fundamentally different from computer chess . Many considered
12075-434: The time continuously held the world No. 1 ranking for two years. Just as checkers had fallen to machines in 1995 and chess in 1997 , computer programs finally conquered humanity's greatest Go champions in 2016–2017. DeepMind did not release AlphaGo for public use, but various programs have been built since based on the journal articles DeepMind released describing AlphaGo and its variants. Professional Go players see
12190-582: The time was ranked top in the world, in a three-game match during the Future of Go Summit . In October 2017, DeepMind revealed a new version of AlphaGo, trained only through self play, that had surpassed all previous versions, beating the Ke Jie version in 89 out of 100 games. After the basic principles of AlphaGo were published in the journal Nature , other teams have been able to produce high-level programs. Work on Go AI since has largely consisted of emulating
12305-478: The use of expert knowledge would improve Go software. Hundreds of guidelines and rules of thumb for strong play have been formulated by both high-level amateurs and professionals. The programmer's task is to take these heuristics , formalize them into computer code, and utilize pattern matching and pattern recognition algorithms to recognize when these rules apply. It is also important to be able to "score" these heuristics so that when they offer conflicting advice,
12420-417: The value of each move is often only slightly influenced by other moves. In RAVE, for a given game tree node N , its child nodes C i store not only the statistics of wins in playouts started in node N but also the statistics of wins in all playouts started in node N and below it, if they contain move i (also when the move was played in the tree, between node N and a playout). This way
12535-417: The value of each move is often only slightly influenced by other moves. In RAVE, for a given game tree node N , its child nodes C i store not only the statistics of wins in playouts started in node N but also the statistics of wins in all playouts started in node N and below it, if they contain move i (also when the move was played in the tree, between node N and a playout). This way
12650-407: The youth players while receiving a 15-stone handicap. In general, players who understood and exploited a program's weaknesses could win even through large handicaps. In 2006 (with an article published in 2007), Rémi Coulom produced a new algorithm he called Monte Carlo tree search . In it, a game tree is created as usual of potential futures that branch with every move. However, computers "score"
12765-647: Was held annually from 1995 to 1999 in Tokyo. That tournament was supplanted by the Gifu Challenge, which was held annually from 2003 to 2006 in Ogaki, Gifu. The Computer Go UEC Cup has been held annually since 2007. When two computers play a game of Go against each other, the ideal is to treat the game in a manner identical to two humans playing while avoiding any intervention from actual humans. However, this can be difficult during end game scoring. The main problem
12880-480: Was often perceived as a weakness early in these program's existence. That said, this tendency has persisted in AlphaGo's playstyle with dominant results, so this may be more of a "quirk" than a "weakness." The skill level of knowledge-based systems is closely linked to the knowledge of their programmers and associated domain experts. This limitation has made it difficult to program truly strong AIs. A different path
12995-434: Was overall game strategy. Even if an expert system recognizes a pattern and knows how to play a local skirmish, it may miss a looming deeper strategic problem in the future. The result is a program whose strength is less than the sum of its parts; while moves may be good on an individual tactical basis, the program can be tricked and maneuvered into ceding too much in exchange, and find itself in an overall losing position. As
13110-521: Was the main seed for UCT. ) Although it has been proven that the evaluation of moves in Monte Carlo tree search converges to minimax when using UCT, the basic version of Monte Carlo tree search converges only in so called "Monte Carlo Perfect" games. However, Monte Carlo tree search does offer significant advantages over alpha–beta pruning and similar algorithms that minimize the search space. In particular, pure Monte Carlo tree search does not need an explicit evaluation function . Simply implementing
13225-521: Was the main seed for UCT. ) Although it has been proven that the evaluation of moves in Monte Carlo tree search converges to minimax when using UCT, the basic version of Monte Carlo tree search converges only in so called "Monte Carlo Perfect" games. However, Monte Carlo tree search does offer significant advantages over alpha–beta pruning and similar algorithms that minimize the search space. In particular, pure Monte Carlo tree search does not need an explicit evaluation function . Simply implementing