AlphaGo - Machine Learning Games



AlphaGo is a narrow AI computer program developed by Alphabet Inc.'s Google DeepMind in London to play the board game Go. In October 2015, it became the first computer Go program to beat a human professional Go player without handicaps on a full-sized 19×19 board. In March 2016, it beat Lee Sedol in a five-game match, the first time a computer Go program had beaten a 9-dan professional without handicaps. Although it lost to Lee Sedol in the fourth game, Lee resigned the final game, giving a final score of 4 games to 1 in favour of AlphaGo. In recognition of beating Lee Sedol, AlphaGo was awarded an honorary 9-dan by the Korea Baduk Association. It was chosen by Science as one of the Breakthrough of the Year runners-up on 22 December 2016.

AlphaGo's algorithm uses Monte Carlo tree search to find its moves, based on knowledge previously "learned" by machine learning: specifically, by an artificial neural network (a deep learning method) trained extensively on both human and computer play.





History and competitions

Go is considered much more difficult for computers to win than other games such as chess, because its much larger branching factor makes it prohibitively difficult to use traditional AI methods such as alpha-beta pruning, tree traversal and heuristic search.
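The scale of the problem can be made concrete with a rough calculation. The sketch below estimates the size of a full game tree as b^d, where b is the branching factor and d is the game length in moves. The figures used (about 35 and 80 for chess, about 250 and 150 for Go) are commonly cited approximations that do not come from this article, and the function name is illustrative only.

```python
import math

def log10_tree_size(branching_factor, depth):
    """Return log10(branching_factor ** depth) without building the huge number."""
    return depth * math.log10(branching_factor)

# Assumed, commonly cited approximations (not taken from this article).
chess = log10_tree_size(35, 80)
go = log10_tree_size(250, 150)

print(f"chess: about 10^{chess:.0f} move sequences")
print(f"Go:    about 10^{go:.0f} move sequences")
```

Under these assumptions the Go tree is larger than the chess tree by more than two hundred orders of magnitude, which is why exhaustive search with pruning alone is not practical.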

Almost two decades after IBM's computer Deep Blue beat world chess champion Garry Kasparov in the 1997 match, the strongest Go programs using artificial intelligence techniques had only reached about amateur 5-dan level, and still could not beat a professional Go player without handicaps. In 2012, the software program Zen, running on a four-PC cluster, beat Masaki Takemiya (9p) twice, at five- and four-stone handicaps. In 2013, Crazy Stone beat Yoshio Ishida (9p) at a four-stone handicap.

According to AlphaGo's David Silver, the AlphaGo research project was formed around 2014 to test how well a neural network using deep learning can compete at Go. AlphaGo represents a significant improvement over previous Go programs. In 500 games against other available Go programs, including Crazy Stone and Zen, AlphaGo running on a single computer won all but one. In a similar matchup, AlphaGo running on multiple computers won all 500 games played against other Go programs, and 77% of games played against AlphaGo running on a single computer. The distributed version in October 2015 was using 1,202 CPUs and 176 GPUs.

Match against Fan Hui

In October 2015, the distributed version of AlphaGo defeated the European Go champion Fan Hui, a 2-dan (out of 9 dan possible) professional, five to zero. This was the first time a computer Go program had beaten a professional human player on a full-sized board without handicap. The announcement of the news was delayed until 27 January 2016 to coincide with the publication of a paper in the journal Nature describing the algorithms used.

Match against Lee Sedol

AlphaGo played South Korean professional Go player Lee Sedol, ranked 9-dan and one of the best players at Go, in five games at the Four Seasons Hotel in Seoul, South Korea on 9, 10, 12, 13, and 15 March 2016. The games were video-streamed live. Aja Huang, a DeepMind team member and amateur 6-dan Go player, placed stones on the Go board for AlphaGo, which ran through Google's cloud computing with its servers located in the United States. The match used Chinese rules with a 7.5-point komi, and each side had two hours of thinking time plus three 60-second byo-yomi periods. The version of AlphaGo playing against Lee used a similar amount of computing power to that used in the Fan Hui match. The Economist reported that it used 1,920 CPUs and 280 GPUs.

At the time of play, Lee Sedol had the second-highest number of Go international championship victories in the world. While there is no single official method of ranking in international Go, some sources ranked Lee Sedol as the fourth-best player in the world at the time. AlphaGo was not specifically trained to face Lee.

The first three games were won by AlphaGo following resignations by Lee Sedol. However, Lee Sedol beat AlphaGo in the fourth game, winning by resignation at move 180. AlphaGo then went on to achieve a fourth win, taking the fifth game by resignation.

The prize was $1 million USD. Since AlphaGo won four out of five and thus the series, the prize will be donated to charities, including UNICEF. Lee Sedol received $150,000 for participating in all five games and an additional $20,000 for his win.

On June 29th, at a presentation held at a university in the Netherlands, Aja Huang, a member of the DeepMind team, revealed that the team had rectified the problem that occurred during the fourth game of the match between AlphaGo and Lee Sedol, and that after move 78 (dubbed the "hand of God" by many professionals) AlphaGo would now play accurately and maintain Black's advantage. Before the error that resulted in the loss, AlphaGo had been leading throughout the game; Lee's move was credited not as the one that won the game, but as one that diverted and confused the program's computation. Aja Huang explained that AlphaGo's policy network, which finds the most accurate move order and continuation, did not precisely guide AlphaGo to the correct continuation after move 78, since its value network had not rated Lee Sedol's 78th move as the most likely. When the move was played, AlphaGo therefore could not adjust to the logical continuation.

Unofficial online matches in late 2016 to early 2017

On 29 December 2016, a new account named "Magister" from South Korea began to play games with professional players on the Tygem server. It changed its account name to "Master" on 30 December, then moved to the FoxGo server on 1 January 2017. On 4 January, DeepMind confirmed that both "Magister" and "Master" were played by an updated version of AlphaGo. As of 5 January 2017, AlphaGo's online record was 60 wins and 0 losses, including three victories over Go's top-ranked player, Ke Jie, who had been quietly briefed in advance that Master was a version of AlphaGo. After losing to Master, Gu Li offered a bounty of 100,000 yuan (14,400 USD) to the first human player who could defeat Master. Master played at a pace of 10 games per day, and many quickly suspected it to be an AI player because it took little or no rest between games. Its adversaries included many world champions, such as Ke Jie, Park Jeong-hwan, Yuta Iyama, Tuo Jiaxi, Mi Yuting, Shi Yue, Chen Yaoye, Li Qincheng, Gu Li, Chang Hao, Tang Weixing, Fan Tingyu, Zhou Ruiyang, Jiang Weijie, Chou Chun-hsun, Kim Ji-seok, Kang Dong-yun, Park Yeong-hun, and Won Seong-jin, as well as national champions or world championship runners-up, such as Lian Xiao, Tan Xiao, Meng Tailing, Dang Yifei, Huang Yunsong, Yang Dingxin, Gu Zihao, Shin Jinseo, Cho Han-seung, and An Sungjoon. All 60 games except one were fast-paced games with three 20- or 30-second byo-yomi periods; Master offered to extend the byo-yomi to one minute when playing Nie Weiping, in consideration of his age. After winning its 59th game, Master revealed itself in the chatroom to be controlled by Dr. Aja Huang of the DeepMind team, then changed its nationality to the United Kingdom. After these games were completed, the co-founder of Google DeepMind, Demis Hassabis, said in a tweet: "we're looking forward to playing some official, full-length games later [2017] in collaboration with Go organizations and experts".

Human players tend to make more mistakes in fast paced online games than in full-length tournament games due to short response time. It isn't definitively known whether AlphaGo will succeed as well in tournaments as it has online. However, Go experts are extremely impressed by AlphaGo's performance and by its nonhuman play style; Ke Jie stated that "After humanity spent thousands of years improving our tactics, computers tell us that humans are completely wrong... I would go as far as to say not a single human has touched the edge of the truth of Go."

Wuzhen Future of Go Summit

In late May 2017, AlphaGo will play several exhibition games in Wuzhen, including:

  • Pair Go: human plus AlphaGo versus human plus AlphaGo
  • AlphaGo versus a collaborating team of top Chinese professionals
  • A best-of-three match versus the world number 1, Ke Jie




Hardware

An early version of AlphaGo was tested on hardware with various numbers of CPUs and GPUs, running in asynchronous or distributed mode. Two seconds of thinking time was given to each move, and an Elo rating was recorded for each configuration. Higher ratings are achieved in matches with more time per move.
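To give a feel for what an Elo difference means, the sketch below applies the standard Elo expected-score formula in both directions. As an illustration it converts the 77% win rate that this article reports for the distributed version against the single-machine version into an approximate rating gap. The function names are illustrative, and the result is a back-of-the-envelope estimate, not a figure published by DeepMind.

```python
import math

def expected_score(elo_diff):
    """Expected score of the stronger side under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score):
    """Invert the Elo formula: rating gap implied by an expected score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

# 77% is the distributed-vs-single-machine win rate quoted in this article.
gap = elo_diff_from_score(0.77)
print(f"A 77% win rate corresponds to a gap of about {gap:.0f} Elo points")
```

Under this model a 77% win rate works out to a gap of roughly 210 points, and equal ratings give an expected score of exactly 0.5.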

In May 2016, Google unveiled its own proprietary hardware "tensor processing units", which it stated had already been deployed in multiple internal projects at Google, including the AlphaGo match against Lee Sedol.




Algorithm

As of 2016, AlphaGo's algorithm uses a combination of machine learning and tree search techniques, together with extensive training on both human and computer play. It uses Monte Carlo tree search, guided by a "value network" and a "policy network," both implemented using deep neural network technology. A limited amount of game-specific feature detection pre-processing (for example, to highlight whether a move matches a nakade pattern) is applied to the input before it is sent to the neural networks.
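The interplay between tree search, a policy prior and a value estimate can be sketched on a toy game. The example below is not AlphaGo's implementation: it uses a tiny take-1-or-2-stones game (whoever takes the last stone wins), a uniform `policy_stub` and a hand-written `value_stub` as stand-ins for trained networks, and a generic PUCT-style selection rule with an assumed constant `c_puct`. All of these names and choices are assumptions made for illustration.

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior        # prior probability from the policy stub
        self.visits = 0
        self.value_sum = 0.0      # values from the view of the player who moved here
        self.children = {}        # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def legal_moves(pile):
    return [m for m in (1, 2) if m <= pile]

def policy_stub(pile):
    """Stand-in for a policy network: uniform prior over legal moves."""
    moves = legal_moves(pile)
    return {m: 1.0 / len(moves) for m in moves}

def value_stub(pile):
    """Stand-in for a value network: +1 if the player to move should win."""
    return -1.0 if pile % 3 == 0 else 1.0

def select_child(node, c_puct=1.5):
    """Pick the child maximising mean value plus a prior-weighted exploration bonus."""
    sqrt_n = math.sqrt(node.visits)
    def score(item):
        child = item[1]
        return child.q() + c_puct * child.prior * sqrt_n / (1 + child.visits)
    return max(node.children.items(), key=score)

def simulate(root, pile):
    path, node = [root], root
    while node.children:                      # selection
        move, node = select_child(node)
        pile -= move
        path.append(node)
    if pile == 0:                             # terminal: player to move has lost
        value = -1.0
    else:                                     # expansion and evaluation
        for move, prior in policy_stub(pile).items():
            node.children[move] = Node(prior)
        value = value_stub(pile)
    for n in reversed(path):                  # backup, flipping sides at each ply
        value = -value
        n.visits += 1
        n.value_sum += value

def best_move(pile, n_simulations=200):
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        simulate(root, pile)
    return max(root.children, key=lambda m: root.children[m].visits)

print(best_move(7))   # chooses the move that leaves a multiple of 3
```

The final move is chosen by visit count instead of raw value, a common choice in this family of algorithms because visit counts are less noisy than value averages.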

The system's neural networks were initially bootstrapped from human gameplay expertise. AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves. Once it had reached a certain degree of proficiency, it was trained further by being set to play large numbers of games against other instances of itself, using reinforcement learning to improve its play. To avoid "disrespectfully" wasting its opponent's time, the program is specifically programmed to resign if its assessment of win probability falls beneath a certain threshold; for the March 2016 match against Lee, the resignation threshold was set to 20%.
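The resignation rule described above amounts to a simple threshold check. The sketch below assumes the program exposes an estimated win probability as a number between 0 and 1; the function and constant names are hypothetical, and only the 20% value comes from this article.

```python
RESIGN_THRESHOLD = 0.20  # value reported for the March 2016 match against Lee

def should_resign(win_probability, threshold=RESIGN_THRESHOLD):
    """Return True when the estimated win probability falls beneath the threshold."""
    return win_probability < threshold

for estimate in (0.55, 0.20, 0.12):
    print(estimate, should_resign(estimate))
```

The comparison is strict, so an estimate of exactly 20% does not trigger resignation in this sketch.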




Style of play

Toby Manning, the match referee for AlphaGo vs. Fan Hui, has described the program's style as "conservative". AlphaGo's playstyle strongly favours a greater probability of winning by fewer points over a lesser probability of winning by more points. This strategy of maximising its probability of winning is distinct from what human players tend to do, which is to maximise territorial gains, and it explains some of AlphaGo's odd-looking moves.
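The difference between the two objectives can be shown with a small, entirely hypothetical example. The candidate moves and their numbers below are invented for illustration and are not taken from any AlphaGo game; the point is only that the two selection rules can disagree on the same position.

```python
# Hypothetical candidate moves: estimated win probability and expected point margin.
candidates = [
    {"move": "solid", "win_prob": 0.90, "expected_margin": 1.5},
    {"move": "greedy", "win_prob": 0.75, "expected_margin": 12.5},
]

def pick_by_win_probability(moves):
    """Choose the move with the highest chance of winning, ignoring the margin."""
    return max(moves, key=lambda m: m["win_prob"])

def pick_by_margin(moves):
    """Choose the move with the largest expected point margin."""
    return max(moves, key=lambda m: m["expected_margin"])

print(pick_by_win_probability(candidates)["move"])
print(pick_by_margin(candidates)["move"])
```

A player maximising win probability picks the "solid" move here even though it wins by far fewer points, which is the kind of choice that can look odd to human observers.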




Responses to 2016 victory against Lee Sedol

AI community

AlphaGo's March 2016 victory was a major milestone in artificial intelligence research. Go had previously been regarded as a hard problem in machine learning that was expected to be out of reach for the technology of the time. Most experts thought a Go program as powerful as AlphaGo was at least five years away; some experts thought that it would take at least another decade before computers would beat Go champions. Most observers at the beginning of the 2016 matches expected Lee to beat AlphaGo.

With games such as checkers (which has been "solved" by the Chinook draughts player team), chess, and now Go won by computers, victories at popular board games can no longer serve as major milestones for artificial intelligence in the way that they used to. Deep Blue's Murray Campbell called AlphaGo's victory "the end of an era... board games are more or less done and it's time to move on."

When compared with Deep Blue or with Watson, AlphaGo's underlying algorithms are potentially more general-purpose, and may be evidence that the scientific community is making progress towards artificial general intelligence. Some commentators believe AlphaGo's victory makes for a good opportunity for society to start discussing preparations for the possible future impact of machines with general purpose intelligence. (As noted by entrepreneur Guy Suter, AlphaGo itself only knows how to play Go, and doesn't possess general purpose intelligence: "[It] couldn't just wake up one morning and decide it wants to learn how to use firearms".) In March 2016, AI researcher Stuart Russell stated that "AI methods are progressing much faster than expected, (which) makes the question of the long-term outcome more urgent," adding that "in order to ensure that increasingly powerful AI systems remain completely under human control... there is a lot of work to do." Some scholars, such as Stephen Hawking, warned (in May 2015, before the matches) that some future self-improving AI could gain actual general intelligence, leading to an unexpected AI takeover. Other scholars disagree: AI expert Jean-Gabriel Ganascia believes that "Things like 'common sense'... may never be reproducible", and says "I don't see why we would speak about fears. On the contrary, this raises hopes in many domains such as health and space exploration." Computer scientist Richard Sutton said, "I don't think people should be scared... but I do think people should be paying attention."

Go community

Go is a popular game in China, Japan and Korea, and the 2016 matches were watched by perhaps a hundred million people worldwide. Many top Go players characterized AlphaGo's unorthodox plays as seemingly questionable moves that initially befuddled onlookers, but made sense in hindsight: "All but the very best Go players craft their style by imitating top players. AlphaGo seems to have totally original moves it creates itself." AlphaGo appeared to have unexpectedly become much stronger, even when compared with its October 2015 match, in which a computer had beaten a Go professional for the first time ever without the advantage of a handicap. The day after Lee's first defeat, Jeong Ahram, the lead Go correspondent for one of South Korea's biggest daily newspapers, said "Last night was very gloomy... Many people drank alcohol." The Korea Baduk Association, the organization that oversees Go professionals in South Korea, awarded AlphaGo an honorary 9-dan title for exhibiting creative skills and pushing forward the game's progress.

China's Ke Jie, an 18-year-old generally recognized as the world's best Go player, initially claimed that he would be able to beat AlphaGo, but declined to play against it for fear that it would "copy my style". As the matches progressed, Ke Jie went back and forth, stating that "it is highly likely that I (could) lose" after analysing the first three matches, but regaining confidence after AlphaGo displayed flaws in the fourth match.

Toby Manning, the referee of AlphaGo's match against Fan Hui, and Hajin Lee, secretary general of the International Go Federation, both reason that in the future, Go players will get help from computers to learn what they have done wrong in games and improve their skills.

After game two, Lee said he felt "speechless": "From the very beginning of the match, I could never manage an upper hand for one single move. It was AlphaGo's total victory." Lee apologized for his losses, stating after game three that "I misjudged the capabilities of AlphaGo and felt powerless." He emphasized that the defeat was "Lee Se-dol's defeat" and "not a defeat of mankind". Lee said his eventual loss to a machine was "inevitable" but stated that "robots will never understand the beauty of the game the same way that we humans do." Lee called his game four victory a "priceless win that I (would) not exchange for anything."




Similar systems

Facebook has also been working on its own Go-playing system, darkforest, which is likewise based on combining machine learning and tree search. Although a strong player against other computer Go programs, as of early 2016 it had not yet defeated a professional human player. darkforest has lost to Crazy Stone and Zen and is estimated to be of similar strength to them.

DeepZenGo, a system developed with support from video-sharing website Dwango and the University of Tokyo, lost 2-1 in November 2016 to Go master Cho Chikun, who holds the record for the largest number of Go title wins in Japan.




Example game

AlphaGo (white) v. Tang Weixing (31 December 2016); AlphaGo won by resignation. White 36 was widely praised.

Source of the article : Wikipedia


