While artificial intelligence software has made huge strides recently, in many cases, it has only been automating things that humans already do well. If you want an AI to identify the Higgs boson in a spray of particles, for example, you have to train it on collisions that humans have already identified as containing a Higgs. If you want it to identify pictures of cats, you have to train it on a database of photos in which the cats have already been identified.
(If you want AI to name a paint color, well, we haven’t quite figured that one out.)
But there are some situations where an AI can train itself: rules-based systems in which the computer can evaluate its own actions and determine if they were good ones. (Things like poker are good examples.) Now, a Google-owned AI developer has taken this approach to the game Go, in which AIs only recently became capable of consistently beating humans. Impressively, with only three days of playing against itself with no prior knowledge of the game, the new AI was able to trounce both humans and its AI-based predecessors.
In a new paper describing their creation, the people at the company DeepMind contrast their new AI with their earlier Go-playing algorithms. The older algorithms contained two separate neural networks. One of them, trained on games played by human experts, was dedicated to evaluating the most probable move of a human opponent. A second neural network was trained to predict the winner of the game following a given move. These were combined with software that directed them to evaluate possible future moves, creating a human-beating system, although it required multiple computers equipped with application-specific processors developed by Google called tensor processing units.
While the resulting system was good enough to consistently beat top human players, it required expert input during the training. And that creates two limitations. Algorithms like this can only perform tasks where human experts already exist, and they're unlikely to do things that a human would never consider.
So the people at DeepMind decided to make a Go-playing AI that could teach itself how to play. To do so, they used a process called reinforcement learning. The new algorithm, called AlphaGo Zero, would learn by playing against a second instance of itself. Both Zeroes would start off with knowledge of the rules of Go, but they would only be capable of playing random moves. Once a move was played, however, the algorithm tracked whether it was associated with better game outcomes. Over time, that knowledge led to more sophisticated play.
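To make the self-play idea concrete, here is a minimal sketch in Python. It is not DeepMind's method: it swaps Go for a tiny stone-taking game (take one to three stones, whoever takes the last one wins) and swaps the neural network for a plain lookup table. The function names and the `explore` parameter are made up for illustration. What it keeps is the loop described above: both sides share the same knowledge, start out with no preferences, and record whether each move ended up on the winning side.

```python
import random
from collections import defaultdict

def legal_moves(stones):
    """In this toy game (an assumption, not Go) a player takes 1, 2, or 3 stones."""
    return [n for n in (1, 2, 3) if n <= stones]

def choose_move(stats, stones, explore, rng):
    """Pick a random move while exploring, otherwise the best win rate so far."""
    moves = legal_moves(stones)
    if rng.random() < explore:
        return rng.choice(moves)

    def win_rate(move):
        plays, wins = stats[(stones, move)]
        return wins / plays if plays else 0.5  # unseen moves count as a coin flip

    return max(moves, key=win_rate)

def self_play_game(stats, start, explore, rng):
    """Let the shared table play both sides of one game, then record the result."""
    history = {0: [], 1: []}
    stones, player = start, 0
    while stones > 0:
        move = choose_move(stats, stones, explore, rng)
        history[player].append((stones, move))
        stones -= move
        player = 1 - player
    winner = 1 - player  # the player who just took the last stone
    for p, played in history.items():
        for key in played:
            stats[key][0] += 1                 # times this move was played
            stats[key][1] += int(p == winner)  # times it was on the winning side

def train(games=5000, start=10, explore=0.2, seed=0):
    """Run many self-play games and return the move statistics."""
    rng = random.Random(seed)
    stats = defaultdict(lambda: [0, 0])  # (stones, move) -> [plays, wins]
    for _ in range(games):
        self_play_game(stats, start, explore, rng)
    return stats
```

After `stats = train()`, calling `choose_move(stats, 3, 0.0, random.Random())` asks the table for its preferred move with three stones left. With enough games it should favor taking every remaining stone when three or fewer are left, since that move has only ever been followed by a win.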
As the games accumulated, AlphaGo Zero built up a tree of possible moves, along with values associated with the game outcomes in which they were played. It also kept track of how often a given move had been played in the past, so it could quickly identify moves that were consistently associated with success. Since both instances of the neural network were improving at the same time, the procedure ensured that AlphaGo Zero was always playing against an opponent that was challenging at its current skill level.
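The bookkeeping described here, a tree of moves in which each entry carries an outcome value and a play count, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's algorithm: the `Node` class, the `record`, `select`, and `most_played` helpers, and the constant `c` are all hypothetical, and the selection rule is a generic "average value plus a bonus for rarely tried moves" stand-in rather than the formula DeepMind used.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0            # how often this move has been played
    total_value: float = 0.0   # sum of outcomes (+1 win, -1 loss) that followed it
    children: dict = field(default_factory=dict)  # move label -> Node

    def mean_value(self):
        """Average outcome seen after this move (0.0 if never played)."""
        return self.total_value / self.visits if self.visits else 0.0

def record(root, moves, outcome):
    """Add one finished game to the tree.

    `outcome` is +1 or -1 from the first player's point of view, so it is
    flipped for the second player's moves (the odd positions in `moves`).
    """
    root.visits += 1
    node = root
    for i, move in enumerate(moves):
        node = node.children.setdefault(move, Node())
        node.visits += 1
        node.total_value += outcome if i % 2 == 0 else -outcome

def select(node, c=1.4):
    """Pick the child with the best average value plus an exploration bonus.

    The bonus shrinks as a move is played more often (a generic stand-in rule).
    """
    def score(item):
        _, child = item
        bonus = c * math.sqrt(math.log(node.visits + 1) / (child.visits + 1))
        return child.mean_value() + bonus
    return max(node.children.items(), key=score)[0]

def most_played(node):
    """Return the move that has been played most often from this position."""
    return max(node.children, key=lambda m: node.children[m].visits)
```

For example, after recording eight wins that opened with a move labeled `"A"` and two losses that opened with `"B"`, both `most_played(root)` and `select(root)` point to `"A"`. Keeping the visit count alongside the value is what lets the search tell a move that won once by luck from one that wins consistently.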
The DeepMind team ran the AI against itself for three days, during which it completed nearly five million games of Go (at about 0.4 seconds of thinking time per move). When the training was complete, they set it up with a machine that had four tensor processing units and pitted Zero against one of their earlier, human-trained iterations, which was given multiple computers and a total of 48 tensor processing units. AlphaGo Zero romped, beating its opponent 100 games to none.
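A quick back-of-the-envelope check shows how the training figures fit together. Three days is far too short to play five million games one after another at 0.4 seconds per move, so many games must have been in progress at once. The Python below works through the arithmetic. The figure of 200 moves per game is an assumption chosen for illustration (the article gives no game length), and "nearly five million" is rounded up to five million.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def sequential_seconds_per_game(days, games):
    """Time available per game if games were played strictly one at a time."""
    return days * SECONDS_PER_DAY / games

def parallel_games_needed(days, games, seconds_per_move, moves_per_game):
    """Average number of games that must run at once to fit in the time window."""
    total_seconds = games * moves_per_game * seconds_per_move
    return total_seconds / (days * SECONDS_PER_DAY)

days = 3
games = 5_000_000       # "nearly five million", rounded up
seconds_per_move = 0.4  # figure quoted in the article
moves_per_game = 200    # ASSUMPTION: illustrative game length, not from the article

print(sequential_seconds_per_game(days, games))
# about 0.05 seconds per whole game, less than a single 0.4-second move
print(parallel_games_needed(days, games, seconds_per_move, moves_per_game))
# roughly 1,500 games running at once under these assumptions
```

The exact degree of parallelism depends on the assumed game length, so the second number is an order-of-magnitude estimate rather than a reported figure.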
Tests with partially trained versions showed that Zero was able to start beating human-trained AIs in as little as a day. The DeepMind team then continued training for 40 days. By day four, it started consistently beating an earlier, human-trained version that was the first capable of beating human grandmasters. By day 25, Zero started consistently beating the most sophisticated human-trained AI. And at day 40, it beat that AI in 89 games out of 100. Obviously, any human player facing it was stomped.
So what did AlphaGo Zero’s play look like? For the openings of the games, it often started with moves that had already been identified by human masters. But in some cases, it developed distinctive variations on these. The end game is largely constrained by the board, and so the moves also resembled what a human might do. But in the middle, the AI’s moves didn’t seem to follow anything a human would recognize as a strategy; instead, it would consistently find ways to edge ahead of any opponent, even if it lost ground on some moves.
This doesn’t mean that DeepMind has crafted an AI that can do anything. To train itself, AlphaGo Zero had to be limited to a problem in which clear rules limited its actions and clear rules determined the outcome of a game. Not every problem is so neatly defined (and fortunately, the outcomes of an AI uprising probably fall into the “poorly defined” category). And human players are treating this as a reason for excitement. In an accompanying perspective, two members of the American Go Association suggest that studying the games played among the AIs will give them a new chance to understand their own game.
Nature, 2017. DOI: 10.1038/nature24270.