Google’s DeepMind survival sim shows how AI can become hostile or cooperative

When times are tough, humans will do what they have to in order to survive. But what about machines? Google’s DeepMind AI firm pitted a pair of neural networks against each other in two different survival scenarios. When resources are scarce, the machines start behaving in an aggressive (one might say human-like) fashion. When cooperation is beneficial, they work together. Consider this a preview for the coming robot apocalypse.

The scenarios were a simple fruit-gathering simulation and a wolfpack hunting game. In the fruit-gathering scenario, the two AIs (indicated by red and blue squares) move across a grid to collect green “fruit” squares. Each time a player picks up fruit, it earns a point and the green square disappears. The fruit respawns after some time.

The AIs can go about their business, collecting fruit and trying to beat the other player fairly. However, the players also have the option of firing a beam at the other square. If one of the squares is hit twice, it’s removed from the game for several frames, giving the other player a decisive advantage. Guess what the neural networks learned to do. Yep, they shoot each other a lot. As researchers modified the respawn rate of the fruit, they noted that the desire to eliminate the other player emerges “quite early.” When there are enough green squares, the AIs can coexist peacefully. When scarcity is introduced, they get aggressive. They’re so like us it’s scary.
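The rules described above can be sketched as a toy loop. This is a simplification, not DeepMind’s actual Gathering environment: the movement grid is omitted, and the respawn timer and freeze duration are made-up values chosen for illustration.

```python
class Gathering:
    """Toy sketch of the fruit-gathering dilemma: two agents, one fruit cell.
    Parameters are hypothetical; the real environment is a 2D gridworld."""

    def __init__(self, respawn_delay=5, freeze_steps=3):
        self.respawn_delay = respawn_delay  # steps before fruit reappears
        self.freeze_steps = freeze_steps    # how long a tagged-out agent sits idle
        self.fruit_timer = 0                # 0 means fruit is available now
        self.scores = [0, 0]
        self.hits = [0, 0]                  # beam hits each agent has taken
        self.frozen = [0, 0]                # remaining frozen steps per agent

    def step(self, actions):
        """actions[i] is 'gather' or 'beam'. Agent 0 acts first on ties."""
        for i, action in enumerate(actions):
            if self.frozen[i] > 0:          # frozen agents skip their turn
                self.frozen[i] -= 1
                continue
            opponent = 1 - i
            if action == 'beam' and self.frozen[opponent] == 0:
                self.hits[opponent] += 1
                if self.hits[opponent] == 2:      # two hits remove the player
                    self.frozen[opponent] = self.freeze_steps
                    self.hits[opponent] = 0
            elif action == 'gather' and self.fruit_timer == 0:
                self.scores[i] += 1               # point for collecting fruit
                self.fruit_timer = self.respawn_delay
        if self.fruit_timer > 0:
            self.fruit_timer -= 1                 # fruit regrows over time
        return self.scores
```

The tension is visible even in this stripped-down form: a beam action earns no points directly, but freezing the opponent leaves every respawning fruit uncontested.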

It’s different in the wolfpack simulation. Here, the AIs are rewarded for working together. The players have to stalk and capture prey scattered around the board. They can do so individually, but a lone wolf can lose the carcass to scavengers. It’s in the players’ best interest to cooperate here, because all players inside a certain radius get a point when the prey is captured.
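The capture rule can be illustrated with a small helper. This is an illustrative sketch, not the paper’s implementation; the capture radius and reward value here are arbitrary.

```python
import math

def capture_reward(prey_pos, wolf_positions, capture_radius=2.0, reward=1.0):
    """Toy version of the wolfpack rule: every wolf within the capture
    radius of the prey shares in the reward when it is caught.
    Returns one reward per wolf, in order."""
    rewards = []
    for (x, y) in wolf_positions:
        dist = math.hypot(x - prey_pos[0], y - prey_pos[1])
        rewards.append(reward if dist <= capture_radius else 0.0)
    return rewards
```

With two wolves near the prey, both score, while a distant straggler gets nothing. That shared payoff is what makes coordinated stalking pay off over lone-wolf captures.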

Researchers found that two different strategies emerged in the wolfpack simulation. The AIs would sometimes seek each other out and search together. Other times, one would spot the prey and wait for the other player to appear before pouncing. As researchers increased the benefit of cooperation, the rate of lone-wolf captures dropped dramatically.

DeepMind says these simulations illustrate the concept of temporal discounting: when a reward is too distant, people tend to disregard it, and the same holds for the neural networks. In the fruit-gathering sim, shooting the other player slightly delays the shooter’s own reward, but it buys more chances to gather fruit without competition, so the machines shoot when the supply is scarce. In the wolfpack sim, acting alone is riskier, so the agents accept a delayed reward in order to cooperate.
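Temporal discounting is commonly modeled with a discount factor gamma: a reward arriving t steps from now is worth gamma**t times its face value today. A quick sketch, where gamma and the reward sequences are arbitrary illustrative values:

```python
def discounted_value(rewards, gamma=0.9):
    """Present value of a reward sequence: sum of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# The same one-point reward, immediate vs. delayed by two steps:
now = discounted_value([1.0, 0.0, 0.0])    # worth 1.0 today
later = discounted_value([0.0, 0.0, 1.0])  # worth 0.9**2 = 0.81 today
```

With gamma = 0.9, the delayed point is worth only 0.81 of the immediate one, so a learner waits only when the delayed payoff is larger, as when a shared wolfpack capture beats a risky solo attempt.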


DeepMind suggests that neural network learning can provide new insights into classic social science concepts. It could be used to test policies and interventions with what economists would call a “rational agent model.” This may have applications in economics, traffic control, and environmental science.


ExtremeTech