DeepMind’s AlphaStar Beats Humans 10-0 (or 1)

Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. I think this is one of the more important
things that happened in AI research lately. In the last few years, we have seen DeepMind
defeat the best Go players in the world, and after OpenAI’s venture in the game of DOTA2,
it’s time for DeepMind to shine again as they take on StarCraft 2, a real-time strategy game. The depth and the amount of skill required to play this game is simply astounding. The search space of StarCraft 2 is so vast that it exceeds both chess and even Go by a significant margin. It is also a game that requires a great deal of mechanical skill and split-second decision-making, and it involves imperfect information, as we only see what our units can see. A nightmare situation for any AI. DeepMind invited a beloved pro player, TLO, to play a few games against their new StarCraft 2 AI that goes by the name AlphaStar. Note that TLO is a professional player who is easily in the top 1% of players, or even better. Mid-grandmaster, for those who play StarCraft 2.

This video is about what happened during this
event, and later, I will make another video that describes the algorithm that was used
to create this AI. The paper is still under review, so it will
take a little time until I can get my hands on it. At the end of this video, you will also see
the inner workings of this AI. Let’s dive in. This is an AI that looked at a few games played by human players, and after that initial step, it learned by playing against itself for about 200 years of in-game time. In our next episode, you will see how this is even possible, so I hope you are subscribed to the series. You see here that the AI controls the blue units, and TLO, the human player, plays red.
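Conceptually, that two-stage recipe (first imitate human replays, then improve by playing against yourself) can be sketched in a tiny, hypothetical form. This is not DeepMind’s actual method: the rock-paper-scissors payoff, the update rule, and all the names below are illustrative stand-ins.

```python
import random

# Toy stand-in for strategy match-ups: rock-paper-scissors payoffs.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def imitation_step(human_games):
    """Stage 1: initialize the policy from action frequencies in human replays."""
    counts = {a: 0 for a in BEATS}
    for game in human_games:
        for action in game:
            counts[action] += 1
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def sample(policy, rng):
    """Draw one action according to the current policy probabilities."""
    return rng.choices(list(policy), weights=list(policy.values()))[0]

def self_play(policy, n_games, rng, lr=0.05):
    """Stage 2: play against a frozen copy of itself, reinforcing winning actions."""
    for _ in range(n_games):
        opponent = dict(policy)              # frozen snapshot of ourselves
        mine, theirs = sample(policy, rng), sample(opponent, rng)
        if BEATS[mine] == theirs:            # we won: shift probability mass
            for a in policy:
                policy[a] = (1 - lr) * policy[a] + (lr if a == mine else 0.0)
    return policy

rng = random.Random(0)
human_games = [["rock", "rock", "paper"], ["scissors", "paper"]]
policy = imitation_step(human_games)   # frequencies observed in the "replays"
policy = self_play(policy, 2000, rng)  # still a valid probability distribution
```

The real system, of course, works on a neural network policy over StarCraft 2 actions rather than a three-entry table, but the shape of the pipeline is the same: bootstrap from humans, then let self-play supply the 200 years of practice.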

Right at the start of the first game, the
AI did something interesting. In fact, what is interesting is what it didn’t
do. It started to create new buildings next to its nexus instead of building a wall-off like the one you can see here. Using a wall-off is considered standard practice in most games, and the AI used these buildings not to wall off the entrance, but to shield the workers from possible attacks. Now note that this is not unheard of, but it is also not a strategy that is widely played today, and it is considered non-standard.

It also built more worker units than what is universally accepted as standard; we found out later that this was partly done in anticipation of losing a few of them early on. Very cool. Then, almost before we even knew what happened, it won the first game a little more than 7 minutes in, which is very quick, noting that in-game time runs a little faster than real time. TLO’s thought process at this point was: that’s interesting, but okay, the AI plays aggressively and managed to pull this one off.

No big deal. We will fire up the second game; in the meantime, a few interesting details. The algorithm was set up so that the number of actions performed by the AI roughly matches that of a human player, while hopefully still playing as well, or better. It has to make meaningful strategic decisions. You see here that this checks out for the average actions per minute, but around the tail end of this distribution, there are times when it performs more actions than humans, and this may enable playstyles that are not accessible to human players. However, note that many times it also does miraculous things with very few actions. Now, what about another important detail, reaction time? The reaction time of the AI is set to 350ms, which is quite slow. That’s excellent news, because this is usually a common angle of criticism for game AIs.
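To get a feel for what such constraints mean in practice, here is a minimal, hypothetical sketch of an action limiter that delays each decision by a fixed reaction time and caps how many actions are issued in a rolling 60-second window. The class name, the defaults, and the overall design are my own illustrative assumptions, not AlphaStar’s actual mechanism.

```python
import collections

class ActionLimiter:
    """Sketch: enforce a fixed reaction delay and a rolling actions-per-minute cap."""

    def __init__(self, reaction_ms=350, max_apm=300):
        self.reaction_ms = reaction_ms
        self.max_apm = max_apm
        self.pending = collections.deque()   # (ready_time_ms, action)
        self.issued = collections.deque()    # timestamps of issued actions

    def decide(self, now_ms, action):
        # The agent 'sees' the state now, but the action only becomes
        # available after the simulated reaction time has elapsed.
        self.pending.append((now_ms + self.reaction_ms, action))

    def step(self, now_ms):
        # Drop issue timestamps that fell out of the 60-second window.
        while self.issued and now_ms - self.issued[0] >= 60_000:
            self.issued.popleft()
        out = []
        # Issue every action whose delay has elapsed, up to the APM cap.
        while (self.pending and self.pending[0][0] <= now_ms
               and len(self.issued) < self.max_apm):
            _, action = self.pending.popleft()
            self.issued.append(now_ms)
            out.append(action)
        return out

lim = ActionLimiter(reaction_ms=350, max_apm=300)
lim.decide(0, "attack")
lim.step(100)   # [] : still within the 350 ms reaction delay
lim.step(350)   # ["attack"]
```

The point of the sketch is simply that both constraints are easy to enforce mechanically, so the interesting question becomes whether the agent still plays well under them.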

The AI also sees the whole map at once, but it is not given more information than what its units can see. This is perhaps the most commonly misunderstood detail, so it is worth noting. In other words, it sees exactly what a human would see if the human moved the camera around very quickly, but it doesn’t have to move the camera, which adds additional actions and cognitive load for the human, so one might say that the AI has an edge here. The AI plays these games independently; what’s more, each game was played by a different AI, which also means that they do not memorize what happened in the last game like a human would.

Early in the next game, we can see the utility of the wall-off in action, which is able to completely prevent the AI’s early attack. Later that game, the AI used disruptors, a unit which, if controlled with this level of expertise, can decimate the opponent’s army with area damage by killing multiple units at once. It has done an outstanding job picking away at TLO’s army. Then, after gaining a significant advantage, AlphaStar loses it with a few sloppy plays and by deciding to engage aggressively while standing in tight choke points. You can see that this is not such a great idea. This was quite surprising, as this is considered to be StarCraft 101 knowledge right there. During the remainder of the match, the commentators mentioned that they play and watch matches all the time, and the AI came up with an army composition that they had never seen during a professional match. And the AI won this one too. After this game, it became clear that these agents can play any style in the game.

Which is terrifying. Here you can see an alternative visualization
that shows a little more of the inner workings of the neural network. We can see what information it gets from the
game, a visualization of neurons that get activated within the network, what locations
and units are considered for the next actions, and whether the AI predicts itself as the
winner or loser of the game. If you look carefully, you will also see the
moment when the agent becomes certain that it will win this game. I could look at this all day long, and if
you feel the same way, make sure to visit the video description, I have a link to the
source video for you. The final result against TLO was 5 to 0, so that’s something, and he mentioned that AlphaStar played very much like a human does and almost always managed to outmaneuver him. However, TLO also mentioned that he is confident that, upon playing more training matches against these agents, he would be able to defeat the AI.

I hope he will be given a chance to do that. This AI seems strong, but still beatable. I would also note that many of you would probably
expect the later versions of AlphaStar to be way better than this one. The good news is that the story continues
and we’ll see whether that’s true! So at this point, the DeepMind scientists said, “maybe we could try to be a bit more ambitious”, and asked, “can you bring us someone better?” And in the meantime, they pressed that training button on the AI again.

In comes MaNa, a top-tier pro player, one of the best Protoss players in the world. This was a nerve-wracking moment for the DeepMind scientists as well, because their agents had only played against each other, so they knew the AI’s winrate against a different AI, but they didn’t know how it would compete against a top pro player. It may still have holes in its strategy. Who knows what would happen? Understandably, they had very little confidence in winning this one. What they didn’t expect is that this new AI was not slightly improved, or somewhat improved. No, no, no. This new AI was next level. This set of improved agents, among many other skills, had incredibly crisp micromanagement of each individual unit. In the first game, we saw it pulling back injured units while still letting them attack from afar masterfully, leading to an early win for the AI against MaNa.

He and the commentators were equally shocked by how well the agent played. And I will add that I remember watching many games from a now-inactive player by the name of MarineKing a few years ago. I vividly remember that he played some of his games so well, the commentators said that there’s no better way to put it: he played like a god. I am almost afraid to say that this micromanagement was even more crisp than that. This AI plays phenomenal games. In later matches, the AI did things that seemed like blunders, like attacking on ramps and standing in choke points, or using unfavorable unit compositions and refusing to change them, and, get this, it still won all of those games 5 to 0.

Against a top pro player. Let that sink in. The competition was closed by a match where the AI was asked to also do the camera management. The agent was still very competent, but somewhat weaker, and as a result, lost this game, hence the “or 1” part in the title. My impression is that it was asked to do something that it was not designed for, and I expect a future version to be able to handle this use case as well. I will also commend MaNa for his solid game plan for this game, and also, huge respect to DeepMind for their sportsmanship. Interestingly, in this match, MaNa also started using the worker oversaturation strategy that I mentioned earlier. He learned this from AlphaStar and used it in his winning game. Isn’t that amazing? DeepMind also held a Reddit AMA where anyone could ask them questions to clear up any confusion; for instance, the actions-per-minute part has been addressed there, and I’ve included a link to that for you in the description. To go from a turn-based, perfect-information game like Go to a real-time strategy game of imperfect information in about a year sounds like science fiction to me.

And yet, here it is. Also, note that DeepMind’s goal is not to create a godlike StarCraft 2 AI. They want to solve intelligence, not StarCraft 2, and they used the game as a vehicle to demonstrate the AI’s long-term decision-making capabilities against human players. One more important thing to emphasize is that
the building blocks of AlphaStar are meant to be reasonably general AI algorithms, which
means that parts of this AI can be reused for other things, for instance, Demis Hassabis
mentioned weather prediction and climate modeling as examples. If you take only one thought from this video,
let it be this one. I urge you to watch all the matches because
what you are witnessing may very well be history in the making. I put a link to the whole event in the video description, plus plenty more materials, including other people’s analysis, MaNa’s personal experience of the event, his breakdown of his games, and what was going through his head during the event. I highly recommend checking out his 5th game,
but really, go through them all, it’s a ton of fun! I made sure to include a more skeptical analysis
of the game as well to give you a balanced portfolio of insights.

Also, huge respect to DeepMind and the players, who have practiced their chops for many, many years and played really well under immense pressure. Thank you all for this delightful event. It really made my day. And the ultimate question is, how long did
it take to train these agents? 2 weeks. Wow. And what’s more, after the training step,
the AI can be deployed on an inexpensive consumer desktop machine. And this is only the first version. This is just a taste, and it would be hard
to overstate how big of a milestone this is.

And now, scientists at DeepMind have sufficient
data to calculate the amount of resources they need to spend to train the next, even
more improved agents. I am confident that they will also take into
consideration the feedback from the StarCraft community when creating this next version. What a time to be alive! What do you think about all this? Any predictions? Is this harder than DOTA2? Let me know in the comments section below. And remember, we humans build up new strategies
by learning from each other, and of course, the AI, as you have seen here, doesn’t care
about any of that. It doesn’t need intuition and can come up
with unusual strategies. The difference now is that these strategies
work against some of the best human players. Now it’s time for us to finally start learning
from an AI. gg. Thanks for watching and for your generous
support, and I'll see you next time!
