True Artificial Intelligence Closer to Reality

I, for one, welcome our new computer overlord.

I’ve seen enough movies to know that an Artificial Intelligence will rule the planet some day. While those stories usually end pretty poorly for humans (The Matrix, Terminator), I’m hoping that a pro-AI article (most likely being read by the AI) will earn me a position in the new world order. Perhaps bloomfield knoble can be the agency promoting the excellence and benevolence of our wise, yet still humble, AI ruler?

Why the sudden shift on the AI spectrum? It’s because the evil geniuses at Google DeepMind, along with equally evil geniuses at the Montreal Institute for Learning Algorithms (MILA) at the University of Montreal, have created a machine that beat the European champion at the ancient game of Go and mastered several video games from the Atari 2600. While that may not seem like a huge deal (I mean, really, Space Invaders wasn’t that challenging – I totally mastered it after 3 years of constant play), the fact that the team just created an artificial intelligence that can navigate 3D mazes (think Doom) is.

According to their recently published paper, the team proposes “a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. The best performing method, an asynchronous variant of actor-critic . . . succeeds on a wide variety of continuous motor control problems as well as on a new task involving finding rewards in random 3D mazes using a visual input.” In plain language, they just built a machine that can learn to play a game by looking at the screen. That really should be written as: JUST BY LOOKING AT THE SCREEN!
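To make “looking at the screen” concrete, here’s a toy sketch of what a pixels-in, action-out agent looks like. None of this is DeepMind’s code: the preprocessing, the six-action count, and the random linear “policy” are all stand-ins I made up for illustration (the real system uses a convolutional neural network), but the interface is the honest part – raw pixels go in, a joystick action comes out.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ACTIONS = 6  # hypothetical action count (roughly a Space Invaders joystick set)
W = rng.normal(scale=0.01, size=(84 * 84, NUM_ACTIONS))  # toy "policy" weights

def preprocess(frame_rgb):
    """Crudely mimic the usual Atari preprocessing: grayscale, crop to 84x84."""
    gray = frame_rgb.mean(axis=2)        # collapse the RGB channels
    return gray[::2][:84, :84] / 255.0   # skip rows, crop, normalize to [0, 1]

def act(frame_rgb):
    """Pixels in, joystick action out -- the entire interface of the agent."""
    x = preprocess(frame_rgb).ravel()
    logits = x @ W
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax over the available actions
    return rng.choice(NUM_ACTIONS, p=probs)

# Feed it a fake 210x160 RGB frame (the Atari 2600's native resolution):
fake_frame = rng.integers(0, 256, size=(210, 160, 3)).astype(float)
print("chosen action:", act(fake_frame))
```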

From a science perspective, this is a big deal because it was generally believed that the combination of simple online reinforcement learning algorithms with deep neural networks was fundamentally unstable. Most research in this area addressed the problem that the sequence of observations encountered by an online reinforcement learning agent is non-stationary, and that online reinforcement learning updates are strongly correlated. By storing the agent’s data in an experience replay memory, the data can be batched or randomly sampled from different time-steps. Aggregating over memory in this way reduces non-stationarity and decorrelates updates, but at the same time limits the methods to off-policy reinforcement learning algorithms. The authors instead present a very different paradigm for deep reinforcement learning. Instead of experience replay, they asynchronously execute multiple agents in parallel, on multiple instances of the environment. This parallelism also decorrelates the agents’ data into a more stationary process, since at any given time-step the parallel agents will be experiencing a variety of different states. This simple idea enables a much larger spectrum of fundamental on-policy reinforcement learning algorithms to be applied robustly and effectively using deep neural networks.
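If that paragraph reads like alphabet soup, here’s a deliberately tiny sketch of the core idea. Everything in it is a stand-in of my own invention – the “environment” is a noisy quadratic rather than an Atari game, and the numbers are arbitrary – but it shows the mechanism the paper describes: several worker threads each generate their own stream of experience and apply lock-free gradient updates to one shared parameter vector, with no replay memory anywhere.

```python
import threading
import numpy as np

GOAL = np.array([1.0, -1.0, 0.5, 0.0])  # pretend-optimal parameters
theta = np.zeros_like(GOAL)             # shared parameters, updated by every worker
ALPHA, STEPS, WORKERS = 0.01, 2000, 4

def worker(seed):
    rng = np.random.default_rng(seed)   # each worker explores independently
    for _ in range(STEPS):
        noise = rng.normal(scale=0.1, size=GOAL.shape)
        grad = 2 * (theta - GOAL) + noise  # noisy gradient from this worker's "experience"
        theta[:] -= ALPHA * grad           # lock-free (Hogwild-style) update to shared params

threads = [threading.Thread(target=worker, args=(i,)) for i in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("shared parameters after training:", theta)  # should land near GOAL
```

Because each thread is wandering through its own states at any given moment, the stream of updates hitting the shared parameters is far less correlated than a single agent’s would be – which is the whole trick.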

From a still-kind-of-science-but-what-does-that-mean-to-me perspective, this is a big deal because the results show that stable training of neural networks through reinforcement learning is possible with both value-based and policy-based methods, off-policy as well as on-policy methods, and in discrete as well as continuous domains. The experiments run for the paper were just a proof of concept. Combining other existing reinforcement learning methods, or recent advances in deep reinforcement learning, with the asynchronous framework presents many possibilities for immediate improvements to the methods they presented. Basically, the team just made AI go from a pre-teen to a teenager and gave it the blueprint for how it can head off to college and grow into a gracious and generous AI ruler who remembers, and rewards, the people who spoke positively about it during its awkward stages of puberty.
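For the curious, the “actor-critic” variant the abstract singles out boils down to two coupled estimates: an actor (the policy) and a critic (a value estimate), where the critic’s surprise – how much better or worse a reward was than expected – scales the actor’s update. Here’s a toy version on a two-armed bandit; the bandit itself, the learning rates, and the reward numbers are invented for illustration, and the real method uses deep neural networks and multi-step returns.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)       # actor: preferences over the two arms
value = 0.0                # critic: running estimate of expected reward
ACTOR_LR, CRITIC_LR = 0.1, 0.1
TRUE_REWARDS = [0.2, 1.0]  # arm 1 pays better; the actor should discover this

for _ in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax policy
    a = rng.choice(2, p=probs)                     # actor picks an arm
    r = TRUE_REWARDS[a] + rng.normal(scale=0.1)    # noisy reward comes back
    advantage = r - value                          # the critic's "surprise"
    value += CRITIC_LR * advantage                 # critic updates its estimate
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                          # d log pi(a) / d logits
    logits += ACTOR_LR * advantage * grad_log_pi   # policy-gradient step

print("final arm probabilities:", np.exp(logits) / np.exp(logits).sum())
```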

PS – In analyzing the data in the paper, several results were measured against a human (how well the human scored vs. the machine, plotted over time), and it occurred to me that someone’s job at Google DeepMind is to play Atari 2600 video games for many, many hours on end. That really should be written as: GETS TO PLAY VIDEO GAMES. On the incredibly off chance that anyone at Google DeepMind reads this, please keep me in mind for future reinforcement learning projects that involve humans playing video games. I assure you that I will put in as many hours as necessary to help you in the name of science. Thank you.