I saw this incredible demonstration recently and was seriously impressed.
It’s a table tennis robot powered by AI from none other than Google DeepMind, the geniuses behind that god-level chess-playing program, AlphaZero.
They’ve conquered the mental realm of chess and Go (unfortunately), so now they’re trying to conquer the physical realm of sports.
(And by the way, they’ve been working on AlphaCode, to destroy all programming jobs — should we be worried?)
And they’re already well on their way: the robot beat every single beginner-level player it faced, and 55% of the intermediate-level players it played against.
For a sport like table tennis, the AI doesn’t just need sophisticated algorithms for intelligent decision-making; it also needs physical components for quick reactions and precise movements to carry out those decisions in the real world.
So here’s the biggest problem, and the reason an expert system or classical algorithm wouldn’t stand a chance:
How can we track this tiny, rapidly moving ball, predict its trajectory, and respond quickly and accurately according to the rules of the game?
Well, like every problem in computer science and programming, it all comes back to input, processing, and output.
Input
We only need visual input here.
And of course, you know the standard way computers receive visual input.
So the robot has multiple high-speed cameras to constantly capture images at an impressive rate of 125 images per second.
All these images are rapidly fed into a neural network that tracks the ball’s position in real time.
With this position, it can calculate key variables like speed and trajectory.
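To make the step from tracked positions to speed and trajectory a bit more concrete, here’s a minimal Python sketch. It assumes the tracker outputs 3D ball positions in metres once per frame at the 125 images per second mentioned above, estimates velocity with a simple finite difference between two frames, and predicts future positions with plain projectile motion. The coordinate frame, the function names and the physics simplifications (no air drag, no spin, no bounces) are my own assumptions for illustration, not how DeepMind’s system actually does it.

```python
import math
from dataclasses import dataclass

FPS = 125          # camera frame rate from the article
DT = 1.0 / FPS     # time between frames: 0.008 s (8 ms)
G = 9.81           # gravitational acceleration, m/s^2


@dataclass
class Vec3:
    x: float  # metres along the table (assumed frame)
    y: float  # metres across the table
    z: float  # metres of height


def estimate_velocity(prev: Vec3, curr: Vec3, dt: float = DT) -> Vec3:
    """Finite-difference velocity from two consecutive tracked positions."""
    return Vec3((curr.x - prev.x) / dt,
                (curr.y - prev.y) / dt,
                (curr.z - prev.z) / dt)


def speed(v: Vec3) -> float:
    """Magnitude of the velocity vector, in m/s."""
    return math.sqrt(v.x ** 2 + v.y ** 2 + v.z ** 2)


def predict_position(p: Vec3, v: Vec3, t: float) -> Vec3:
    """Projectile motion: constant horizontal velocity, gravity pulling z down.

    Ignores air drag, spin (the Magnus effect) and table bounces.
    """
    return Vec3(p.x + v.x * t,
                p.y + v.y * t,
                p.z + v.z * t - 0.5 * G * t * t)


# Two consecutive frames: the ball moved 4 cm along x in 8 ms.
p0 = Vec3(0.00, 0.0, 0.30)
p1 = Vec3(0.04, 0.0, 0.30)
v = estimate_velocity(p0, p1)
print(f"speed: {speed(v):.1f} m/s")                        # speed: 5.0 m/s
future = predict_position(p1, v, 0.1)
print(f"in 0.1 s: x={future.x:.2f} m, z={future.z:.3f} m")  # in 0.1 s: x=0.54 m, z=0.251 m
```

A real tracker would smooth over many frames rather than trusting just two, since a single noisy detection can throw a two-point estimate off badly.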
Processing
For processing, the robot has two levels of control.
First, there are the low-level controllers, a bunch of specialized neural networks trained to execute specific table tennis skills: backhand drives, forehand topspin… basically anything you could normally do with the ball as a human.
Then we have the high-level controller for more abstract decision-making. It processes the inputs to decide which atomic skill to perform.
I think it’s just like how our brains have regions for higher-level processing, like the prefrontal cortex, and other regions, like the motor cortex, for lower-level planning and execution of movement.
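Here’s a rough Python sketch of that two-level structure. In the real system both levels are learned neural networks; here, as a simplifying assumption, the high-level controller is a hand-written rule and the low-level “skills” are placeholder functions returning fixed joint targets. The skill names, the `BallState` fields and the returned values are all hypothetical; the point is only to show how a high-level choice dispatches to one of several specialized low-level controllers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BallState:
    lateral: float  # metres left (<0) or right (>=0) of the robot's centre (assumed frame)
    height: float   # metres above the table


# A low-level skill maps the ball state to an arm command. Here the command is
# a list of six target joint angles (radians), one per joint of a six-axis arm.
Skill = Callable[[BallState], List[float]]


def forehand_topspin(ball: BallState) -> List[float]:
    # Hypothetical placeholder; a real skill would be a trained policy network.
    return [0.3, -0.5, 0.8, 0.0, 0.2, 0.0]


def backhand_drive(ball: BallState) -> List[float]:
    # Hypothetical placeholder, mirrored for the other side of the body.
    return [-0.3, -0.5, 0.8, 0.0, -0.2, 0.0]


SKILLS: Dict[str, Skill] = {
    "forehand_topspin": forehand_topspin,
    "backhand_drive": backhand_drive,
}


def high_level_choose(ball: BallState) -> str:
    """Stand-in for the learned high-level controller: pick which skill to run."""
    return "forehand_topspin" if ball.lateral >= 0 else "backhand_drive"


def control_step(ball: BallState) -> List[float]:
    """One decision cycle: choose a skill, then let that skill produce the command."""
    skill_name = high_level_choose(ball)
    return SKILLS[skill_name](ball)


print(high_level_choose(BallState(lateral=0.4, height=0.2)))   # forehand_topspin
print(high_level_choose(BallState(lateral=-0.4, height=0.2)))  # backhand_drive
```

Keeping every skill behind the same interface means the dispatch in `control_step` stays the same however many skills are added; only the selection logic has to learn when to use each one.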
Output
All that processing would be useless if it couldn’t do anything in the real world: it needs to move.
That’s why the robot has a powerful ABB IRB 1100 robotic arm, letting it reach almost any part of the table and strike the ball quickly.
In a way, you could say the low-level controllers are the output of the high-level controller’s processing, but they also do their own processing.
It can be better
It beat all of the beginners and 55% of the intermediate players.
But how many advanced players did it beat?
Zero.
It was just too slow for those masters.
One reason is latency: it takes time for the sensors to read the input, and more time for the actuators to carry out the output in the real world.
It also seems to struggle with balls that are very low or very high, or that carry a lot of spin.
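To see why latency matters more against stronger players, here’s a back-of-the-envelope Python calculation of how much time is left to actually swing once the pipeline’s delays are subtracted. Apart from the 8 ms frame interval implied by 125 images per second and the 2.74 m length of a regulation table, every number here (the ball speeds, the inference and actuation delays, and the assumption that the ball travels the full table length) is a made-up illustrative value, not a measurement from DeepMind’s robot.

```python
from typing import Dict

TABLE_LENGTH_M = 2.74  # length of a regulation table tennis table


def swing_time_left(ball_speed_mps: float, distance_m: float,
                    latencies_s: Dict[str, float]) -> float:
    """Time remaining for the actual stroke after all pipeline delays (seconds).

    A negative result means the ball arrives before the robot can even react.
    """
    time_to_arrive = distance_m / ball_speed_mps
    return time_to_arrive - sum(latencies_s.values())


# Hypothetical delay budget for the sense -> decide -> act pipeline.
latencies = {
    "camera_frame": 1 / 125,  # one frame interval at 125 fps (8 ms)
    "inference": 0.020,       # assumed neural network processing time
    "actuation": 0.100,       # assumed time for the arm to start moving
}

for ball_speed in (5.0, 10.0, 20.0):  # hypothetical slow, medium and fast shots, m/s
    left = swing_time_left(ball_speed, TABLE_LENGTH_M, latencies)
    print(f"{ball_speed:>4} m/s -> {left * 1000:.0f} ms left to swing")
```

With these assumed numbers, a 5 m/s ball leaves about 420 ms to swing, a 10 m/s ball about 146 ms, and a 20 m/s ball only about 9 ms. The fixed delays eat a growing share of the budget as shots get faster, which is one plausible reason advanced players could outpace the robot.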
It’s still early days, though, and overall it’s a great system showing off the serious progress being made in AI and robotics.