A Breakthrough in How Robots Learn


“Yeah,” Sanketi said. Behind him, there was another Ping-Pong table with a similar setup, except that there was a robot on each side. I could see where this was going.

DeepMind, which was founded as a London-based A.I. research laboratory in 2010, is best known for a model called AlphaGo, which beat the world champion in the ancient board game Go. AlphaGo was originally fed a database of matches so that it could imitate human experts. Later, a newer version trained solely via “self-play,” sparring with a copy of itself. The model became an astonishingly efficient learner—the crowning example of a technique known as “reinforcement learning,” in which an A.I. teaches itself not by imitating humans but by trial and error. Whenever the model chanced onto a good move, the decisions that led it there were reinforced, and it got better. After just thirty hours of this training, it had become one of the best players on the planet.
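The feedback loop is easier to see in miniature. The sketch below is a toy illustration of self-play reinforcement learning on a simple stone-taking game; it bears no resemblance to AlphaGo's neural networks or tree search, and every name in it is invented for the example, but the core move is the same: a policy plays against a copy of itself, and the choices that led to a win are nudged toward higher probability.

```python
import random
from collections import defaultdict

# Toy self-play reinforcement learning on a game of Nim:
# ten stones, take one to three per turn, whoever takes the last stone wins.
# The "policy" is a table of action preferences shared by both players; moves
# made by the eventual winner are reinforced, moves made by the loser discouraged.

ACTIONS = [1, 2, 3]
prefs = defaultdict(lambda: {a: 1.0 for a in ACTIONS})  # state -> action weights

def choose(stones):
    """Sample an action in proportion to its learned preference."""
    legal = [a for a in ACTIONS if a <= stones]
    weights = [prefs[stones][a] for a in legal]
    return random.choices(legal, weights=weights)[0]

def play_one_game():
    """Both players share the same policy; return each side's moves and the winner."""
    stones, player = 10, 0
    moves = {0: [], 1: []}
    while True:
        action = choose(stones)
        moves[player].append((stones, action))
        stones -= action
        if stones == 0:
            return moves, player          # this player took the last stone and wins
        player = 1 - player

def train(episodes=20000, lr=0.1):
    for _ in range(episodes):
        moves, winner = play_one_game()
        for player, history in moves.items():
            sign = 1.0 if player == winner else -1.0
            for state, action in history:
                prefs[state][action] = max(0.01, prefs[state][action] + sign * lr)

if __name__ == "__main__":
    train()
    # After training, the policy tends to prefer moves that leave the opponent
    # a multiple of four stones, which is the optimal strategy in this game.
    for stones in range(1, 11):
        best = max(ACTIONS[:min(3, stones)], key=lambda a: prefs[stones][a])
        print(f"{stones} stones left -> take {best}")
```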

Collecting data in the physical world, however, is much harder than doing so inside a computer. Google DeepMind’s best Go model can play a virtual game in seconds, but physics limits how fast a ball can ping and pong. The company’s Ping-Pong robots take up an entire room, and there are only three; the researchers had to invent a Rube Goldberg contraption using fans, funnels, and hoppers to feed loose balls back into robot-vs.-robot games. Right now, Sanketi explained, the robots are better at offense than defense, which ends games prematurely. “There’s nothing to keep the rally going,” he said. That’s why the team had to keep training their robots against people.

A Ping-Pong robot that could beat all comers sounded like classic DeepMind: a singularly impressive, whimsical, legible achievement. It would also be useful—imagine a tireless playing partner that adjusts as you improve. But Parada, the robotics lead, told me that the project might actually be winding down. Google, which acquired DeepMind in 2014 and merged it with an in-house A.I. division, Google Brain, in 2023, is not known for daring A.I. products. (They have a reputation for producing stellar and somewhat esoteric research that gets watered down before it reaches the market.) What the Ping-Pong bot has shown, Parada told me, is that a robot can “think” fast enough to compete in sport and, by interacting with humans, can get better and better at a physical skill. Together with the surprising capabilities of the ALOHAs, these findings suggested a path to human levels of dexterity.

Robots that teach themselves, by way of reinforcement learning, were long thought to be a dead end in robotics. A basic problem is what’s called curriculum design: how do you encourage learners to stretch their abilities without utterly failing? In a simulated game of Go, there are a finite number of moves and specific conditions for victory; an algorithm can be rewarded for any move that leads there. But in the physical world there are an uncountable number of moves. When a robot attempts to spin a pen, where there are so many more ways to fail than to succeed, how does it even determine that it’s making progress? The Rubik’s Cube researchers had to manually engineer rewards into their system, as if laying bread crumbs for the robot to follow: by fiat, the robot won points for maneuvers that humans know to be useful, such as twisting a face exactly ninety degrees.
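In robotics, this kind of manual bread-crumbing is usually called reward shaping. The snippet below is a hypothetical sketch of the idea, not OpenAI's actual reward function: on top of a sparse payoff for solving the cube, it pays a small bonus whenever a face lands near a clean quarter turn.

```python
def shaped_reward(face_angle_deg, cube_solved):
    """Hypothetical shaped reward for a cube-twisting robot.

    face_angle_deg: how far the currently gripped face has been rotated.
    cube_solved:    whether the cube is in the solved state.

    Without shaping, the only reward is the sparse +10 for solving, which a
    robot exploring at random may never stumble onto. The bread-crumb term
    pays out as a face approaches a clean ninety-degree turn.
    """
    reward = 0.0
    # Bread crumb: closeness to the nearest multiple of ninety degrees
    # (45 degrees is the farthest you can be from a clean turn).
    off_by = abs(face_angle_deg % 90.0 - 45.0)
    reward += (off_by / 45.0) * 0.1          # up to +0.1 for a crisp quarter turn
    # Sparse reward for the actual goal.
    if cube_solved:
        reward += 10.0
    return reward

# A face twisted to 91 degrees earns nearly the full crumb; 45 degrees earns none.
print(shaped_reward(91.0, cube_solved=False))   # ~0.098
print(shaped_reward(45.0, cube_solved=False))   # 0.0
```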

What’s mysterious about humans is that we intrinsically want to learn new things. We come up with our own rewards. My son wanted to master the use of his hands because he was determined to taste everything in sight. That motivated him to practice other new abilities, like crawling or reaching behind his back. In short, he designed the curriculum himself. By the time he attempts something complicated, he has already developed a vocabulary of basic moves, which helps him avoid many obviously doomed strategies, like twitching wildly—the kind of thing that an untrained robot will do. A robot with no clear curriculum and no clear rewards accomplishes little more than hurting itself.

The robots of our imagination—RoboCop, the Terminator—are much sturdier than humans, but most real robots are delicate. “If you use a robot arm to knock a table or push something, it is likely to break,” Rich Walker, whose company, Shadow Robot, made the hand that OpenAI used in its Rubik’s Cube experiments, told me. “Long-running reinforcement-learning experiments are abusive to robots. Untrained policies are torture.” This turns out to profoundly limit how much they can learn. A breakable robot can’t explore the physical world like a baby can. (Babies are surprisingly tough, and parents usually intervene before they can swallow toys or launch themselves off the bed.)

For the past several years, Shadow Robot has been developing what looks like a medieval gauntlet with three fingers, all of which are opposable, like thumbs. A layer of gel under the “skin” of the fingertips is decorated with tiny dots that are filmed by an embedded camera; the pattern deforms under pressure. This helps the robot’s “brain” sense when a finger touches something, and how firmly. Shadow’s original hand needed to be re-started or serviced every few hours, but this one has been run for hundreds of hours at a time. Walker showed me a video of the fingers surviving blows from a mallet.
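The principle is simple enough to sketch. The Python below is a hypothetical illustration, not Shadow's software: it treats the fingertip camera's view as an array of dot positions and reads contact and pressure off how far the dots have shifted from their resting places.

```python
import numpy as np

def contact_from_dots(rest_positions, current_positions, touch_threshold=0.5):
    """Hypothetical sketch of gel-fingertip sensing: a camera under the gel
    tracks an array of printed dots, and how far they shift from rest is a
    crude proxy for where, and how hard, the finger is pressing.

    rest_positions, current_positions: (N, 2) arrays of dot centers in pixels.
    Returns (is_touching, pressure_estimate, contact_center).
    """
    displacement = current_positions - rest_positions
    magnitudes = np.linalg.norm(displacement, axis=1)
    pressure_estimate = magnitudes.mean()            # crude: mean dot shift
    is_touching = pressure_estimate > touch_threshold
    # Weight dot locations by how much they moved to localize the contact.
    if magnitudes.sum() > 0:
        contact_center = (current_positions * magnitudes[:, None]).sum(0) / magnitudes.sum()
    else:
        contact_center = None
    return is_touching, pressure_estimate, contact_center

# Example: a 3x3 grid of dots, with the center dot pushed six pixels sideways.
rest = np.array([[x, y] for x in (0, 10, 20) for y in (0, 10, 20)], dtype=float)
pressed = rest.copy()
pressed[4] += [6.0, 0.0]
print(contact_from_dots(rest, pressed))
```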

On a recent video call, I saw a few of the new Shadow hands in one of Google DeepMind’s labs in London, hanging inside enclosures like caged squid. The fingers were in constant motion, fast enough that they almost blurred. I watched one of the hands pick up a Lego-like yellow block and attempt to slot it into a matching socket. For a person, the task is trivial, but a single three-fingered robotic hand struggles to reposition the block without dropping it. “It’s a very unstable task by construction,” Francesco Nori, the engineering lead of DeepMind’s robotics division, explained. With just three digits, you frequently need to break contact with the block and reëstablish it again, as if tossing it between your fingers. Subtle changes in how tightly you grip the block affect its stability. To demonstrate, Nori put his phone between his thumb and forefinger, and as he loosened his grip it spun without falling. “You need to squeeze enough on the object, but not too much, because you need to reorient the object in your hand,” he said.

At first, the researchers asked operators to don three-fingered gloves and train their policy with imitation learning, ALOHA style. But the operators got tired after thirty minutes, and there was something un-ergonomic about operating a hand that was only sort of like your own. Different operators solved the task in different ways; the policy they trained had only a two-per-cent success rate. The range of possible moves was too large. The robot didn’t know what to imitate.

The team turned instead to reinforcement learning. They taught the robot to mine successful simulations in a clever way—by slicing each simulated demonstration into a series of sub-tasks. The robot then practiced the sub-tasks, moving from those that were easier to those that were harder. In effect, the robot followed its own curriculum. Trained this way, the robot learned more from less data; sixty-four per cent of the time, it fit the block into the socket.
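The scheme can be sketched roughly as follows. The slicing logic and the attempt() stub below are invented placeholders rather than DeepMind's code, but they show the structure: each demonstration is cut into slices that start progressively farther from the goal, and the robot graduates to a harder slice only once it is succeeding reliably at the easier one.

```python
import random

def slice_into_subtasks(demonstration, n_slices=4):
    """Cut one recorded demonstration into nested sub-tasks.
    The first slices start nearest the goal, so they are short and easy;
    the last slice is the full task from the beginning."""
    step = len(demonstration) // n_slices
    return [demonstration[len(demonstration) - (i + 1) * step:] for i in range(n_slices)]

def attempt(subtask, skill):
    """Placeholder for a real rollout: success gets likelier as skill grows
    and as the sub-task gets shorter."""
    difficulty = len(subtask) / 100.0
    return random.random() < max(0.05, skill - difficulty)

def train_with_curriculum(demonstration, promote_at=0.8, trials_per_round=50):
    skill = 0.3
    for level, subtask in enumerate(slice_into_subtasks(demonstration)):
        success_rate = 0.0
        while success_rate < promote_at:
            wins = sum(attempt(subtask, skill) for _ in range(trials_per_round))
            success_rate = wins / trials_per_round
            skill += 0.02 * success_rate      # practice on any slice builds skill
        print(f"level {level}: {len(subtask)}-step slice mastered, skill={skill:.2f}")

# A hundred-step "demonstration", practiced easiest slice first.
train_with_curriculum(demonstration=list(range(100)))
```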

When the team first started running their policy, the block was bright yellow. But the task has been performed so many times that dust and metal from the robot’s fingers have blackened the edges. “This data is really valuable,” Maria Bauza, a research scientist on the project, said. The data would refine their simulation, which would improve the real-life policy, which would refine the simulation even more. Humans wouldn’t have to be anywhere in the loop.

At Google, as at many of the leading academic and industrial research labs, you can start to feel as if you’re in a droid repair shop in “Star Wars.” In Mountain View, while I was watching one of the ALOHAs in action, a friendly-looking little wheeled bot, reminiscent of something from “WALL-E,” stood by. Around the corner was a gigantic pair of arms, which a researcher on the project described as capable of breaking bones “without too much difficulty.” (The robot has safeguards to prevent it from doing so.) It was stacking blocks—a sort of super-ALOHA. The London lab is home to a team of twenty-inch-high humanoid soccer bots. Historically, every make and model of robot was an island: the code you used to control one couldn’t control another. But researchers are now dreaming of a day when a single artificial intelligence can control any type of robot.

Computer scientists used to develop different models to translate between, say, English and French or French and Spanish. Eventually, these converged into models that could translate between any pair of languages. Still, translation was considered a different problem than something like speech transcription or image recognition. Each had its own research teams or companies devoted to it. Then large language models came along. Shockingly, they could not only translate languages but also pass a bar exam, write computer code, and more besides. The hodgepodge melted into a single A.I., and the learning accelerated. The latest version of ChatGPT can talk to you aloud in dozens of languages, on any topic, and sing to you, and even gauge your tone. Anything it can do, it can do better than stand-alone models once dedicated to that individual task.

The same thing is happening in robotics. For most of the history of the field, you could write an entire dissertation about a narrow subfield such as vision, planning, locomotion, or the really hard one, dexterity. But “foundation models” like GPT-4 have largely subsumed models that help robots with planning and vision, and locomotion and dexterity will probably soon be subsumed, too. This is even becoming true across different “embodiments.” Recently, a large consortium of researchers showed that data can be shared successfully from one kind of machine to another. In “Transformers,” the same brain controls Optimus Prime whether he’s a humanoid or a truck. Now imagine that it can also control an industrial arm, a fleet of drones, or a four-legged cargo robot.

The human brain is plastic when it comes to the machinery it can command: even if you have never used a prosthetic limb, you have probably felt a wrench or a tennis racquet become like an extension of your body. Drive past a double-parked car and you know, intuitively, whether your passenger-side mirror is likely to get clipped. There’s every reason to believe that a future generation of A.I. will acquire the motor plasticity of a real brain. “Ultimately, what we will see is like one intelligence,” Keerthana Gopalakrishnan, a research scientist who works on robots at Google DeepMind, told me. To this end, Figure, the humanoid startup, has partnered with OpenAI to give large language models corporeal form; OpenAI has begun hiring a robotics team after a years-long hiatus.


