The tricked-out version of the ANYmal quadruped, as customized by Zürich-based Swiss-Mile, just keeps getting better and better. Swiss-Mile started with a commercial quadruped and added powered wheels, which made the robot fast and efficient while still allowing it to handle curbs and stairs. A few years ago, the robot learned how to stand up, which is an efficient way of moving and made the robot much more pleasant to hug. More importantly, it unlocked the potential for the robot to start doing manipulation with its wheel-hand-leg-arms.
Doing any sort of practical manipulation with ANYmal is complicated, because its limbs were designed to be legs, not arms. But at the Robotic Systems Lab at ETH Zurich, they’ve managed to teach this robot to use its limbs to open doors, and even to grasp a package off of a table and toss it into a box.
The ETHZ researchers got the robot to reliably perform these complex behaviors using a kind of reinforcement learning called “curiosity-driven” learning. In simulation, the robot is given a goal that it needs to achieve: in this case, it is rewarded for passing through a doorway, or for getting a package into a box. These are very high-level goals (rewarding only the final outcome like this is known as using “sparse rewards”), and the robot doesn’t get any encouragement along the way. Instead, it has to figure out how to complete the entire task from scratch.
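To make the idea of a sparse reward concrete, here is a minimal sketch in Python. It is an illustration only, not the researchers' code: the function names and the list of per-step success flags are hypothetical, and a real simulator would supply the success signal.

```python
from typing import Sequence


def sparse_reward(task_done: bool) -> float:
    """Return 1.0 only when the whole task is complete; no partial credit."""
    return 1.0 if task_done else 0.0


def episode_return(success_flags: Sequence[bool]) -> float:
    """Sum the sparse reward over one episode, stopping at the first success.

    `success_flags` is a hypothetical per-step signal, e.g. "the robot has
    passed through the doorway at this step".
    """
    total = 0.0
    for done in success_flags:
        total += sparse_reward(done)
        if done:
            break  # the episode ends once the goal is reached
    return total
```

An episode that never reaches the goal returns zero no matter how close the robot came, which is why learning from this signal alone can take an impractical amount of simulation time.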
Given an impractical amount of simulation time, the robot would likely figure out how to do these tasks on its own. But to give it a useful starting point, the researchers introduced the concept of curiosity, which encourages the robot to play with goal-related objects. “In the context of this work, ‘curiosity’ refers to a natural desire or motivation for our robot to explore and learn about its environment,” says author Marko Bjelonic, “allowing it to discover solutions for tasks without needing engineers to explicitly specify what to do.” For the door-opening task, the robot is instructed to be curious about the position of the door handle, while for the package-grasping task, the robot is told to be curious about the motion and location of the package. Leveraging this curiosity to find ways of playing around and changing those parameters helps the robot achieve its goals, without the researchers having to provide any other kind of input.
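One simple way to picture a curiosity signal is as a bonus that is large when the object of interest (say, the door handle) is in a state the robot has rarely seen, and that shrinks as the state becomes familiar. The sketch below uses a count-based novelty bonus as a stand-in; the article does not say how the researchers compute curiosity, so the class, its parameters, and the bin size are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


class CountBasedCuriosity:
    """Hypothetical novelty bonus over a discretized 'curiosity state'.

    The curiosity state could be, for example, the door handle position.
    This is a stand-in technique, not the method from the research.
    """

    def __init__(self, bin_size: float = 0.05, scale: float = 1.0) -> None:
        self.bin_size = bin_size  # width of each discretization bin (assumed)
        self.scale = scale        # weight of the bonus (assumed)
        self.visits: Counter = Counter()

    def _key(self, state: Sequence[float]) -> Tuple[int, ...]:
        # Map a continuous state to the grid cell that contains it.
        return tuple(math.floor(x / self.bin_size) for x in state)

    def bonus(self, state: Sequence[float]) -> float:
        # Record the visit, then pay less for cells seen more often.
        key = self._key(state)
        self.visits[key] += 1
        return self.scale / math.sqrt(self.visits[key])


def total_reward(task_done: bool, state: Sequence[float],
                 curiosity: CountBasedCuriosity) -> float:
    """Sparse task reward plus the curiosity bonus for the current state."""
    return (1.0 if task_done else 0.0) + curiosity.bonus(state)
```

With a bonus like this, moving the handle to a new position pays off even before the door is open, which nudges the policy toward interacting with the goal-related object rather than wandering at random.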
The behaviors that the robot comes up with through this process are reliable, and they’re also diverse, which is one of the benefits of using sparse rewards. “The learning process is sensitive to small changes in the training environment,” explains Bjelonic. “This sensitivity allows the agent to explore various solutions and trajectories, potentially leading to more innovative task completion in complex, dynamic scenarios.” For example, with the door-opening task, the robot discovered how to open the door with either one of its end-effectors, or with both at the same time, which makes it better at actually completing the task in the real world. The package manipulation is even more interesting, because the robot sometimes dropped the package in training, but it autonomously learned how to pick it up again. So, when it makes a mistake in the real world, the robot has already learned the skills to recover.
There’s still a bit of research-y cheating going on here, since the robot is relying on the visual code-based AprilTags system to tell it where relevant things (like door handles) are in the real world. But that’s a minor shortcut, since direct detection of things like doors and packages is a fairly well understood problem. Bjelonic says that the next step is to endow the robot with a sense of contact-based surprise, in order to encourage a form of exploration that is a little bit gentler than what we see here.
Remember, too, that while this is definitely a research paper, Swiss-Mile is a company that wants to get this robot out into the world doing useful stuff. So, unlike most pure research that we cover, there’s a slightly better chance here for this ANYmal to wheel-hand-leg-arm its way into some practical application.
Reference: https://ift.tt/DxlaLSP