Monday, November 25, 2024

Robot Photographer Takes the Perfect Picture




Finding it hard to get the perfect angle for your shot? PhotoBot can take the picture for you. Tell it what you want the photo to look like, and your robot photographer will present you with reference photos to mimic. Pick your favorite, and PhotoBot, a robot arm with a camera, will adjust its position to match the reference and take your picture. Chances are, you’ll like it better than your own photography.

“It was a really fun project,” says Oliver Limoyo, one of the creators of PhotoBot. He enjoyed working at the intersection of several fields: human-robot interaction, large language models, and classical computer vision were all necessary to create the robot.

Limoyo worked on PhotoBot while at Samsung, with his manager Jimmy Li. They were working on a project to have a robot take photographs but were struggling to find a good metric for aesthetics. Then they saw the Getty Image Challenge, where people recreated famous artwork at home during the COVID lockdown. The challenge gave Limoyo and Li the idea to have the robot select a reference image to inspire the photograph.

To get PhotoBot working, Limoyo and Li had to figure out two things: how best to find reference images of the kind of photo you want and how to adjust the camera to match that reference.

Suggesting a Reference Photograph

To start using PhotoBot, you first give it a written description of the photo you want, for example, “a picture of me looking happy.” Then PhotoBot scans the environment around you, identifying the people and objects it can see. From a database of labeled images, it pulls a set of similar photos that contain those same objects.

Next, an LLM compares your description and the objects in the environment with that smaller set of labeled images and returns the closest matches to use as reference images. The LLM can be set to return any number of reference photographs.

For example, when asked for “a picture of me looking grumpy,” it might identify a person, glasses, a jersey, and a cup in the environment. PhotoBot would then deliver, among other choices, a reference image of a frazzled man holding a mug in front of his face.
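
The two-stage search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PhotoBot’s actual code: the tiny database, the stopword list, and every name here (`LabeledImage`, `filter_by_objects`, `score`, `suggest_references`) are hypothetical, and a simple word-and-object overlap count stands in for the LLM comparison.

```python
from dataclasses import dataclass

# Words ignored when comparing the request with a caption (illustrative list).
STOPWORDS = {"a", "the", "of", "me", "in", "his", "with", "picture", "looking"}


@dataclass(frozen=True)
class LabeledImage:
    name: str
    objects: frozenset  # object labels attached to the image
    caption: str        # short text description of the image


def filter_by_objects(database, detected, min_shared=2):
    """Stage 1: keep images sharing at least `min_shared` labels with the scene."""
    return [img for img in database if len(img.objects & detected) >= min_shared]


def score(query, detected, image):
    """Stage 2 stand-in for the LLM: count shared keywords and shared objects."""
    query_words = set(query.lower().split()) - STOPWORDS
    caption_words = set(image.caption.lower().split()) - STOPWORDS
    return len(query_words & caption_words) + len(image.objects & detected)


def suggest_references(query, detected, database, top_k=2):
    """Return the `top_k` best-scoring candidates that survive the object filter."""
    candidates = filter_by_objects(database, detected)
    ranked = sorted(candidates, key=lambda img: score(query, detected, img),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical data mirroring the "grumpy" example in the text.
DETECTED = frozenset({"person", "glasses", "jersey", "cup"})
DATABASE = [
    LabeledImage("frazzled_man_mug", frozenset({"person", "cup"}),
                 "grumpy frazzled man holding a mug in front of his face"),
    LabeledImage("smiling_woman_glasses", frozenset({"person", "glasses"}),
                 "happy woman smiling with glasses"),
    LabeledImage("dog_on_beach", frozenset({"dog", "ball"}),
                 "dog chasing a ball on the beach"),
    LabeledImage("player_in_jersey", frozenset({"person", "jersey", "glasses"}),
                 "grumpy player in a jersey frowning"),
]

suggestions = suggest_references("a picture of me looking grumpy",
                                 DETECTED, DATABASE)
for image in suggestions:
    print(image.name)
```

Filtering on objects first keeps the candidate set small, so the more expensive comparison (an LLM in PhotoBot, a toy score here) only runs on images the scene could plausibly reproduce. The `top_k` parameter plays the role of the adjustable number of reference photographs.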

After the user selects the reference photograph they want their picture to mimic, PhotoBot moves its robot arm to correctly position the camera to take a similar picture.

Adjusting the Camera to Fit a Reference

To move the camera to the perfect position, PhotoBot starts by identifying features that appear in both images, such as someone’s chin or the top of a shoulder. It then solves a “perspective-n-point” (PnP) problem, which estimates a camera’s position and orientation in 3D space from where known points appear in its 2D view. Once PhotoBot has located itself in space, it works out how to move the robot’s arm so that its view looks like the reference image. It repeats this process a few times, making incremental adjustments as it gets closer to the correct pose.

Then PhotoBot takes your picture.
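
The positioning loop described above can be illustrated with a heavily simplified sketch. It is not PhotoBot’s implementation: it assumes the camera only translates (no rotation), that the 3D positions of the matched features are already known, and that the focal length is 1. Under those assumptions, PnP reduces to a small linear least-squares problem. All function names and coordinates below are hypothetical.

```python
def project(point, cam, focal=1.0):
    """Pinhole projection of a 3D point for a camera at `cam` looking along +Z."""
    x, y, z = point
    cx, cy, cz = cam
    depth = z - cz
    return (focal * (x - cx) / depth, focal * (y - cy) / depth)


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        known = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - known) / m[r][r]
    return x


def estimate_camera_position(points_3d, pixels, focal=1.0):
    """Translation-only PnP: find the camera position that would see each
    3D point at its given pixel. Each match gives two linear equations:
        focal*cx - u*cz = focal*x - u*z
        focal*cy - v*cz = focal*y - v*z
    """
    rows, rhs = [], []
    for (x, y, z), (u, v) in zip(points_3d, pixels):
        rows.append([focal, 0.0, -u])
        rhs.append(focal * x - u * z)
        rows.append([0.0, focal, -v])
        rhs.append(focal * y - v * z)
    # Least squares via the normal equations (A^T A) c = A^T b.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(3)]
    return solve_3x3(ata, atb)


def reprojection_error(points_3d, pixels, cam):
    """Mean pixel distance between the current view and the reference view."""
    total = 0.0
    for point, (u, v) in zip(points_3d, pixels):
        pu, pv = project(point, cam)
        total += ((pu - u) ** 2 + (pv - v) ** 2) ** 0.5
    return total / len(points_3d)


# Hypothetical matched features (e.g. chin, shoulder) with known 3D positions.
FEATURES_3D = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 6.0), (1.0, 1.0, 4.0)]
REFERENCE_CAMERA = (0.5, 0.2, 1.0)  # used only to synthesize the reference view
REFERENCE_PIXELS = [project(p, REFERENCE_CAMERA) for p in FEATURES_3D]

camera = [0.0, 0.0, 0.0]  # where the arm currently holds the camera
errors = [reprojection_error(FEATURES_3D, REFERENCE_PIXELS, camera)]
for _ in range(5):
    # A real system would re-detect features on each pass; here they are fixed.
    goal = estimate_camera_position(FEATURES_3D, REFERENCE_PIXELS)
    camera = [c + 0.5 * (g - c) for c, g in zip(camera, goal)]  # move halfway
    errors.append(reprojection_error(FEATURES_3D, REFERENCE_PIXELS, camera))

print(errors)
```

Moving only part of the way on each pass mirrors the incremental adjustments described above: each step shrinks the gap between the current view and the reference. A full PnP solver would also recover the camera’s rotation.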

A collage of photographs of people in different poses and outfits. PhotoBot’s developers compared portraits with and without their system. Samsung/IEEE

To test whether images taken by PhotoBot were more appealing than amateur human photography, Limoyo’s team had eight people use the robot’s arm and camera to take photographs of themselves and then use PhotoBot to take a robot-assisted photograph. They then asked 20 new people to evaluate the two photographs and say which was more aesthetically pleasing while still addressing the user’s specifications (happy, excited, surprised, etc.). Overall, PhotoBot was the preferred photographer 242 times out of 360, or 67 percent of the time.

PhotoBot was presented on 16 October at the IEEE/RSJ International Conference on Intelligent Robots and Systems.

Although the project is no longer in development, Li thinks someone should create an app based on the underlying programming, enabling friends to take better photos of each other. “Imagine right on your phone, you see a reference photo. But you also see what the phone is seeing right now, and then that allows you to move around and align.”

Reference: https://ift.tt/ReEpTxi
