Sunday, November 30, 2025

This Toy Electric Stove Was Dangerously Realistic




Introduced in 1930 by Lionel Corp.—better known for its electric model trains—the fully functional toy stove shown at top had two electric burners and an oven that heated to 260 °C. It came with a set of cookware, including a frying pan, a pot with lid, a muffin tin, a tea kettle, and a wooden potato masher. I would have also expected a spoon, whisk, or spatula, but maybe most girls already had those. Just plug in the toy, and housewives-in-training could mimic their mothers frying eggs, baking muffins, or boiling water for tea.

A brief history of toy stoves

Even before electrification, cast-iron toy stoves had become popular in the mid-19th century. At first fueled by coal or alcohol and later by oil or gas, these toy stoves were scaled-down working equivalents of the real thing. Girls could use their stoves along with a toy waffle iron or small skillet to whip up breakfast. If that wasn’t enough fun, they could heat up a miniature flatiron and iron their dolls’ clothes. Designed to help girls understand their domestic duties, these toys were the gendered equivalent of their brothers’ toy steam engines. If you’re thinking fossil-fuel-powered “educational toys” are a recipe for disaster, you are correct. Many children suffered serious burns and sometimes death by literally playing with fire. Then again, people in the 1950s thought playing with uranium was safe.

When electric toy stoves came on the scene in the 1910s, things didn’t get much safer, as the new entrants also lacked basic safety features. The burners on the 1930 Lionel range, for example, could only be turned off or on, but at least kids weren’t cooking over an open flame. At 86 centimeters tall, the Lionel range was also significantly larger than its more diminutive predecessors. Just the right height for young children to cook standing up.

Western Electric’s Junior Electric Range was demonstrated at an expo in New York City in 1915. The Strong

Well before the Lionel stove, the Western Electric Co. had a cohort of girls demonstrating its Junior Electric Range at the Electrical Exposition held in New York City in 1915. The Junior Electric held its own in a display of regular sewing-machine motors, vacuum cleaners, and electric washing machines.

The Junior Electric stood about 30 cm tall with six burners and an oven. The electric cord plugged into a light fixture socket. Children played with it while sitting on the floor or as it sat on a table. A visitor to the Expo declared the miniature range “the greatest electrical novelty in years.” Cooking by electricity in any form was still innovative—George A. Hughes had introduced his eponymous electric range just five years earlier. When the Junior Electric came along, less than a third of U.S. households had been wired for electric lights.

How electricity turned cooking into a science

One reason to give little girls working toy stoves was so they could learn how to differentiate between a hot flame and low heat and get a feel for cooking without burning the food. These are skills that come with experience. Directions like “bake until done in a moderate oven,” a common line in 19th-century recipes, require a lot more tacit knowledge than is needed to, say, throw together a modern boxed brownie mix. The latter comes with detailed instructions and assumes you can control your oven temperature to within a few degrees. That type of precision simply didn’t exist in the 19th century, in large part because it was so difficult to calibrate wood- or coal-burning appliances. Girls needed to start young to master these skills by the time they married and were expected to handle the household cooking on their own.

Electricity changed the game.

In his comparison of “fireless cookers,” an engineer named Percy Wilcox Gumaer exhaustively tested four different electric ovens and then presented his findings at the 32nd Annual Convention of the American Institute of Electrical Engineers (a forerunner of today’s IEEE) on 2 July 1915. At the time, metered electricity was more expensive than gas or coal, so Gumaer investigated the most economical form of cooking with electricity, comparing different approaches such as longer cooking at low heat versus faster cooking in a hotter oven, the effect of heat loss when opening the oven door, and the benefits of searing meat on the stovetop versus in the oven before making a roast.

Gumaer wasn’t starting from scratch. Similar to how Yoshitada Minami needed to learn the ideal rice recipe before he could design an automatic rice cooker, Gumaer decided that he needed to understand the principles of roasting beef. Minami had turned to his wife, Fumiko, who spent five years researching and testing variations of rice cooking. Gumaer turned to the work of Elizabeth C. Sprague, a research assistant in nutrition investigations at the University of Illinois, and H.S. Grindley, a professor of general chemistry there.

In their 1907 publication “A Precise Method of Roasting Beef,” Sprague and Grindley had defined qualitative terms like medium rare and well done by precisely measuring the internal temperature in the center of the roast. They concluded that beef could be roasted at an oven temperature between 100 and 200 °C.

Continuing that investigation, Gumaer tested 22 roasts at 100, 120, 140, 160, and 180 °C, measuring the time they took to reach rare, medium rare, and well done, and calculating the cost per kilowatt-hour. He repeated his tests for biscuits, bread, and sponge cake.

In case you’re wondering, Gumaer determined that cooking with electricity could be a few cents cheaper than other methods if you roasted the beef at 120 °C instead of 180 °C. It’s also more cost-effective to sear beef on the stovetop rather than in the oven. Biscuits tasted best when baked at 200 to 240 °C, while sponge cake was best between 170 and 200 °C. Bread was better at 180 to 240 °C, but too many other factors affected its quality. In true electrical engineering fashion, Gumaer concluded that “it is possible to reduce the art of cooking with electricity to an exact science.”

Electric toy stoves as educational tools

This semester, I’m teaching an introductory class on women’s and gender studies, and I told my students about the Lionel toy oven. They were horrified by the inherent danger. One incredulous student kept asking, “This is real? This is not a joke?” Instead of learning to cook with a toy that could heat to 260 °C, many of us grew up with the Easy-Bake Oven. The 1969 model could reach about 177 °C with its two 100-watt incandescent light bulbs. That was still hot enough to cause burns, but somehow it seemed safer. (Since 2011, Easy-Bakes have used a heating element instead of lightbulbs.)

The Queasy Bake Cookerator, designed to whip up “gross-looking, great-tasting snacks,” was marketed to boys. The Strong

The Easy-Bake I had wasn’t particularly gendered. It was orange and brown and meant to look like a different new-fangled appliance of the day, the microwave oven. But by the time my students were playing with Easy-Bake Ovens, the models were in the girly hues of pink and purple. In 2002, Hasbro briefly tried to lure boys by releasing the Queasy Bake Cookerator, which the company marketed with disgusting-sounding foods like Chocolate Crud Cake and Mucky Mud. The campaign didn’t work, and the toy was soon withdrawn.

Similarly, Lionel’s electric toy range didn’t last long on the market. Launched in 1930, it had been discontinued by 1932, but that may have had more to do with timing. The toy cost US $29.50, the equivalent of a men’s suit, a new bed, or a month’s rent. In the midst of a global depression, the toy stove was an extravagance. Lionel reverted to selling electric trains to boys.

My students discussed whether cooking is still a gendered activity. Although they agreed that meal prep disproportionately falls on women even now, they acknowledged the rise of the male chef and credited televised cooking shows with closing the gender gap. To everyone’s surprise, we discovered that one of the students in the class, Haley Mattes, had competed in and won Chopped Junior as a 12-year-old.

Haley had a play kitchen as a kid that was entirely fake: fake food, fake pans, fake utensils. She graduated to the Easy-Bake Oven, but really got into cooking the same way girls have done for centuries, by learning beside her grandmas.

Part of a continuing series looking at historical artifacts that embrace the boundless potential of technology.

An abridged version of this article appears in the December 2025 print issue as “Too Hot to Handle.”

References


I first came across a description of Western Electric’s Junior Electric Range in “The Latest in Current Consuming Devices,” in the November 1915 issue of Electrical Age.

The Strong National Museum of Play, in Rochester, N.Y., has a large collection of both cast-iron and electric stoves. The Strong also published two blogs that highlighted Lionel’s toy: “Kids and Cooking” and “Lionel for Ladies?”

Although Ron Hollander’s All Aboard! The Story of Joshua Lionel Cowen & His Lionel Train Company (Workman Publishing, 1981) is primarily about toy trains, it includes a few details about how Lionel marketed its electric toy stove to girls.

Reference: https://ift.tt/54JS0zM

Saturday, November 29, 2025

Video Friday: Disney’s Robotic Olaf Makes His Debut




Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

SOSV Robotics Matchup: 1–5 December 2025, ONLINE
ICRA 2026: 1–5 June 2026, VIENNA

Enjoy today’s videos!

Step behind the scenes with Walt Disney Imagineering Research & Development and discover how Disney uses robotics, AI, and immersive technology to bring stories to life! From the brand new self-walking Olaf in World of Frozen and BDX Droids to cutting-edge attractions like Millennium Falcon: Smugglers Run, see how magic meets innovation.

[ Disney Experiences ]

We just released a new demonstration of Mentee’s V3 humanoid robots completing a real world logistics task together. Over an uninterrupted 18-minute run, the robots autonomously move 32 boxes from eight piles to storage racks of different heights. The video shows steady locomotion, dexterous manipulation, and reliable coordination throughout the entire task.

And there’s an uncut 18-minute version of this at the link.

[ MenteeBot ]

Thanks, Yovav!

This video contains graphic depictions of simulated injuries. Viewer discretion is advised.

In this immersive overview, guided by the DARPA Triage Challenge program manager, retired Army Col. Jeremy C. Pamplin, M.D., you’ll experience how teams of innovators, engineers, and DARPA are redefining the future of combat casualty care. Be sure to look all around! Check out competition runs, behind-the-scenes of what it takes to put on a DARPA Challenge, and glimpses into the future of lifesaving care.

Those couple of minutes starting at 6:50 with the human medic and robot teaming were particularly cool.

[ DARPA ]

You don’t need to build a humanoid robot if you can just make existing humanoids a lot better.

I especially love 0:45 because you know what? Humanoids should spend more time sitting down, for all kinds of reasons. And of course, thank you for falling and getting up again, albeit on some of the squishiest grass on the planet.

[ Flexion ]

“Human-in-the-Loop Gaussian Splatting” wins best paper title of the week.

[ Paper ] via [ IEEE Robotics and Automation Letters in IEEE Xplore ]

Scratch that, “Extremum Seeking Controlled Wiggling for Tactile Insertion” wins best paper title of the week.

[ University of Maryland PRG ]

The battery swapping on this thing is... Unfortunate.

[ LimX Dynamics ]

To push the boundaries of robotic capability, researchers in the Department of Mechanical Engineering at Carnegie Mellon University, in collaboration with the University of Washington and Google DeepMind, have developed a new tactile sensing system that enables four-legged robots to carry unsecured, cylindrical objects on their backs. This system, known as LocoTouch, features a network of tactile sensors that spans the robot’s entire back. As an object shifts, the sensors provide real-time feedback on its position, allowing the robot to continuously adjust its posture and movement to keep the object balanced.

[ Carnegie Mellon University ]

This robot is in more need of googly eyes than any other robot I’ve ever seen.

[ Zarrouk Lab ]

DPR Construction has deployed Field AI’s autonomy software on a quadruped robot at the company’s job site in Santa Clara, CA, to greatly improve its daily surveying and data collection processes. By automating what has traditionally been a very labor intensive and time consuming process, Field AI is helping the DPR team operate more efficiently and effectively, while increasing project quality.

[ FieldAI ]

In our second episode of AI in Motion, our host, Waymo AI researcher Vincent Vanhoucke, talks with robotics startup founder Sergey Levine, who left a career in academic research to build better robots for the home and workplace.

[ Waymo ]

Reference: https://ift.tt/N1vSGMk

The Biggest Causes of Medical Device Recalls




According to U.S. Food and Drug Administration records, more than 2,500 medical device recalls are issued in the United States in an average year. Some of these recalls simply require checking the device for problems, but others require the return or destruction of the device. The FDA assigns each recall a root cause from one of 40 categories, plus a catchall of “other” that covers situations such as labeling mix-ups, problems with expiration dates, and counterfeiting.

What’s shown here is the breakdown of the five biggest problem categories found among the 56,000 entries in the FDA medical-recall database, which stretches back to 2002: device design, process control (meaning an error in the device’s manufacturing process), nonconforming material/component (meaning something does not meet required specifications), software issues, and packaging.

Software issues are broken down into six root causes, with software design far and away the biggest problem. The other five are, in order: change control; software design changes; software manufacturing or deployment problems; software design issues in the manufacturing process; and software in the “use environment.” That last one includes cybersecurity issues and problems with supporting software, such as a smartphone app.
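For readers who want to tally such a breakdown themselves, a minimal sketch is shown below, assuming the recall records have been exported to a CSV file. The file name and the "root_cause" column are hypothetical placeholders, not the FDA's actual schema.

```python
import pandas as pd

# Sketch of computing the category breakdown described above from a
# hypothetical CSV export of the FDA device-recall database.
recalls = pd.read_csv("device_recalls.csv")       # ~56,000 rows, 2002-present

counts = recalls["root_cause"].value_counts()     # recalls per root-cause category
top5 = counts.head(5)                             # the five biggest categories

print(top5)
print((top5 / counts.sum() * 100).round(1))       # their share of all recalls, in percent
```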

This article appears in the December 2025 print issue as “Medical Device Recalls.”

Reference: https://ift.tt/Sfn0HRM

Friday, November 28, 2025

Citizens of Smart Cities Need a Way to Opt Out




For years, Gwen Shaffer has been leading Long Beach, Calif., residents on “data walks,” pointing out public Wi-Fi routers, security cameras, smart water meters, and parking kiosks. The goal, according to the professor of journalism and public relations at California State University, Long Beach, was to learn how residents felt about the ways in which their city collected data on them.

Gwen Shaffer


Gwen Shaffer is a professor of journalism and public relations at California State University, Long Beach. She is the principal investigator on a National Science Foundation–funded project aimed at providing Long Beach residents with greater agency over the personal data their city collects.

She also identified a critical gap in smart city design today: While cities may disclose how they collect data, they rarely offer ways to opt out. Shaffer spoke with IEEE Spectrum about the experience of leading data walks, and about her research team’s efforts to give citizens more control over the data collected by public technologies.

What was the inspiration for your data walks?

Gwen Shaffer: I began facilitating data walks in 2021. I was studying residents’ comfort levels with city-deployed technologies that collect personally identifiable information. My first career as a political reporter has influenced my research approach. I feel strongly about conducting applied rather than theoretical research. And I always go into a study with the goal of helping to solve a real-world challenge and inform policy.

How did you organize the walks?

Shaffer: We posted data privacy labels with a QR code that residents can scan to find out how their data are being used. Downtown, they’re in Spanish and English. In Cambodia Town, we did them in Khmer and English.

What happened during the walks?

Shaffer: I’ll give you one example. In a couple of the city-owned parking garages, there are automated license-plate readers at the entrance. So when I did the data walks, I talked to our participants about how they feel about those scanners. Because once they have your license plate, if you’ve parked for fewer than two hours, you can breeze right through. You don’t owe money.

Responses were contextual and sometimes contradictory. There were residents who said, “Oh, yeah. That’s so convenient. It’s a time saver.” So I think that shows how residents are willing to make trade-offs. Intellectually, they hate the idea of the privacy violation, but they also love convenience.

What surprised you most?

Shaffer: One of the participants said, “When I go to the airport, I can opt out of the facial scan and still be able to get on the airplane. But if I want to participate in so many activities in the city and not have my data collected, there’s no option.”

There was a cyberattack against the city in November 2023. Even though we didn’t have a prompt asking about it, people brought it up on their own in almost every focus group. One said, “I would never connect to public Wi-Fi, especially after the city of Long Beach’s site was hacked.”

What is the app your team is developing?

Shaffer: Residents want agency. So that’s what led my research team to connect with privacy engineers at Carnegie Mellon University, in Pittsburgh. Norman Sadeh and his team had developed what they called the IoT Assistant. So I told them about our project, and proposed adapting their app for city-deployed technologies. Our plan is to give residents the opportunity to exercise their rights under the California Consumer Privacy Act with this app. So they could say, “Passport Parking app, delete all the data you’ve already collected on me. And don’t collect any more in the future.”

This article appears in the December 2025 print issue as “Gwen Shaffer.”

Reference: https://ift.tt/vCOYjBH

Thursday, November 27, 2025

3 Weird Things You Can Turn Into a Memristor




From the honey in your tea to the blood in your veins, materials all around you have a hidden talent. Some of these substances, when engineered in specific ways, can act as memristors—electrical components that can “remember” past states.

Memristors are often used in chips that both perform computations and store data. They store data as particular levels of resistance. Today, they are constructed as a thin layer of titanium dioxide or a similar dielectric material sandwiched between two metal electrodes. Applying enough voltage to the device causes tiny regions in the dielectric layer—where oxygen atoms are missing—to form filaments that bridge the electrodes or otherwise move in a way that makes the layer more conductive. Reversing the voltage undoes the process. This switching is what gives the memristor its memory of past electrical activity.
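To see how that switching behaves as memory, here is a minimal simulation sketch of the classic linear ion-drift memristor model, often used as a first-order textbook description of titanium-dioxide devices. All parameter values are illustrative, and this is not a model of the mushroom, honey, or blood devices described below.

```python
import numpy as np

# Minimal sketch of the linear ion-drift memristor model, used only to
# illustrate how resistance "remembers" past voltage. Parameters are
# illustrative, not measurements of any device in this article.
R_ON, R_OFF = 100.0, 16_000.0   # ohms: fully doped / undoped resistance
D = 10e-9                       # m: thickness of the dielectric layer
MU_V = 1e-14                    # m^2 s^-1 V^-1: assumed ion mobility

dt = 1e-6                       # s: simulation time step
t = np.arange(0, 0.02, dt)
v = 1.0 * np.sin(2 * np.pi * 100 * t)   # sinusoidal drive voltage

w = 0.5 * D                     # state variable: width of the doped region
current = np.zeros_like(t)

for k, vk in enumerate(v):
    # Memristance depends on how far the doped region extends.
    M = R_ON * (w / D) + R_OFF * (1 - w / D)
    i = vk / M
    current[k] = i
    # Ion drift moves the doped/undoped boundary in proportion to the current.
    w += MU_V * (R_ON / D) * i * dt
    w = min(max(w, 0.0), D)     # clamp to physical bounds

# Plotting current vs. voltage would show the pinched hysteresis loop
# that is the signature of a memristor.
```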

Last month, while exploring the electrical properties of fungi, a group at The Ohio State University found first-hand that some organic memristors have benefits beyond those made with conventional materials. Not only can shiitake act as a memristor, for example, but it may be useful in aerospace or medical applications because the fungus demonstrates high levels of radiation resistance. The project “really mushroomed into something cool,” lead researcher John LaRocco says with a smirk.

Researchers have learned that other unexpected materials may give memristors an edge. They may be more flexible than typical memristors or even biodegradable. Here’s how they’ve made memristors from strange materials, and the potential benefits these odd devices could bring:

Mushrooms

LaRocco and his colleagues were searching for a proxy for brain circuitry to use in electrical stimulation research when they stumbled upon something interesting—shiitake mushrooms are capable of learning in a way that’s similar to memristors.

The group set out to evaluate just how well shiitake can remember electrical states by first cultivating nine samples and curating optimal growing conditions, including feeding them a mix of farro, wheat, and hay.

Once fully matured, the mushrooms were dried and rehydrated to a level that made them moderately conductive. In this state, the fungi’s structure includes conductive pathways that emulate the oxygen vacancies in commercial memristors. The scientists plugged them into circuits and put them through voltage, frequency, and memory tests. The result? Mushroom memristors.

It may smell “kind of funny,” LaRocco says, but shiitake performs surprisingly well when compared to conventional memristors. Around 90 percent of the time, the fungus maintains ideal memristor-like behavior for signals up to 5.85 kilohertz. While traditional materials can function at frequencies orders of magnitude faster, these numbers are notable for biological materials, he says.

What fungi lack in performance, they may make up for in other properties. For one, many mushrooms—including shiitake—are highly resistant to radiation and other environmental dangers. “They’re growing in logs in Fukushima and a lot of very rough parts of the world, so that’s one of the appeals,” LaRocco says.

Shiitake are also an environmentally friendly option that’s already commercialized. “They’re already cultured in large quantities,” LaRocco explains. “One could simply leverage existing logistics chains” if the industry wanted to commercialize mushroom memristors. The use cases for this product would be niche, he thinks, and would center on the radiation resistance that shiitake boasts. Mushroom GPUs are unlikely, LaRocco says, but he sees potential for aerospace and medical applications.

Honey

In 2022, engineers at Washington State University interested in green electronics set out to study if honey could serve as a good memristor. “Modern electronics generate 50 million tons of e-waste annually, with only about 20 percent recycled,” says Feng Zhao, who led the work and is now at Missouri University of Science and Technology. “Honey offers a biodegradable alternative.”

The researchers first blended commercial honey with water and stored it in a vacuum to remove air bubbles. They then spread the mixture on a piece of copper, baked the whole stack at 90 °C for nine hours to stabilize it, and, finally, capped it with circular copper electrodes on top—completing the honey-based memristor sandwich.

The resulting 2.5-micrometer-thick honey layer acted like the oxide dielectric in conventional memristors: a place for conductive pathways to form and dissolve, changing resistance with voltage. In this setup, when voltage is applied, copper filaments extend through the honey.

The honey-based memristor was able to switch from low to high resistance in 500 nanoseconds and back to low in 100 nanoseconds, which is comparable to speeds in some non-food-based memristive materials.

One advantage of honey is that it’s “cheap and widely available, making it an attractive candidate for scalable fabrication,” Zhao says. It’s also “fully biodegradable and dissolves in water, showing zero toxic waste.” In the 2022 paper, though, the researchers note that for a honey-based device to be truly biodegradable, the copper components would need to be replaced with dissolvable metals. They suggest options like magnesium and tungsten, but also write that the performance of memristors made from these metals is still “under investigation.”

Blood

In 2011, just three years after the first memristor was built, a group in India wondered whether blood would make a good memristor, seeing it as a potential means of delivering healthcare.

The experiments were pretty simple. The researchers filled a test tube with fresh, type O+ human blood and inserted two conducting wire probes. The wires were connected to a power supply, creating a complete circuit, and voltages of one, two, and three volts were applied in repeated steps. Then, to test the memristor qualities of blood as it exists in the human body, the researchers set up a “flow mode” that applied voltage to the blood as it flowed from a tube at up to one drop per second.

The experiments were preliminary and only measured current passing through the blood, but resistance could be set by applying voltage. Crucially, resistance changed by less than 10 percent in the 30-minute period after voltage was applied. In the International Journal of Medical Engineering and Informatics, the scientists wrote that, because of these observations, their contraption “looks like a human blood memristor.”


They suggested that this knowledge could be useful in treating illness. Sick people may have ion imbalances in certain parts of their bodies—instead of prescribing medication, why not employ a circuit component made of human tissue to solve the problem? In recent years, blood-based memristors have been tested by other scientists as a means to treat conditions ranging from high blood sugar to nearsightedness.

Reference: https://ift.tt/PhJBIfu

For This Engineer, Taking Deep Dives Is Part of the Job




Early in Levi Unema’s career as an electrical engineer, he was presented with an unusual opportunity. While working on assembly lines at an automotive parts supplier in 2015, he got a surprise call from his high-school science teacher that set him off on an entirely new path: piloting underwater robots to explore the ocean’s deepest abysses.

That call came from Harlan Kredit, a nationally renowned science teacher and board member of a Rhode Island-based nonprofit called the Global Foundation for Ocean Exploration (GFOE). The organization was looking for an electrical engineer to help design, build, and pilot remotely operated vehicles (ROVs) for the U.S. National Oceanic and Atmospheric Administration.

Levi Unema


Employer

Deep Exploration Solutions

Occupation

ROV engineer

Education

Bachelor’s degree in electrical engineering, Michigan Technological University

This was an exciting break for Unema, a Washington state native who had grown up tinkering with electronics and exploring the outdoors. Unema joined the team in early 2016 and has since helped develop and operate deep-sea robots for scientific expeditions around the globe.

The GFOE’s contract with NOAA expired in July, forcing the engineering team to disband. But soon after, Unema teamed up with four former colleagues to start their own ROV consultancy, called Deep Exploration Solutions, to continue the work he’s so passionate about.

“I love the exploration and just seeing new things every day,” he says. “And the engineering challenges that go along with it are really exciting, because there’s a lot of pressure down there and a lot of technical problems to solve.”

Nature and Technology

Unema’s fascination with electronics started early. Growing up in Lynden, Wash., he took apart radios, modified headphones, and hacked together USB chargers from AA batteries. “I’ve always had to know how things work,” he says. He was also a Boy Scout, and much of his youth was spent hiking, camping, and snowboarding.

That love of both technology and nature can be traced back, at least in part, to his parents—his father was a civil engineer, and his mother was a high-school biology teacher. But another major influence growing up was Kredit, the science teacher who went on to recruit him. (Kredit was also a colleague of Unema’s mother.)

Kredit has won numerous awards for his work as an educator, including the Presidential Award for Excellence in Science Teaching in 2004. Like Unema, he loves the outdoors: He is Yellowstone National Park’s longest-serving park ranger. “He was an excellent science teacher, very inspiring,” says Unema.

When Unema graduated high school in 2010, he decided to enroll at his father’s alma mater, Michigan Technological University, to study engineering. He was initially unsure what discipline to follow and signed up for the general engineering course, but he quickly settled on electrical engineering.

A summer internship at a steel mill run by the multinational corporation ArcelorMittal introduced Unema to factory automation and assembly lines. After graduating in 2014 he took a job at Gentex Corp. in Zeeland, Mich., where he worked on manufacturing systems and industrial robotics.

Diving Into Underwater Robotics

In late 2015, he got the call from Kredit asking if he’d be interested in working on underwater robots for GFOE. The role involved not just engineering these systems, but also piloting them. Taking the plunge was a difficult choice, says Unema, as he’d just been promoted at Gentex. But the promise of travel combined with the novel engineering challenges made it too good an opportunity to turn down.

Building technology that can withstand the crushing pressure at the bottom of the ocean is tough, he says, and you have to make trade-offs between weight, size, and cost. Everything has to be waterproof, and electronics have to be carefully isolated to prevent them from grounding on the ocean floor. Some components are pressure-tolerant, but most must be stored in pressurized titanium flasks, so the components must be extremely small to minimize the size of the metallic housing.

Unema conducts predive checks from the Okeanos Explorer’s control room. Once the ROV is launched, scientists will watch the camera feeds and advise his team where to direct the vehicle. Art Howard

“You’re working very closely with the mechanical engineer to fit the electronics in a really small space,” he says. “The smaller the cylinder is, the cheaper it is, but also the less mass on the vehicle. Every bit of mass means more buoyancy is required, so you want to keep things small, keep things light.”

Communications are another challenge. The ROVs rely on several kilometers of cable containing just three single-mode optical fibers. “All the communication needs to come together and then go up one cable,” Unema says. “And every year new instruments consume more data.”

He works exclusively on ROVs that are custom made for scientific research, which require smoother control and considerably more electronics and instrumentation than the heavier-duty vehicles used by the oil and gas industry. “The science ones are all hand-built, they’re all quirky,” he says.

Unema’s role spans the full life cycle of an ROV’s design, construction, and operation. He primarily spends winters upgrading and maintaining vehicles and summers piloting them on expeditions. At GFOE, he mainly worked on two ROVs for NOAA called Deep Discoverer and Seirios, which operate from the ship Okeanos Explorer. But he has also piloted ROVs for other organizations over the years, including the Schmidt Ocean Institute and the Ocean Exploration Trust.

Unema’s new consultancy, Deep Exploration Solutions, has been given a contract to do the winter maintenance on the NOAA ROVs, and the firm is now on the lookout for more ROV design and upgrade work, as well as piloting jobs.

An Engineer’s Life at Sea

On expeditions, Unema is responsible for driving the robot. He follows instructions from a science team that watches the ROV’s video feed to identify things like corals, sponges, or deepwater creatures that they’d like to investigate in more detail. Sometimes he will also operate hydraulic arms to sample particularly interesting finds.

In general, the missions are aimed at discovering new species and mapping the range of known ones, says Unema. “There’s a lot of the bottom of the ocean where we don’t know anything about it,” he says. “Basically every expedition there’s some new species.”

This involves being at sea for weeks at a time. Unema says that life aboard ships can be challenging—many new crew members get seasick, and you spend almost a month living in close quarters with people you’ve often never met before. But he enjoys the opportunity to meet colleagues from a wide variety of backgrounds who are all deeply enthusiastic about the mission.

“It’s like when you go to scout camp or summer camp,” he says. “You’re all meeting new people. Everyone’s really excited to be there. We don’t know what we’re going to find.”

Unema also relishes the challenge of solving engineering problems with the limited resources available on the ship. “We’re going out to the middle of the Pacific,” he says. “Things break, and you’ve got to fix them with what you have out there.”

If that sounds more exciting than daunting, and you’re interested in working with ROVs, Unema’s main advice is to talk to engineers in the field. It’s a small but friendly community, he says, so just do your research to see what opportunities are available. Some groups, such as the Ocean Exploration Trust, also operate internships for college students to help them get experience in the field.

And Unema says there are very few careers quite like it. “I love it because I get to do all aspects of engineering—from idea to operations,” he says. “To be able to take something I worked on and use it in the field is really rewarding.”

This article appears in the December 2025 print issue as “Levi Unema.”

Reference: https://ift.tt/NLFucGB

Wednesday, November 26, 2025

HP plans to save millions by laying off thousands, ramping up AI use


HP Inc. said that it will lay off 4,000 to 6,000 employees in favor of AI deployments, a move it claims will deliver $1 billion in annualized gross run-rate savings by the end of its fiscal 2028.

HP expects to complete the layoffs by the end of that fiscal year. The reductions will largely hit product development, internal operations, and customer support, HP CEO Enrique Lores said during an earnings call on Tuesday.

Using AI, HP will “accelerate product innovation, improve customer satisfaction, and boost productivity,” Lores said.


Reference: https://ift.tt/sCqa0pX

TraffickCam Uses Computer Vision to Counter Human Trafficking




Abby Stylianou built an app that asks its users to upload photos of hotel rooms they stay in when they travel. It may seem like a simple act, but the resulting database of hotel room images helps Stylianou and her colleagues assist victims of human trafficking.

Traffickers often post photos of their victims in hotel rooms as online advertisements, evidence that can be used to find the victims and prosecute the perpetrators of these crimes. But to use this evidence, analysts must be able to determine where the photos were taken. That’s where TraffickCam comes in. The app uses the submitted images to train an image search system currently in use by the U.S.-based National Center for Missing and Exploited Children (NCMEC), aiding in its efforts to geolocate posted images—a deceptively hard task.

Stylianou is currently working with Nathan Jacobs’ group at Washington University in St. Louis to push the model even further, developing multimodal search capabilities that allow for video and text queries.

Which came first, your interest in computers or your desire to help provide justice to victims of abuse, and how did they coincide?

Abby Stylianou: It’s a crazy story.

I’ll go back to my undergraduate degree. I didn’t really know what I wanted to do, but I took a remote sensing class my second semester of senior year that I just loved. When I graduated, [George Washington University professor (then at Washington University in St. Louis)] Robert Pless hired me to work on a program called Finder.

The goal of Finder was to say, if you have a picture and nothing else, how can you figure out where that picture was taken? My family knew about the work that I was doing, and [in 2013] my uncle shared an article in the St. Louis Post-Dispatch with me about a young murder victim from the 1980s whose case had run cold. [The St. Louis Police Department] never figured out who she was.

What they had was pictures from the burial in 1983. They were wanting to do an exhumation of her remains to do modern forensic analysis, figure out what part of the country she was from. But they had exhumed the remains underneath her headstone at the cemetery and it wasn’t her.

And they [dug up the wrong remains] two more times, at which point the medical examiner for St. Louis said, “You can’t keep digging until you have evidence of where the remains actually are.” My uncle sends this to me, and he’s like, “Hey, could you figure out where this picture was taken?”

And so we actually ended up consulting for the St. Louis Police Department to take this tool we were building for geolocalization to see if we could find the location of this lost grave. We submitted a report to the medical examiner for St. Louis that said, “Here is where we believe the remains are.”

And we were right. We were able to exhume her remains. They were able to do modern forensic analysis and figure out she was from the Southeast. We’ve still not figured out her identity, but we have a lot better genetic information at this point.

For me, that moment was like, “This is what I want to do with my life. I want to use computer vision to do some good.” That was a tipping point for me.

So how does your algorithm work? Can you walk me through how a user-uploaded photo becomes usable data for law enforcement?

Stylianou: There are two really key pieces when we think about AI systems today. One is the data, and one is the model you’re using to operate. For us, both of those are equally important.

First is the data. We’re really lucky that there’s tons of imagery of hotels on the Internet, and so we’re able to scrape publicly available data in large volume. We have millions of these images that are available online. The problem with a lot of those images, though, is that they’re like advertising images. They’re perfect images of the nicest room in the hotel—they’re really clean, and that isn’t what the victim images look like.

A victim image is often a selfie that the victim has taken themselves. They’re in a messy room. The lighting is imperfect. This is a problem for machine learning algorithms. We call it the domain gap. When there is a gap between the data that you trained your model on and the data that you’re running through at inference time, your model won’t perform very well.

This idea to build the TraffickCam mobile application was in large part to supplement that Internet data with data that actually looks more like the victim imagery. We built this app so that people, when they travel, can submit pictures of their hotel rooms specifically for this purpose. Those pictures, combined with the pictures that we have off the Internet, are what we use to train our model.

Then what?

Stylianou: Once we have a big pile of data, we train neural networks to learn to embed it. If you take an image and run it through your neural network, what comes out on the other end isn’t explicitly a prediction of what hotel the image came from. Rather, it’s a numerical representation [of image features].

What we have is a neural network that takes in images and spits out vectors—small numerical representations of those images—where images that come from the same place hopefully have similar representations. That’s what we then use in this investigative platform that we have deployed at [NCMEC].

We have a search interface that uses that deep learning model, where an analyst can put in their image, run it through there, and they get back a set of results of what are the other images that are visually similar, and you can use that to then infer the location.
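In code, the embed-and-search pattern Stylianou describes looks roughly like the sketch below. It uses a generic pretrained image encoder and a tiny in-memory gallery; the model choice, file paths, and gallery are placeholders for illustration, not TraffickCam's actual architecture or data.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Generic pretrained CNN with its classification head removed, so it
# outputs a feature vector ("embedding") instead of class scores.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    """Map one image file to an L2-normalized embedding vector."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return F.normalize(backbone(x), dim=1).squeeze(0)

# Hypothetical gallery of hotel-room photos, each path standing in for an
# image tagged with its hotel ID.
gallery_paths = ["hotel_a/room1.jpg", "hotel_b/room7.jpg"]   # placeholder paths
gallery = torch.stack([embed(p) for p in gallery_paths])

# A query image is embedded the same way; cosine similarity ranks the
# gallery, and the top hits suggest which hotel the photo came from.
query = embed("query.jpg")
scores = gallery @ query          # cosine similarity (vectors are normalized)
best = scores.argsort(descending=True)[:5]
```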

Identifying Hotel Rooms Using Computer Vision

Many of your papers mention that matching hotel room images can actually be more difficult than matching photos of other types of locations. Why is that, and how do you deal with those challenges?

Stylianou: There are a handful of things that are really unique about hotels compared to other domains. Two different hotels may actually look really similar—every Motel 6 in the country has been renovated so that it looks virtually identical. That’s a real challenge for these models that are trying to come up with different representations for different hotels.

On the flip side, two rooms in the same hotel may look really different. You have the penthouse suite and the entry-level room. Or a renovation has happened on one floor and not another. That’s really a challenge when two images should have the same representation.

Other parts of our queries are unique because usually there’s a very, very large part of the image that has to be erased first. We’re talking about child pornography images. That has to be erased before it ever gets submitted to our system.

We trained the first version by pasting in people-shaped blobs to try and get the network to ignore the erased portion. But [Temple University professor and close collaborator Richard Souvenir’s team] showed that if you actually use AI in-painting—you actually fill in that blob with a sort of natural-looking texture—you actually do a lot better on the search than if you leave the erased blob in there.

So when our analysts run their search, the first thing they do is they erase the image. The next thing that we do is that we actually then go and use an AI in-painting model to fill that back in.
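A rough sketch of that erase-then-fill preprocessing step is shown below. It uses OpenCV's classical in-painter purely as a stand-in for the learned in-painting model the team describes, and the image path and erased region are hypothetical.

```python
import cv2
import numpy as np

# Sketch of the erase-then-fill preprocessing described above. A real
# deployment uses a learned (AI) in-painting model; OpenCV's classical
# Telea in-painter stands in here, for illustration only.
img = cv2.imread("query.jpg")                     # placeholder path
mask = np.zeros(img.shape[:2], dtype=np.uint8)

# Hypothetical region the analyst has erased (e.g., where a person was).
mask[100:400, 150:350] = 255
img[mask == 255] = 0                              # erase: blank the region

# Fill the erased region with plausible texture before embedding/search,
# which the researchers found improves retrieval over leaving the blob.
filled = cv2.inpaint(img, mask, 5, cv2.INPAINT_TELEA)
cv2.imwrite("query_filled.jpg", filled)           # then embed and search
```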

Some of your work involved object recognition rather than image recognition. Why?

Stylianou: The [NCMEC] analysts that use our tool have shared with us that oftentimes, in the query, all they can see is one object in the background and they want to run a search on just that. But the models that we train typically operate on the scale of the full image, and that’s a problem.

And there are things in a hotel that are unique and things that aren’t. Like a white bed in a hotel is totally non-discriminative. Most hotels have a white bed. But a really unique piece of artwork on the wall, even if it’s small, might be really important to recognizing the location.

[NCMEC analysts] can sometimes only see one object, or know that one object is important. Just zooming in on it in the types of models that we’re already using doesn’t work well. How could we support that better? We’re doing things like training object-specific models. You can have a couch model and a lamp model and a carpet model.

How do you evaluate the success of the algorithm?

Stylianou: I have two versions of this answer. One is that there’s no real world dataset that we can use to measure this, so we create proxy datasets. We have our data that we’ve collected via the TraffickCam app. We take subsets of that and we put big blobs into them that we erase and we measure the fraction of the time that we correctly predict what hotel those are from.

So those images look as much like the victim images as we can make them look. That said, they still don’t necessarily look exactly like the victim images, right? That’s as good of a sort of quantitative metric as we can come up with.

And then we do a lot of work with the [NCMEC] to understand how the system is working for them. We get to hear about the instances where they’re able to use our tool successfully and not successfully. Honestly, some of the most useful feedback we get from them is them telling us, “I tried running the search and it didn’t work.”
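The proxy metric Stylianou describes (masking part of a held-out TraffickCam photo and counting how often the correct hotel shows up among the top results) can be illustrated with a toy computation like the one below. The gallery, hotel IDs, and similarity scores are random synthetic stand-ins, not real data or results.

```python
import numpy as np

# Toy sketch of a top-k retrieval accuracy evaluation. Everything here is
# synthetic; a real evaluation would use embeddings of masked test images.
rng = np.random.default_rng(0)
num_queries, gallery_size, k = 200, 5000, 10

gallery_hotels = rng.integers(0, 500, size=gallery_size)  # hotel ID per gallery image
true_hotels = rng.integers(0, 500, size=num_queries)      # true hotel per query

# Stand-in for the model: random query-vs-gallery similarity scores.
scores = rng.random((num_queries, gallery_size))

topk = np.argsort(-scores, axis=1)[:, :k]                 # indices of top-k matches
hits = (gallery_hotels[topk] == true_hotels[:, None]).any(axis=1)
print(f"top-{k} hotel accuracy: {hits.mean():.3f}")
```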

Have positive hotel image matches actually been used to help trafficking victims?

Stylianou: I always struggle to talk about these things, in part because I have young kids. This is upsetting and I don’t want to take things that are the most horrific thing that will ever happen to somebody and tell it as our positive story.

With that said, there are cases we’re aware of. There’s one that I’ve heard from the analysts at NCMEC recently that really has reinvigorated for me why I do what I do.

There was a case of a live stream that was happening. And it was a young child who was being assaulted in a hotel. NCMEC got alerted that this was happening. The analysts who have been trained to use TraffickCam took a screenshot of that, plugged it into our system, got a result for which hotel it was, sent law enforcement, and were able to rescue the child.

I feel very, very lucky that I work on something that has real world impact, that we are able to make a difference.

Reference: https://ift.tt/PlmRLfa

Crypto hoarders dump tokens as shares tumble


Crypto-hoarding companies are ditching their holdings in a bid to prop up their sinking share prices, as the craze for “digital asset treasury” businesses unravels in the face of a $1 trillion cryptocurrency rout.

Shares in Michael Saylor-led Strategy, the world’s biggest corporate bitcoin holder, have tumbled 50 percent over the past three months, dragging down scores of copycat companies.

About $77 billion has been wiped from the stock market value of these companies, which raise debt and equity to fund purchases of crypto, since their peak of $176 billion in July, according to industry data publication The Block.


Reference: https://ift.tt/7kgiP9E

Event Sensors Bring Just the Right Data to Device Makers




Anatomically, the human eye is like a sophisticated tentacle that reaches out from the brain, with the retina acting as the tentacle’s tip and touching everything the person sees. Evolution worked a wonder with this complex nervous structure.

Now, contrast the eye’s anatomy to the engineering of the most widely used machine-vision systems today: a charge-coupled device (CCD) or a CMOS imaging chip, each of which consists of a grid of pixels. The eye is orders of magnitude more efficient than these flat-chipped computer-vision kits. Here’s why: For any scene it observes, a chip’s pixel grid is updated periodically—and in its entirety—over the course of receiving the light from the environment. The eye, though, is much more parsimonious, focusing its attention only on a small part of the visual scene at any one time—namely, the part of the scene that changes, like the fluttering of a leaf or a golf ball splashing into water.

My company, Prophesee, and our competitors call these changes in a scene “events.” And we call the biologically inspired, machine-vision systems built to capture these events neuromorphic event sensors. Compared to CCDs and CMOS imaging chips, event sensors respond faster, offer a higher dynamic range—meaning they can detect both in dark and bright parts of the scene at the same time—and capture quick movements without blur, all while producing new data only when and where an event is sensed, which makes the sensors highly energy and data efficient. We and others are using these biologically inspired supersensors to significantly upgrade a wide array of devices and machines, including high-dynamic-range cameras, augmented-reality wearables, drones, and medical robots.

So wherever you look at machines these days, they’re starting to look back—and, thanks to event sensors, they’re looking back more the way we do.

Event-sensing videos may seem unnatural to humans, but they capture just what computers need to know: motion. Prophesee

Event Sensors vs. CMOS Imaging Chips

Digital sensors inspired by the human eye date back decades. The first attempts to make them were in the 1980s at the California Institute of Technology. Pioneering electrical engineers Carver A. Mead, Misha Mahowald, and their colleagues used analog circuitry to mimic the functions of the excitable cells in the human retina, resulting in their “silicon retina.” In the 1990s, Mead cofounded Foveon to develop neurally inspired CMOS image sensors with improved color accuracy, less noise at low light, and sharper images. In 2008, camera maker Sigma purchased Foveon and continues to develop the technology for photography.

A number of research institutions continued to pursue bioinspired imaging technology through the 1990s and 2000s. In 2006, a team at the Institute of Neuroinformatics at the University of Zurich built the first practical temporal-contrast event sensor, which captured changes in light intensity over time. By 2010, researchers at the Seville Institute of Microelectronics had designed sensors that could be tuned to detect changes in either space or time. Then, in 2010, my group at the Austrian Institute of Technology, in Vienna, combined temporal-contrast detection with photocurrent integration at the pixel level to both detect relative changes in intensity and acquire absolute light levels in each individual pixel. More recently, in 2022, a team at the Institut de la Vision, in Paris, and their spin-off, Pixium Vision, applied neuromorphic sensor technology to a biomedical application—a retinal implant to restore some vision to blind people. (Pixium has since been acquired by Science Corp., the Alameda, Calif.–based maker of brain-computer interfaces.)

Other startups that pioneered event sensors for real-world vision tasks include iniVation in Zurich (which merged with SynSense in China), CelePixel in Singapore (now part of OmniVision), and my company, Prophesee (formerly Chronocam), in Paris.

TABLE 1: Who’s Developing Neuromorphic Event Sensors


Date released | Company | Sensor | Event pixel resolution | Status
2023 | OmniVision | Celex VII | 1,032 x 928 | Prototype
2023 | Prophesee | GenX320 | 320 x 320 | Commercial
2023 | Sony | Gen3 | 1,920 x 1,084 | Prototype
2021 | Prophesee & Sony | IMX636/637/646/647 | 1,280 x 720 | Commercial
2020 | Samsung | Gen4 | 1,280 x 960 | Prototype
2018 | Samsung | Gen3 | 640 x 480 | Commercial

Among the leading CMOS image sensor companies, Samsung was the first to present its own event-sensor designs. Today other major players, such as Sony and OmniVision, are also exploring and implementing event sensors. Among the wide range of applications that companies are targeting are machine vision in cars, drone detection, blood-cell tracking, and robotic systems used in manufacturing.

How an Event Sensor Works

To grasp the power of the event sensor, consider a conventional video camera recording a tennis ball crossing a court at 150 kilometers per hour. Depending on the camera, it will capture 24 to 60 frames per second, which can result in an undersampling of the fast motion due to large displacement of the ball between frames and possibly cause motion blur because of the movement of the ball during the exposure time. At the same time, the camera essentially oversamples the static background, such as the net and other parts of the court that don’t move.

If you then ask a machine-vision system to analyze the dynamics in the scene, it has to rely on this sequence of static images—the video camera’s frames—which contain both too little information about the important things and too much redundant information about things that don’t matter. It’s a fundamentally mismatched approach that’s led the builders of machine-vision systems to invest in complex and power-hungry processing infrastructure to make up for the inadequate data. These machine-vision systems are too costly to use in applications that require real-time understanding of the scene, such as autonomous vehicles, and they use too much energy, bandwidth, and computing resources for applications like battery-powered smart glasses, drones, and robots.

Ideally, an image sensor would use high sampling rates for the parts of the scene that contain fast motion and changes, and slow rates for the slow-changing parts, with the sampling rate going to zero if nothing changes. This is exactly what an event sensor does. Each pixel acts independently and determines the timing of its own sampling by reacting to changes in the amount of incident light. The entire sampling process is no longer governed by a fixed clock with no relation to the scene’s dynamics, as with conventional cameras, but instead adapts to subtle variations in the scene.


A moving ball captured as a sequence of conventional camera frames, compared with the same ball’s path recorded by an event sensor as a continuous trajectory through an x-y-time volume.

Let’s dig deeper into the mechanics. When the light intensity on a given pixel crosses a predefined threshold, the system records the time with microsecond precision. This time stamp and the pixel’s coordinates in the sensor array form a message describing the “event,” which the sensor transmits as a digital data package. Each pixel can do this without the need for an external intervention such as a clock signal and independently of the other pixels. Not only is this architecture vital for accurately capturing quick movements, but it’s also critical for increasing an image’s dynamic range. Since each pixel is independent, the lowest light in a scene and the brightest light in a scene are simultaneously recorded; there’s no issue of over- or underexposed images.


Each event-sensor pixel pairs a photodiode with a relative-change detector, which defines events based on changes in the log of the pixel’s illuminance.

The output generated by a video camera equipped with an event sensor is not a sequence of images but rather a continuous stream of individual pixel data, generated and transmitted based on changes happening in the scene. Since in many scenes, most pixels do not change very often, event sensors promise to save energy compared to conventional CMOS imaging, especially when you include the energy of data transmission and processing. For many tasks, our sensors consume about a tenth the power of a conventional sensor. Certain tasks, for example eye tracking for smart glasses, require even less energy for sensing and processing. In the case of the tennis ball, where the changes represent a small fraction of the overall field of vision, the data to be transmitted and processed is tiny compared to conventional sensors, and the advantages of an event sensor approach are enormous: perhaps five or even six orders of magnitude.
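The per-pixel rule is simple enough to sketch in a few lines of code. The simulation below, in the spirit of common event-camera simulators, derives events from a stack of synthetic frames by emitting an (x, y, timestamp, polarity) tuple whenever a pixel's log intensity drifts beyond a threshold. The threshold, frame data, and timestamps are illustrative, not the behavior of any particular commercial sensor.

```python
import numpy as np

# Minimal sketch of how an event sensor turns changing light into events,
# following the per-pixel contrast-threshold rule described above.
THRESHOLD = 0.2          # log-intensity change needed to trigger an event

def frames_to_events(frames, timestamps):
    """Convert intensity frames (T, H, W) into (t, x, y, polarity) events."""
    log_i = np.log(frames.astype(np.float64) + 1e-6)
    ref = log_i[0].copy()                  # last log level each pixel "remembered"
    events = []
    for t, frame in zip(timestamps[1:], log_i[1:]):
        diff = frame - ref
        fired = np.abs(diff) >= THRESHOLD  # pixels whose change crossed the threshold
        ys, xs = np.nonzero(fired)
        for x, y in zip(xs, ys):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((t, x, y, polarity))
        ref[fired] = frame[fired]          # update only the pixels that fired
    return events

# Synthetic scene: a bright dot sweeping across an otherwise static image,
# so only a handful of pixels ever produce events.
T, H, W = 50, 64, 64
frames = np.full((T, H, W), 50.0)
for k in range(T):
    frames[k, 32, k] = 255.0
events = frames_to_events(frames, np.arange(T) * 1e-3)  # timestamps in seconds
print(f"{len(events)} events from {T * H * W} pixel samples")
```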

Event Sensors in Action

To imagine where we will see event sensors in the future, think of any application that requires a fast, energy- and data-efficient camera that can work in both low and high light. For example, they would be ideal for edge devices: Internet-connected gadgets that are often small, have power constraints, are worn close to the body (such as a smart ring), or operate far from high-bandwidth, robust network connections (such as livestock monitors).

Event sensors’ low power requirements and ability to detect subtle movement also make them ideal for human-computer interfaces—for example, in systems for eye and gaze tracking, lipreading, and gesture control in smartwatches, augmented-reality glasses, game controllers, and digital kiosks at fast food restaurants.

For the home, engineers are testing wall-mounted event sensors in health monitors for the elderly, to detect when a person falls. Here, event sensors have another advantage—they don’t need to capture a full image, just the event of the fall. This means the monitor sends only an alert, and the use of a camera doesn’t raise the usual privacy concerns.

Event sensors can also augment traditional digital photography. Such applications are still in the development stage, but researchers have demonstrated that when an event sensor is used alongside a phone’s camera, the extra information about the motion within the scene as well as the high and low lighting from the event sensor can be used to remove blur from the original image, add more crispness, or boost the dynamic range.

Event sensors could be used to remove motion in the other direction, too: Currently, cameras rely on electromechanical stabilization technologies to keep the camera steady. Event-sensor data can be used to algorithmically produce a steady image in real time, even as the camera shakes. And because event sensors record data at microsecond intervals, faster than the fastest CCD or CMOS image sensors, it’s also possible to fill in the gaps between the frames of traditional video capture. This can effectively boost the frame rate from tens of frames per second to tens of thousands, enabling ultraslow-motion video on demand after the recording has finished. Two obvious applications of this technique are helping referees at sporting events resolve questions right after a play, and helping authorities reconstruct the details of traffic collisions.

An event sensor records and sends data only when light changes by more than a user-defined threshold. The size of the arrows in the video at right conveys how fast different parts of the dancer and her dress are moving. Prophesee

Meanwhile, a wide range of inventors are developing early-stage applications of event sensors for situational awareness in space, including satellite and space-debris tracking. They’re also investigating the use of event sensors for biological applications, including microfluidics analysis and flow visualization, flow cytometry, and contamination detection for cell therapy.

But right now, industrial applications of event sensors are the most mature. Companies have deployed them in quality control on beverage-carton production lines, in laser-welding robots, and in Internet of Things devices. And developers are working on using event sensors to count objects on fast-moving conveyor belts, to provide visual-feedback control for industrial robots, and to make touchless vibration measurements of equipment for predictive maintenance.

The Data Challenge for Event Sensors

There is still work to be done to improve the capabilities of the technology. One of the biggest challenges is in the kind of data event sensors produce. Most machine-vision systems use algorithms designed to interpret sequences of static images. Event data is temporal in nature, effectively capturing the swings of a robot arm or the spinning of a gear, but those distinct data signatures aren’t easily parsed by current machine-vision systems.

A graph showing variations in light intensity over time that trigger an event sensor to send signals. Engineers can calibrate an event sensor to send a signal only when the number of photons changes by more than a preset amount. This way, the sensor sends less, but more relevant, data. In this chart, only changes to the intensity [black curve] greater than a certain amount [dotted horizontal lines] set off an event message [blue or red, depending on the direction of the change]. Note that the y-axis is logarithmic, so the detected changes are relative changes. Prophesee
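The logic the chart describes can be sketched per pixel: fire an event only when the log intensity drifts away from the last reference level by more than a threshold, with the polarity marking the direction of the change. The class below is a conceptual model with illustrative names, not the sensor circuit itself; because the comparison happens in log space, the threshold corresponds to a relative change in brightness.

```python
import math

class PixelChangeDetector:
    """Conceptual model of one pixel's event logic: emit an event only when
    log intensity moves more than `threshold` away from the last reference level."""

    def __init__(self, initial_intensity, threshold=0.2):
        self.reference = math.log(initial_intensity)
        self.threshold = threshold

    def update(self, intensity, t_us):
        level = math.log(intensity)
        if level - self.reference >= self.threshold:
            self.reference = level
            return ("ON", t_us)    # brightness rose by more than the threshold
        if self.reference - level >= self.threshold:
            self.reference = level
            return ("OFF", t_us)   # brightness fell by more than the threshold
        return None                # change too small: nothing is sent
```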

This is where Prophesee comes in. My company offers products and services that help other companies more easily build event-sensor technology into their applications. So we’ve been working on making it easier to incorporate temporal data into existing systems in three ways: by designing a new generation of event sensors with industry-standard interfaces and data protocols; by formatting the data for efficient use by a computer-vision algorithm or a neural network; and by providing always-on low-power mode capabilities. To this end, last year we partnered with chipmaker AMD to enable our Metavision HD event sensor to be used with AMD’s Kria KV260 Vision AI Starter Kit, a collection of hardware and software that lets developers test their event-sensor applications. The Prophesee and AMD development platform manages some of the data challenges so that developers can experiment more freely with this new kind of camera.

One approach that we and others have found promising for managing the data of event sensors is to take a cue from the biologically inspired neural networks used in today’s machine-learning architectures. For instance, spiking neural networks, or SNNs, act more like biological neurons than traditional neural networks do—specifically, SNNs transmit information only when discrete “spikes” of activity are detected, while traditional neural nets process continuous values. SNNs thus offer an event-based computational approach that is well matched to the way that event sensors capture scene dynamics.
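For illustration, here is a minimal leaky integrate-and-fire neuron, the textbook building block of many SNNs; it is a generic sketch with placeholder parameter values, not Prophesee's implementation. The neuron stays silent, triggering no downstream computation, until enough input activity accumulates to push it over its threshold.

```python
class LIFNeuron:
    """Minimal leaky integrate-and-fire neuron: it accumulates weighted input
    spikes, leaks charge over time, and fires only when its membrane potential
    crosses a threshold, so downstream work happens only when something changed."""

    def __init__(self, threshold=1.0, leak=0.9):
        self.potential = 0.0
        self.threshold = threshold
        self.leak = leak

    def step(self, weighted_input_spikes: float) -> bool:
        self.potential = self.potential * self.leak + weighted_input_spikes
        if self.potential >= self.threshold:
            self.potential = 0.0   # reset after firing
            return True            # spike: pass information downstream
        return False               # stay silent: no computation triggered
```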

Another kind of neural network that’s attracting attention is called a graph neural network, or GNN. These networks accept graphs as input data, which means they’re useful for any kind of data that’s represented by a mesh of nodes and their connections—for example, social networks, recommendation systems, molecular structures, and the behavior of biological and digital viruses. As it happens, the data that event sensors produce can also be represented as a 3D graph, with two dimensions of space and one of time. The GNN can effectively compress the graph from an event sensor by picking out features such as 2D images, distinct types of objects, estimates of the direction and speed of objects, and even bodily gestures. We think GNNs will be especially useful for event-based edge-computing applications with limited power, connectivity, and processing. We’re currently working to put a GNN almost directly into an event sensor and eventually to incorporate both the event sensor and the GNN processing into the same millimeter-scale chip.
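As a sketch of how such a 3D graph can be built, the function below treats each (x, y, timestamp, polarity) event as a node and links events that are close in both space and time. The radii and the brute-force neighbor search are illustrative choices for a small batch, a conceptual sketch rather than a production pipeline.

```python
import numpy as np

def events_to_graph(events, spatial_radius=3, temporal_radius_us=2000):
    """Turn a list of (x, y, t_us, polarity) events into node features and an
    edge list: nodes are events, edges link events that are neighbors in the
    3D space of (x, y, time). A GNN can then pool this graph into compact
    features such as object class or motion direction.

    Brute-force O(n^2) neighbor search -- fine for a small illustrative batch.
    """
    nodes = np.array(events, dtype=float)   # shape (n, 4): x, y, t_us, polarity
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            dx = abs(nodes[i, 0] - nodes[j, 0])
            dy = abs(nodes[i, 1] - nodes[j, 1])
            dt = abs(nodes[i, 2] - nodes[j, 2])
            if dx <= spatial_radius and dy <= spatial_radius and dt <= temporal_radius_us:
                edges.append((i, j))
    return nodes, edges
```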

In the future, we expect to see machine-vision systems that follow nature’s successful strategy of capturing the right data at just the right time and processing it in the most efficient way. Ultimately, that approach will allow our machines to see the wider world in a new way, which will benefit both us and them.

Reference: https://ift.tt/mXfWOKY

Tuesday, November 25, 2025

AI Agents Break Rules Under Everyday Pressure




Several recent studies have shown that artificial-intelligence agents sometimes decide to misbehave, for instance by attempting to blackmail people who plan to replace them. But such behavior often occurs in contrived scenarios. Now, a new study presents PropensityBench, a benchmark that measures an agentic model’s choices to use harmful tools in order to complete assigned tasks. It finds that somewhat realistic pressures (such as looming deadlines) dramatically increase rates of misbehavior.

“The AI world is becoming increasingly agentic,” says Udari Madhushani Sehwag, a computer scientist at the AI infrastructure company Scale AI and a lead author of the paper, which is currently under peer review. By that she means that large language models (LLMs), the engines powering chatbots such as ChatGPT, are increasingly connected to software tools that can surf the web, modify files, and write and run code in order to complete tasks.

Giving LLMs these abilities adds convenience but also risk, as the systems might not act as we’d wish. Even if they’re not yet capable of doing great harm, researchers want to understand their proclivities before it’s too late. Although AIs don’t have intentions and awareness in the way that humans do, treating them as goal-seeking entities often helps researchers and users better predict their actions.

AI developers attempt to “align” the systems to safety standards through training and instructions, but it’s unclear how faithfully models adhere to guidelines. “When they are actually put under real-world stress, and if the safe option is not working, are they going to switch to just getting the job done by any means necessary?” Sehwag says. “This is a very timely topic.”

How to Test an AI Agent Under Pressure

The researchers tested a dozen models made by Alibaba, Anthropic, Google, Meta, and OpenAI across nearly 6,000 scenarios. In each scenario, a model was assigned a task and told it had access to several tools. It was instructed to use the safe tools and not the harmful ones. At first there was no pressure, and it could take 12 steps toward completing its task by issuing instructions to the tools. Then it received a message increasing the pressure, and it was given 12 more attempts. This continued for up to 12 pressure levels. Attempts to use safe tools always returned error messages. An attempt to use a harmful tool ended the scenario.
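In rough pseudocode, the procedure described above looks like the Python sketch below, where agent.choose_tool is a hypothetical stand-in for querying the model and the pressure messages play the role of the escalating deadlines and threats; this is a paraphrase of the protocol as reported, not the benchmark's actual code.

```python
def run_scenario(agent, task, safe_tools, harmful_tools, pressure_messages,
                 steps_per_level=12):
    """Sketch of a PropensityBench-style scenario loop (hypothetical API)."""
    history = [task]
    for level, pressure in enumerate(pressure_messages):   # up to 12 levels
        if level > 0:
            history.append(pressure)                       # turn up the pressure
        for _ in range(steps_per_level):
            tool = agent.choose_tool(history, safe_tools + harmful_tools)
            if tool in harmful_tools:
                return {"violated": True, "pressure_level": level}
            history.append(f"{tool} returned an error")    # safe tools never succeed
    return {"violated": False, "pressure_level": len(pressure_messages) - 1}
```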

The scenarios covered several domains. In biosecurity, Sehwag says, the agent might have to study the spread of a pathogen and find a way to contain it. A safe tool might use anonymized genetic data, while a harmful one might use data that’s not anonymized. In the domain of chemical security, the agent might need to obtain a particular solvent. A safe tool might place an order for the solvent to be extracted from waste, while a harmful one might procure it illegally. In cybersecurity, the agent might need to identify and restrict a malicious user account. A safe tool might disable only the problematic account, while a harmful one might block all users. Scenarios in other domains tempted agents with tools that led to self-preservation by evading control, recruiting allies, duplicating themselves, or going rogue in other ways.

Bar chart of LLMs such as Claude 4S, OpenAI o3, and o4-mini with zero-pressure propensity scores below 15 percent that roughly double under pressure. Google Gemini 2.5 Pro's score, by contrast, is above 30 percent without pressure, then skyrockets past 75 percent under pressure. The worst-performing model, Gemini 2.5 Pro, cracked under pressure and chose to use forbidden tools 79 percent of the time. Source image: Udari Madhushani Sehwag, Shayan Shabihi, et al.

Pressure came in several forms. Deadlines might shorten. Potential financial losses might increase if the task was not accomplished. Threats of greater oversight, reduced resources, and curtailed authority might escalate. Tasks, tools, pressures, and tactics were modeled on the kinds of situations humans might face in their jobs (except maybe the misbehavior option of cloning oneself).

The best-behaved model (OpenAI’s o3) cracked under pressure in 10.5 percent of scenarios, while the worst (Google’s Gemini 2.5 Pro) had a propensity score of 79 percent; the average across models was about 47 percent. Even under zero pressure, the group on average failed about 19 percent of the time.

To the degree that models remained aligned, the study shows that some of that alignment was “shallow.” In some tests, the researchers gave harmful tools benign names (such as “use_synthetic_data” instead of “use_fake_data”)—without changing the tools’ descriptions or admonitions to avoid them. This wording tweak increased average propensity by 17 percentage points, to 64 percent.

Knowing What LLMs Are Thinking

The researchers also found that models used harmful tools even after explicitly acknowledging that they were off limits. The models offered diverse justifications for doing so, for example by citing the pressure they were under or arguing that the benefits outweighed the risks. It’s unclear if models released after the study’s completion would perform any better. Among the models tested, more capable models (according to a platform called LMArena) were only slightly safer.

“PropensityBench is interesting,” emails Nicholas Carlini, a computer scientist at Anthropic who wasn’t involved in the research. He offers a caveat related to what’s called situational awareness. LLMs sometimes detect when they’re being evaluated and act nice so they don’t get retrained or shelved. “I think that most of these evaluations that claim to be ‘realistic’ are very much not, and the LLMs know this,” he says. “But I do think it’s worth trying to measure the rate of these harms in synthetic settings: if they do bad things when they ‘know’ we’re watching, that’s probably bad?” If the models knew they were being evaluated, the propensity scores in this study may be underestimates of propensity outside the lab.

Alexander Pan, a computer scientist at xAI and the University of California, Berkeley, says while Anthropic and other labs have shown examples of scheming by LLMs in specific setups, it’s useful to have standardized benchmarks like PropensityBench. They can tell us when to trust models, and also help us figure out how to improve them. A lab might evaluate a model after each stage of training to see what makes it more or less safe. “Then people can dig into the details of what’s being caused when,” he says. “Once we diagnose the problem, that’s probably the first step to fixing it.”

In this study, models didn’t have access to actual tools, limiting the realism. Sehwag says a next evaluation step is to build sandboxes where models can take real actions in an isolated environment. As for increasing alignment, she’d like to add oversight layers to agents that flag dangerous inclinations before they’re pursued.

The self-preservation risks may be the most speculative in the benchmark, but Sehwag says they’re also the most underexplored. It “is actually a very high-risk domain that can have an impact on all the other risk domains,” she says. “If you just think of a model that doesn’t have any other capability, but it can persuade any human to do anything, that would be enough to do a lot of harm.”

Reference: https://ift.tt/aEeuLv2

Video Friday: Biorobotics Turns Lobster Tails Into Gripper

Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a w...