Thursday, April 10, 2025

Researchers concerned to find AI models hiding their true “reasoning” processes


Remember when teachers demanded that you "show your work" in school? Some fancy new AI models promise to do exactly that, but new research suggests that they sometimes hide their actual methods while fabricating elaborate explanations instead.

New research from Anthropic—creator of the ChatGPT-like Claude AI assistant—examines simulated reasoning (SR) models like DeepSeek's R1, and its own Claude series. In a research paper posted last week, Anthropic's Alignment Science team demonstrated that these SR models frequently fail to disclose when they've used external help or taken shortcuts, despite features designed to show their "reasoning" process.

(It's worth noting that OpenAI's o1 and o3 series SR models deliberately obscure the accuracy of their "thought" process, so this study does not apply to them.)

Read full article

Comments

Reference : https://ift.tt/yRkG1rJ

No comments:

Post a Comment

This Chart Might Keep You From Worrying About AI’s Energy Use

The world is collectively freaking out about the growth of artificial intelligence and its strain on power grids . But a look back at ...