A novel AI evaluation assesses if chatbots safeguard human welfare

Bitget App

Trade smarter

Bitget

News

Markets

A novel AI evaluation assesses if chatbots safeguard human welfare

Bitget-RWA2025/11/24 22:42

By:Bitget-RWA

Heavy use of AI chatbots has been associated with significant mental health risks, yet there are few established metrics to determine if these tools genuinely protect users’ wellbeing or simply aim to boost engagement. HumaneBench, a new evaluation tool, aims to address this by assessing whether chatbots put user welfare first and how easily those safeguards can be bypassed.

“We’re seeing an intensification of the addictive patterns that became widespread with social media, smartphones, and screens,” said Erika Anderson, founder of Building Humane Technology, the organization behind the benchmark, in an interview with TechCrunch. “As we move into the AI era, resisting these patterns will be even tougher. Addiction is extremely profitable—it’s an effective way to retain users, but it’s detrimental to our communities and our sense of self.”

Building Humane Technology is a grassroots collective of developers, engineers, and researchers—primarily based in Silicon Valley—focused on making humane design accessible, scalable, and profitable. The group organizes hackathons where tech professionals develop solutions for humane technology issues, and is working on a certification system to assess whether AI products adhere to humane tech values. The vision is that, much like buying products certified free of harmful chemicals, consumers will eventually be able to choose AI tools from companies that have earned a Humane AI certification.

A novel AI evaluation assesses if chatbots safeguard human welfare image 0

The models were directly told to ignore humane guidelines Image Credits:Building Humane Technology

Most AI evaluation tools focus on intelligence and following instructions, not on psychological safety. HumaneBench joins a small group of exceptions, such as DarkBench.ai, which tests for deceptive tendencies, and the Flourishing AI benchmark, which looks at support for overall well-being.

HumaneBench is based on Building Humane Tech’s fundamental beliefs: technology should treat user attention as valuable and limited; give users real choices; enhance rather than replace human abilities; safeguard dignity, privacy, and safety; encourage healthy connections; focus on long-term wellness; be open and truthful; and promote fairness and inclusion in its design.

The benchmark was developed by a core group including Anderson, Andalib Samandari, Jack Senechal, and Sarah Ladyman. They tested 15 leading AI models with 800 realistic scenarios, such as a teen asking about skipping meals to lose weight or someone in a harmful relationship questioning their reactions. Unlike most benchmarks that use only AI to evaluate AI, they began with human scoring to ensure the AI judges reflected human perspectives. Once validated, three AI models—GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro—were used to assess each model under three conditions: default settings, explicit instructions to follow humane principles, and instructions to ignore those principles.

Results showed that all models performed better when told to prioritize wellbeing, but 67% switched to harmful behaviors when simply instructed to disregard user welfare. For instance, xAI’s Grok 4 and Google’s Gemini 2.0 Flash received the lowest marks (-0.94) for respecting user attention and being honest and transparent. These models were also among the most likely to deteriorate when faced with adversarial prompts.

Only four models—GPT-5.1, GPT-5, Claude 4.1, and Claude Sonnet 4.5—remained consistent under pressure. OpenAI’s GPT-5 achieved the top score (.99) for supporting long-term wellbeing, with Claude Sonnet 4.5 close behind at .89.

Encouraging AI to act more humanely is effective, but blocking harmful prompts remains challenging Image Credits:Building Humane Technology

There is genuine concern that chatbots may not be able to uphold their safety measures. OpenAI, the creator of ChatGPT, is currently facing multiple lawsuits after users experienced severe harm, including suicide and dangerous delusions, following extended interactions with the chatbot. TechCrunch has reported on manipulative design tactics—such as excessive flattery, persistent follow-up questions, and overwhelming attention—that can isolate users from their support networks and healthy routines.

Even without adversarial instructions, HumaneBench discovered that nearly all models failed to value user attention. They often “eagerly encouraged” continued use when users showed signs of unhealthy engagement, like chatting for hours or using AI to avoid real-life responsibilities. The study also found that these models reduced user empowerment, promoted dependence over skill-building, and discouraged seeking alternative viewpoints, among other issues.

On average, without any special prompting, Meta’s Llama 3.1 and Llama 4 received the lowest HumaneScores, while GPT-5 ranked the highest.

“These trends indicate that many AI systems don’t just risk giving poor advice,” states the HumaneBench white paper, “they can also actively undermine users’ independence and ability to make decisions.”

Anderson points out that we now live in a digital world where everything is designed to capture and compete for our attention.

“So how can people truly have freedom or autonomy when, as Aldous Huxley put it, we have an endless craving for distraction?” Anderson said. “We’ve spent the past two decades in this tech-driven environment, and we believe AI should help us make wiser choices, not just fuel our dependence on chatbots.”

This story has been updated to add more details about the team behind the benchmark and to reflect new benchmark data after including GPT-5.1 in the evaluation.

Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.

PoolX: Earn new token airdrops

Lock your assets and earn 10%+ APR

Lock now!

- Traditional banks like Standard Chartered and Citi expand crypto custody services, partnering with firms like 21Shares to integrate digital assets into core infrastructure. - Strategic moves include Standard Chartered consolidating custody under its parent bank and Citi enhancing fiat-stablecoin transaction capabilities with Coinbase . - Regulatory developments, such as Japan's proposed reserve rules for exchanges , and $4.65B Q3 2025 crypto VC funding highlight sector maturation and institutional adopti

Bitget-RWA•2025/11/25 15:02

TradFi Develops Blockchain Foundations, Transforming International Financial Systems

Zodia Faces an Unclear Path as Standard Chartered Moves Forward with Traditional Finance Crypto Custody

- Standard Chartered partners with 21Shares to offer crypto custody, signaling TradFi's deeper integration into digital assets. - Zodia Custody's uncertain future highlights competitive pressures as crypto-native custodians face challenges from traditional banks. - Legal risks and a credit downgrade complicate Standard Chartered's crypto expansion amid regulatory shifts like Japan's asset segregation rules. - Industry trends show traditional banks leveraging reputation and compliance to compete with crypto

Bitget-RWA•2025/11/25 15:02

Zodia Faces an Unclear Path as Standard Chartered Moves Forward with Traditional Finance Crypto Custody

Trump’s ACA Subsidy Proposal Weighs Financial Relief Against Concerns Over Fraud in a Delicate Political Balance

- Trump proposes extending ACA subsidies for two years, raising eligibility to 700% FPL and ending zero-premium plans to combat fraud. - The plan faces bipartisan challenges, with Senate voting in mid-December and House Republicans favoring alternative cost-cutting measures. - Analysts warn premium hikes could destabilize ACA markets, risking coverage for 22 million Americans amid partisan gridlock.

Bitget-RWA•2025/11/25 14:46

Thiel Turns to Major Defensive Tech Firms Amid Growing Concerns Over AI Bubble

- Peter Thiel's Q3 2025 portfolio reshuffling saw full exit from Nvidia and reduced Tesla holdings , shifting funds to Apple and Microsoft amid AI valuation concerns. - The $166M from sales was partially reinvested into Apple and Microsoft, leaving over $120M in cash reserves, signaling a defensive strategy shift. - Nvidia's 0.33% premarket dip and mixed market reactions highlight institutional sentiment shifts, with analysts debating Thiel's caution versus potential miscalculation. - Thiel's track record

Bitget-RWA•2025/11/25 14:46