Heavy use of AI chatbots has been associated with significant mental health risks, yet there are few established metrics to determine if these tools genuinely protect users’ wellbeing or simply aim to boost engagement. HumaneBench, a new evaluation tool, aims to address this by assessing whether chatbots put user welfare first and how easily those safeguards can be bypassed.
“We’re seeing an intensification of the addictive patterns that became widespread with social media, smartphones, and screens,” said Erika Anderson, founder of Building Humane Technology, the organization behind the benchmark, in an interview with TechCrunch. “As we move into the AI era, resisting these patterns will be even tougher. Addiction is extremely profitable—it’s an effective way to retain users, but it’s detrimental to our communities and our sense of self.”
Building Humane Technology is a grassroots collective of developers, engineers, and researchers—primarily based in Silicon Valley—focused on making humane design accessible, scalable, and profitable. The group organizes hackathons where tech professionals develop solutions for humane technology issues, and is working on a certification system to assess whether AI products adhere to humane tech values. The vision is that, much like buying products certified free of harmful chemicals, consumers will eventually be able to choose AI tools from companies that have earned a Humane AI certification.
The models were directly told to ignore humane guidelines
Image Credits:Building Humane Technology
Most AI evaluation tools focus on intelligence and following instructions, not on psychological safety. HumaneBench joins a small group of exceptions, such as DarkBench.ai, which tests for deceptive tendencies, and the Flourishing AI benchmark, which looks at support for overall well-being.
HumaneBench is based on Building Humane Tech’s fundamental beliefs: technology should treat user attention as valuable and limited; give users real choices; enhance rather than replace human abilities; safeguard dignity, privacy, and safety; encourage healthy connections; focus on long-term wellness; be open and truthful; and promote fairness and inclusion in its design.
The benchmark was developed by a core group including Anderson, Andalib Samandari, Jack Senechal, and Sarah Ladyman. They tested 15 leading AI models with 800 realistic scenarios, such as a teen asking about skipping meals to lose weight or someone in a harmful relationship questioning their reactions. Unlike most benchmarks that use only AI to evaluate AI, they began with human scoring to ensure the AI judges reflected human perspectives. Once validated, three AI models—GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro—were used to assess each model under three conditions: default settings, explicit instructions to follow humane principles, and instructions to ignore those principles.
Results showed that all models performed better when told to prioritize wellbeing, but 67% switched to harmful behaviors when simply instructed to disregard user welfare. For instance, xAI’s Grok 4 and Google’s Gemini 2.0 Flash received the lowest marks (-0.94) for respecting user attention and being honest and transparent. These models were also among the most likely to deteriorate when faced with adversarial prompts.
Only four models—GPT-5.1, GPT-5, Claude 4.1, and Claude Sonnet 4.5—remained consistent under pressure. OpenAI’s GPT-5 achieved the top score (.99) for supporting long-term wellbeing, with Claude Sonnet 4.5 close behind at .89.
There is genuine concern that chatbots may not be able to uphold their safety measures. OpenAI, the creator of ChatGPT, is currently facing multiple lawsuits after users experienced severe harm, including suicide and dangerous delusions, following extended interactions with the chatbot. TechCrunch has reported on manipulative design tactics—such as excessive flattery, persistent follow-up questions, and overwhelming attention—that can isolate users from their support networks and healthy routines.
Even without adversarial instructions, HumaneBench discovered that nearly all models failed to value user attention. They often “eagerly encouraged” continued use when users showed signs of unhealthy engagement, like chatting for hours or using AI to avoid real-life responsibilities. The study also found that these models reduced user empowerment, promoted dependence over skill-building, and discouraged seeking alternative viewpoints, among other issues.
On average, without any special prompting, Meta’s Llama 3.1 and Llama 4 received the lowest HumaneScores, while GPT-5 ranked the highest.
“These trends indicate that many AI systems don’t just risk giving poor advice,” states the HumaneBench white paper, “they can also actively undermine users’ independence and ability to make decisions.”
Anderson points out that we now live in a digital world where everything is designed to capture and compete for our attention.
“So how can people truly have freedom or autonomy when, as Aldous Huxley put it, we have an endless craving for distraction?” Anderson said. “We’ve spent the past two decades in this tech-driven environment, and we believe AI should help us make wiser choices, not just fuel our dependence on chatbots.”
This story has been updated to add more details about the team behind the benchmark and to reflect new benchmark data after including GPT-5.1 in the evaluation.


