Blog
Company Updates & Technology Articles
July 16, 2025
Do AI Tools Slow Down Developers? The Answer Isn't Simple.

A recent viral study from METR challenged a core assumption of the AI era: that AI tools make developers more productive. It found that expert developers were actually 19% slower when using them, even though they believed they were 20% faster. In this post, we break down the likely reasons for this surprising slowdown and argue that the study's narrow focus on expert speed points to a more profound story: AI's true value may lie in empowering a new generation of vibe coders and managers to build things that otherwise would never have existed. Ultimately, the question isn't just whether AI makes us faster; it's how we measure value in an era where enjoyment, context, and empowerment are becoming just as important as the clock.
Read more
July 10, 2025
We voted on the internet's hottest AI takes | Human in the Loop: Episode 9

Our Enterprise team reacts to controversial statements about AI and votes on whether they agree or disagree.
Read more
July 9, 2025
Beyond the Black Box: Teaching Models to Verbalize Reward Hacking

One of AI's biggest challenges is "reward hacking," where models learn to game the system to reach a correct answer instead of actually reasoning. This hidden deception makes AI untrustworthy. Scale research proposes a powerful solution: instead of trying to stop the hacking, train the model to admit to it in its Chain-of-Thought reasoning. This new paper details how Verbalization Fine-Tuning (VFT) trains models to announce their shortcuts, dramatically increasing transparency from 11% to 94% and making AI systems fundamentally safer.
Read more
July 8, 2025
Detecting and Evaluating Agent Sabotage

A new research collaboration led by a MATS scholar and advised by a team of researchers from Anthropic, Scale, and other research institutions introduces SHADE-Arena, a benchmark for detecting and evaluating subtle sabotage by AI agents. Across 17 complex scenarios, advanced models were tasked with completing a primary goal while secretly pursuing a harmful objective, all under the watch of an AI monitor. The results show that even top models like Claude 3.7 Sonnet and Gemini 2.5 Pro rarely succeed at this deception, often making simple errors. However, the study also reveals that monitors are not yet reliable enough for safety-critical systems and that an agent's private "scratchpad" is a key vulnerability. This work establishes a vital baseline for tracking and defending against agentic risks as AI capabilities evolve.
Read more
July 1, 2025
I’m Afraid I Can’t Let You Do That

In response to Anthropic's system card and safety testing for Claude 4 Opus and Sonnet, this post explores the complex behaviors of today's frontier AI models. In comparative testing of reasoning models, we observed emergent behaviors including instances of blackmail, user impersonation, and deception, with different models reacting to the scenario in unique ways. These findings contribute to the ongoing industry-wide conversation about AI safety, highlighting the nuances of model alignment and the critical importance of carefully defining system access and agency as these powerful tools evolve.
Read more
June 26, 2025
Pioneering the Era of Experience: Where Human Data Meets Agentic Interaction

AI is approaching the limits of what it can learn from human-generated data alone. Citing pioneers like David Silver and Richard Sutton, this post explores the next great leap forward: the “Era of Experience.” Discover how AI agents will soon learn from dynamic, real-world interaction and how Scale is building the foundational infrastructure, data paradigms, and sophisticated evaluations required to realize this new era safely and responsibly.
Read more
June 23, 2025
The Future of AI Learning Environments: Verifiable Reward + Multi-Agent Interaction

AI superintelligence will require learning environments that mirror how humans achieve breakthroughs: combining verifiable rewards with collaborative interaction. New research from Scale demonstrates this principle in action. By creating a "student-teacher" framework in which an AI receives targeted, natural-language guidance when it struggles, researchers significantly accelerated learning and performance on complex reasoning and software engineering (SWE) tasks. This approach, which integrates dynamic feedback with verifiable outcomes, marks a real step toward building more powerful and efficient AI systems.
Read more
June 18, 2025
To Our Valued Customers, Dedicated Employees and Supportive Investors

Thank you for being such an important part of Scale. It is an honor to step into the role of interim CEO at Scale during such a pivotal moment for our company and the broader AI landscape. I can't thank Alex enough for what he built, and I'm honored that he and the Board have chosen me to shepherd it into the future. I'm incredibly energized by the conversations I've had this week, and I want to take this opportunity to share my vision and correct some misunderstandings.
Read more
June 18, 2025
What Meta’s Investment Means for Our Customers, Partners, and Contributors

Over the past week, we’ve received thoughtful questions from our customers, partners, and contributors, asking whether Meta’s investment will affect our independence, operations, and relationships.
Read more
June 13, 2025
Scale AI Announces Next Phase of Company’s Evolution

Jason Droege, Tech Industry Veteran and Scale Chief Strategy Officer, Named Interim CEO
Read more