Interesting and very spooky to think criminals might get access to the power of AI!
Demis Hassabis worries that artificial intelligence could be catastrophic for humanity. He also runs one of the world’s leading AI labs, Google DeepMind, pushing to build smarter, faster, and more powerful systems as quickly as possible.
Artificial intelligence has followed a simple rule: make it bigger with more layers, more connections, more computing power. However, a new study suggests otherwise. Instead of scaling up, the study authors built something incredibly small—a quantum system with just nine interacting atomic spins—and asked it to take on problems that usually demand far larger machines. The result was unexpected. This tiny system didn’t just hold its ground; it outperformed classical machine-learning models with thousands of nodes in tasks like predicting temperature patterns over several days. “This represents the first experimental demonstration of quantum machine learning outperforming large-scale classical models on real-world tasks,” the study authors note.
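For a concrete sense of the classical baseline being beaten here, the “models with thousands of nodes” are essentially feed-forward regressors trained on sliding windows of past readings. Below is a minimal sketch of that kind of baseline; the data is synthetic and the layer sizes are illustrative assumptions, not the study’s actual setup.

```python
# A hypothetical classical baseline of the kind the study compares
# against: a feed-forward network with thousands of nodes mapping a
# window of past temperatures to a reading several days ahead.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Fake daily temperatures: a seasonal cycle plus noise (illustrative).
temps = 15 + 10 * np.sin(np.linspace(0, 40, 2000)) + rng.normal(0, 1, 2000)

window, horizon = 14, 3  # use 14 past days to predict 3 days ahead
n = len(temps) - window - horizon
X = np.array([temps[i:i + window] for i in range(n)])
y = np.array([temps[i + window + horizon - 1] for i in range(n)])

# "Thousands of nodes": two hidden layers of 1024 units each.
model = MLPRegressor(hidden_layer_sizes=(1024, 1024), max_iter=300)
model.fit(X[:1500], y[:1500])
print("held-out MSE:", np.mean((model.predict(X[1500:]) - y[1500:]) ** 2))
```

The study’s claim, in these terms, is that nine interacting spins matched or beat this class of model on real forecasting data.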
Anthropic accidentally leaked the existence of what the company said was its most powerful artificial intelligence to date: a new model, known as Claude Mythos Preview, that represented “a step change” in AI performance. In particular, according to a blog post that leaked due to human error and a misconfigured content management system, Mythos posed serious new risks to cybersecurity. “It presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders,” the blog post stated.
Anthropic announced Mythos alongside Project Glasswing, an initiative with more than 40 of the world’s biggest tech companies that will see Anthropic grant early access to the model to find and patch vulnerabilities across many of the world’s most important systems. Launch partners in the coalition include Apple, Google, Microsoft, Cisco and Broadcom.
But this marks a striking and mostly unsettling moment in the development of AI systems. One of the world’s three frontier labs has now created a model it says is too dangerous to release to the general public. These dangers emerged not from any specialized cyber training but from the same general improvements that every other lab is currently pursuing. As a result, models with similar capabilities may soon be accessible to criminals, hackers, and nation states — or even more broadly via open source models. Anthropic said the model has found thousands of high-severity vulnerabilities in every major operating system and web browser, and in many cases developed related exploits. Among them: a vulnerability in OpenBSD, a security-focused open source operating system, that had escaped detection for 27 years; another flaw in the video encoder FFmpeg that had escaped detection in 5 million previous automated tests; and “several” vulnerabilities in the Linux kernel, which could be exploited to take complete control of a user’s machine.
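For context on the FFmpeg figure: “automated tests” in this setting usually means fuzzing, which feeds a parser huge volumes of random or mutated inputs and watches for crashes. A minimal, hypothetical sketch of why a deep, state-dependent bug can survive millions of such tests (the `parse` function and its trigger condition are invented stand-ins, not FFmpeg code):

```python
# Blind random fuzzing almost never produces the precisely structured
# input a state-dependent flaw requires; deep bugs can hide for years.
import os
import random

def parse(data: bytes) -> None:
    # The "bug" fires only on a valid-looking magic header plus an
    # exact length: a vanishingly small corner of the input space.
    if data[:4] == b"FAKE" and len(data) == 17:
        raise MemoryError("hypothetical out-of-bounds write")

hits = 0
for _ in range(1_000_000):  # scaled-down stand-in for "5 million tests"
    sample = os.urandom(random.randint(1, 64))
    try:
        parse(sample)
    except MemoryError:
        hits += 1

print("bug triggered:", hits, "times")  # almost certainly zero
```

A model that can read and reason about the source code, by contrast, can aim directly at that corner of the input space, which is presumably what makes the capability jump Anthropic describes so consequential.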
“Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely,” the company wrote. “The fallout — for economies, public safety, and national security — could be severe.”
The good news is that, in the process of developing Claude Mythos, Anthropic discovered the AI could not only write software code more easily and with greater complexity than any model currently available but, as a byproduct of that capability, could also find vulnerabilities in virtually all of the world’s most popular software systems more easily than before. The bad news is that if this tool falls into the hands of bad actors, they could hack pretty much every major software system in the world. Any bets?
Anthropic suggests its AI models have digital representations of human emotions like happiness, sadness, joy, and fear within clusters of artificial neurons—and these representations activate in response to different cues. Researchers at the company probed the inner workings of Claude Sonnet 4.5 and found that so-called “functional emotions” seem to affect Claude’s behavior, altering the model’s outputs and actions. When Claude says it is happy to see you, for example, a state inside the model that corresponds to “happiness” may be activated. And Claude may then be a little more inclined to say something cheery or put extra effort into vibe coding. “What was surprising to us was the degree to which Claude’s behavior is routing through the model’s representations of these emotions,” says Jack Lindsey, a researcher at Anthropic who studies Claude’s artificial neurons.
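For readers curious what “probing the inner workings” typically looks like, a common interpretability method is to fit a linear probe on a model’s hidden activations and test whether a concept is linearly encoded. A minimal sketch with synthetic activations; Claude’s internals and Anthropic’s actual tooling are not public, so everything here is an illustrative assumption:

```python
# Linear probing: can a simple classifier read a concept (here,
# "happiness") off a model's hidden states? Activations are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 512  # hidden-state dimensionality (illustrative)

# Pretend a "happiness" concept shifts hidden states along one direction.
happy_dir = rng.normal(size=d)
happy_dir /= np.linalg.norm(happy_dir)
labels = rng.integers(0, 2, size=2000)                      # 1 = "happy" prompt
acts = rng.normal(size=(2000, d)) + 3.0 * labels[:, None] * happy_dir

# Fit the probe on the first 1500 examples, evaluate on the rest.
probe = LogisticRegression(max_iter=1000).fit(acts[:1500], labels[:1500])
print("probe accuracy:", probe.score(acts[1500:], labels[1500:]))
# Accuracy far above chance suggests a linear "emotion" direction; the
# causal claim is then tested by steering activations along it.
```

Findings like Lindsey’s go a step further than this sketch, arguing that such internal states don’t just exist but actually shape the model’s downstream behavior.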