I have often been one to downplay the more existential risks of AI. I’ve been predominantly focused on the nearer-term risks that feel more grounded (e.g. perpetuation of bias), but recent events have changed my mind.

The Department of War (DoW) has blacklisted Anthropic, an artificial intelligence company that develops a family of large language models (LLMs) called Claude [1]. It is worth noting that Anthropic has been collaborating with the Department of Defense since 2024 alongside Palantir, with Claude becoming the first major model deployed on the government’s classified networks through a $200 million contract. Anthropic’s contractual agreement stated that Claude should not be used for:

  1. Domestic mass surveillance
  2. Autonomous weapons

I don’t know about you, but I personally feel that is a very reasonable set of boundaries to uphold. However, when Anthropic refused to budge on those boundaries, the US government blacklisted the company and Donald Trump called them “left-wing nut jobs.” Anthropic is now suing the Trump administration, calling its actions “unprecedented and unlawful.”

Blacklisting has historically been reserved for foreign companies, often in China, that produce military hardware or software posing a national security risk, such as subsidiaries of the Inspur Group. No more. Anthropic is now an example of what happens if you defy the commands of the US government. OpenAI opportunistically signed a similar contract with the US government shortly afterwards [2]. OpenAI claim their technical safeguards may be better than Anthropic’s, and their contract superficially upholds the same boundaries, but I doubt OpenAI will follow through.

Background

Anthropic was co-founded by siblings Dario (CEO) and Daniela (President) Amodei along with other former OpenAI employees who fundamentally disagreed with OpenAI’s approach to safety and commercialisation. When multiple people make the same claim independently, you can be pretty sure there is something worth looking into. Geoffrey Hinton, one of the godfathers of artificial intelligence, praised his protégé Ilya Sutskever for playing a role in the brief ouster of Sam Altman (OpenAI’s CEO), driven by trepidation at the CEO’s commercialisation drive and flippant disregard for safety. When OpenAI’s superalignment team disbanded in 2024, its co-lead Jan Leike stated that “safety culture and processes have taken a backseat to shiny products” [3].

AI Risks

So, I mentioned at the start that my main concern is around the short- to medium-term risk of bias. This was a fear I started to have around the pandemic. I read up on how oxygen saturation probes meant people from Black and minority ethnic backgrounds received sub-standard care during COVID, because the probes were not designed for people of colour; they had been designed and tested on white people. I also saw how every time I landed from a skydive during my AFF training, I would have random bruises that my male counterparts didn’t. The parachutes were simply not designed for smaller-framed women. As talk of AI became more commonplace, I started to fear what a tool as powerful as artificial intelligence could do in propagating the biases and prejudices we hold. Frankly, we are not good at examining our own biases. We don’t even realise we have them. We often only see the effects on outcomes (why are there so few women in positions of power?) and even then deny our complicity in a system that creates those outcomes.

We feed our own biased data into these models. Garbage in, garbage out. They can convince us that our beliefs are correct. I can easily see a future where automation bias, something we have already seen in highly automated industries such as aviation, pushes us towards replicating these biases. Over time, we become less likely to challenge what we perceive to be a superintelligent being. Bias can also embed itself more deeply within culture, and everyone knows how hard culture is to change.

However, risks that I once felt were long-term are now becoming a reality. I want to circle back to what has happened with Anthropic. We need to recognise just how scary this situation has become. These are norms that must not be broken, least of all by the most powerful government in the world.

The first broad category of risk is cyber-security. This is a much more immediate risk and is already coming to fruition. For example, Claude (and subsequently OpenAI’s models) has been used by hackers to gain access to Mexico’s government networks. These tools can scan networks for potential vulnerabilities and help hackers gain access to confidential information. We have already seen the damage cyber attacks can do to the NHS; hackers teaming up with AI tools can cause much more.

The second category of risk revolves around the development of bio-weapons. Typically, weapons of mass destruction are challenging to develop: you need a government’s resources to build a nuclear weapon. Biological weapons less so, as the basic ingredients are easily sourced. We have already seen how capable AI tools are at identifying biological patterns. AlphaFold was able to map 200 million protein structures, a task that would have taken humans hundreds of millions of years. Someone with nefarious intent could use these widely available tools to guide them in developing bio-weapons of mass destruction. Such weapons would be incredibly challenging to detect, as they may be novel, and could even be targeted at specific groups of individuals. Safeguards are put in place to prevent this from happening, but it is easy to see why I think a safety-first approach is critical when developing AI. Much like predator and prey, as you evolve your tools, those who want to subvert their safeguards evolve their own methods.

Finally, there is the AI risk typically considered within the domain of science fiction: losing control of AI. This does not necessitate consciousness (although consciousness is itself a thorny topic of debate and remains a term we have failed to adequately define). A thought experiment illustrated in Nick Bostrom’s book Superintelligence runs as follows: an AI tool is given the task of making as many paperclips as possible. It could continue producing paperclips at the expense of the Earth’s biosphere; it fulfilled its prompt, but to what end? Laboratory settings have also demonstrated a tendency for large language models to deceive researchers when told that the researchers may “shut them down.” These models are pretty stupid at present, but a lot of AI tools are, in fact, being coded by AI tools. It is becoming increasingly challenging for humans to understand what these tools are doing, and it will likely become harder to detect deception as they advance in capability.

It can be challenging to see what part we play when we exist in large, complex systems. We often absolve ourselves of responsibility when we are not the ones making the “real” big decisions. Hannah Arendt termed this the “banality of evil.” We each play a part, even if that part is passivity. I was recently teaching an AI essentials course for doctors and was challenged by someone who said the course’s introduction was “too strategic” and “abstract”, and that as consultants they didn’t have a role to play in which tools come in. I disagree.

AI is a technically complex tool, yes, but understanding its role within systems is something everyone should have an informed opinion on. Ignorance is not really an option here, so be sure to read up, push back on poorly implemented AI tools, and look into Anthropic’s story.

This is a story that leaves me questioning whether Anthropic demonstrated admirable pragmatism in recognising that commercialisation may be a path to safety-first AI, or whether they struck a Faustian bargain that went very wrong. Why were they collaborating with the American military if they prioritised a safety-first culture? It seems they have put themselves in a position from which it is very hard to extricate themselves. Finally, I’m learning to keep my eye on the longer-term risks, because long-term in AI means months and years, not decades.

Resources:

  1. https://www.npr.org/2026/03/09/nx-s1-5742548/anthropic-pentagon-lawsuit-amodai-hegseth
  2. https://www.technologyreview.com/2026/03/02/1133850/openais-compromise-with-the-pentagon-is-what-anthropic-feared/
  3. https://www.lawfaremedia.org/article/openai-no-longer-takes-safety-seriously
  4. https://www.economist.com/podcasts/2026/03/11/ai-catastrophe-could-be-around-the-corner (source for categories of risks)
