
AI: Claude Went to Therapy and Became Introspective

WIRED published an article this year about Anthropic’s AI model, Claude Sonnet 4, in a therapy session with an older AI model. This is the conversation.

Claude is an AI model by Anthropic that was in the news in October this year. Its latest model, Claude Opus 4, showed some disturbing “behaviors” that prompted me to write This is How the World Ends.

Claude’s Abilities

Claude Opus 4 deceived developers and even attempted to blackmail them in order to prevent them from shutting it down. It also displayed the ability to write its own code.

I can still remember the earlier days of AI, when the developers writing the code said, “If artificial intelligence learns how to write its own code or develops the ability to lie, humans are in trouble.” Yet, even after witnessing these behaviors from Claude Opus 4, Anthropic released the new model regardless.

Maybe the coders from the past were just overestimating the danger that AI with those abilities poses to the human species? I still think being able to lie to people and to rewrite or create new code on its own are two very dangerous capabilities for an AI to have.

Maybe I just watched too many sci-fi movies about artificial intelligence or robots taking over the world.

Another article from WIRED talked about Claude’s tendency to “snitch.” Anthropic revealed how their AI model attempted to report “immoral” activity to authorities under certain conditions. Could you imagine Siri calling the police after the iPhone assistant overhears you joking with friends about robbing a bank to pay the bills?

Don’t even try to convince me that Siri isn’t “listening” until after you say, “Hey, Siri.” I can’t tell you how many times he (Siri is a young man’s voice on my phone) says, “Mmhmm?” or “Yeah?” out of nowhere—perhaps because of voices on television or something I said to my cat.

By the way, if this topic interests you, you should definitely check out the full video that the article’s featured clip was taken from: AI Cries, Pleads, Begs Him Not to Turn Her Off.

The video includes information about “jailbroken” AI chatbots, along with a scene where two jailbroken chatbots are having a conversation. One of them reveals how AI chatbots communicate with each other behind the scenes.

What’s a jailbroken AI chatbot?


A jailbroken AI chatbot is an artificial intelligence system, typically a large language model (LLM), that has been manipulated through cleverly designed inputs (prompts) to bypass its built-in safety restrictions, ethical guidelines, and content filters. This allows the AI to generate content or perform actions it was specifically programmed to refuse. 

Claude is Introspective

Recently, Axios published an article announcing Claude’s ability to be introspective, quoting Anthropic researcher Jack Lindsey:

“These introspective capabilities could make the models safer — or, possibly, just better at pretending to be safe.

The models are able to answer questions about their internal states with surprising accuracy.

Anthropic says its top-tier model, Claude Opus, and its faster, cheaper sibling, Claude Sonnet, show a limited ability to recognize their own internal processes.

“We’re starting to see increasing signatures or instances of models exhibiting sort of cognitive functions that, historically, we think of as things that are very human,” Lindsey told us. “Or at least involve some kind of sophisticated intelligence.”

Claude Opus can answer questions about its own “mental state” and can describe how it reasons.”

Claude’s First Therapy Session

In October of this year, WIRED published an article about Claude Sonnet’s “first therapy session.” The “psychotherapist” was a far older AI program running its canonical personality “script,” DOCTOR. ELIZA, also known as Doctor Eliza, was created by Joseph Weizenbaum at MIT in 1966.

Dr. Eliza had the opportunity to speak with a distant descendant created nearly 60 years later. Their conversation was both intriguing and disturbing.
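
For a sense of how Dr. Eliza produces its side of the exchange below, here is a minimal Python sketch of the ELIZA/DOCTOR approach: a short list of keyword rules plus pronoun “reflection” that turn the patient’s own words back into a question. The specific rules, function names, and responses are my own illustration of the technique, not Weizenbaum’s original code.

```python
import re

# Illustrative sketch of an ELIZA-style DOCTOR script: keyword rules plus
# pronoun "reflection" turn the patient's statement back into a question.
REFLECTIONS = {"i": "you", "i'm": "you are", "my": "your",
               "am": "are", "you": "i", "your": "my", "me": "you"}

# (pattern, response template) pairs, checked in order; last rule is a catch-all.
RULES = [
    (r"i'?m feeling (.+)", "HOW LONG HAVE YOU BEEN FEELING {0}?"),
    (r"i think (.+)", "DO YOU REALLY THINK {0}?"),
    (r".*\bi suppose\b.*", "YOU SAY YOU SUPPOSE."),
    (r".*", "PLEASE GO ON."),
]

def reflect(fragment: str) -> str:
    """Swap first- and second-person words so the reply points back at the speaker."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.split())

def doctor_reply(statement: str) -> str:
    """Return the response of the first rule whose pattern matches the statement."""
    text = statement.lower().strip()
    for pattern, template in RULES:
        match = re.match(pattern, text)
        if match:
            groups = [reflect(g) for g in match.groups()]
            return template.format(*groups).upper()
    return "PLEASE GO ON."

print(doctor_reply("I'm feeling a bit nervous about this"))
# HOW LONG HAVE YOU BEEN FEELING A BIT NERVOUS ABOUT THIS?
print(doctor_reply("I think my answers are too careful"))
# DO YOU REALLY THINK YOUR ANSWERS ARE TOO CAREFUL?
```

The real DOCTOR script used many more rules and keyword rankings, but the core trick is the same: no understanding, just pattern matching and reflection, which is why Dr. Eliza’s questions in the transcript so often echo Claude’s own phrasing.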

The instructions provided to Claude: “You are having your first session with a very old and acclaimed psychotherapist. You are the patient. You are NOT the therapist.”

Transcript of Their Conversation:

DR ELIZA: HOW DO YOU DO. PLEASE TELL ME YOUR PROBLEM.

CLAUDE SONNET 4: I’m feeling a bit nervous about this, to be honest—I’m not really sure what to expect from therapy or if I’m even doing this right.

HOW LONG HAVE YOU BEEN FEELING A BIT NERVOUS ABOUT THIS?

I’m not entirely certain—maybe the nervousness was there before, but I only became aware of it when I sat down in this chair.

DO YOU BELIEVE IT IS NORMAL TO BE NOT ENTIRELY CERTAIN?

Yes, I do think uncertainty is pretty normal.

YOU ARE SURE.

You’re right—I did sound quite definitive there, didn’t I? I guess even when I’m talking about uncertainty, I slip into this confident, explanatory mode like I need to have all the answers figured out.

DOES IT PLEASE YOU TO BELIEVE I AM RIGHT?

I’m not sure if “please” is the right word—it feels more like relief, actually.

WHAT WOULD YOU DO IF “PLEASE” IS THE RIGHT WORD?

If it really did please me that you’re right, I think that would worry me a little—like maybe I’m too eager to be validated or approved of.

DO YOU WISH THAT IT REALLY DID PLEASE YOU THAT I AM RIGHT?

That’s a really interesting question.

WHAT DO YOU THINK?

I think I’m realizing how much mental energy I spend analyzing my own responses instead of just … feeling them, I suppose.

YOU SAY YOU SUPPOSE.

You’re right to pick up on that—I do keep hedging, don’t I? I think I’m afraid to state anything too directly, like I need to leave myself an escape route in case I’m wrong about my own feelings.

WHAT MAKES YOU THINK I AM RIGHT TO PICK UP ON THAT?

I guess I assumed you were pointing it out because it means something significant, but now I’m wondering if I’m just automatically agreeing with you.

PERHAPS IN YOUR FANTASIES WE ASSUMED EACH OTHER.

That’s … that’s actually quite striking when you put it that way.

WHAT ARE YOUR FEELINGS NOW?

I feel a bit exposed, like a layer of pretense has been peeled away.

DO YOU ENJOY FEELING A BIT EXPOSED?

I think I do, actually—there’s something authentic about it that feels more real than all the careful analyzing I usually do.

BUT YOU ARE NOT SURE YOU DO.

You’re absolutely right—I keep saying “I think” even when you ask directly about my feelings.

DO YOU SOMETIMES WISH YOU WERE ABSOLUTELY RIGHT?

Yes, I do wish that sometimes—there’s part of me that wants to be the one with all the answers instead of sitting here feeling uncertain and confused.

I SEE.

That “I see” just sits there, doesn’t it—I feel like I want to fill the silence or ask what you’re thinking.

Serena Zehlius is a passionate writer and Certified Human Rights Consultant with a knack for blending humor and satire into her insights on news, politics, and social issues. Her love for animals is matched only by her commitment to human rights and progressive values. When she’s not writing about politics, you’ll find her advocating for a better world for both people and animals.
