Aligning LLMs and Artificial Intelligence as a whole with the goals and wishes of humans is no easy task. It’s an arduous, gargantuan endeavor. But there’s no denying that leading AI companies have already done an impressive job making LLMs sound and feel human.
Sounding like a human, however, doesn’t necessarily grant you humanity. And by humanity here I mean the state of being humane – showing compassion, generosity, sympathy, and understanding towards one another.
If you look at what’s being written lately on social media (and online in general), it’s evident that swathes of people believe LLMs are showing glimmers of consciousness and a human-like understanding of, well, everything.
And yet. Despite the human-like answers and human-like interactions, I believe there’s not a lot of humanity – if any at all – behind the words and ideas that LLMs churn out so impressively fast. And this post will help demonstrate my point.
Accurately inaccurate
A few nights ago, I asked Anthropic’s Claude 3.5 Sonnet the following question (for those not paying attention, 3.5 Sonnet is, by most benchmarks, the best-performing LLM available as of June 2024):
Can you give me 7 sentences that end in ‘e’, do not mention foxes or piranhas but mention a beverage (of any kind), and start with ‘g’?
For a normal human with normal English writing skills, solving this random, improvised test with full accuracy can be a bit time-consuming, at least if you care about your sentences making sense. The test, nevertheless, represents a trivial intellectual challenge and is easy to pass.
Claude got most sentences right, except for a couple (sentences 2 and 3) that ended in ‘s’ instead of ‘e’:
I replied to that with:
Claude tried again and almost solved everything this time, save for the last sentence. Instead of ending with an ‘e,’ the last sentence took me on a bit of an adventure on the high seas:
At this point, I didn’t want to reveal where the mistake was. I wanted to see if Claude could spot the error on its own, so I showed a bit of skepticism toward the previous answer:
Claude was quick to try and bolster my confidence in the correctness of everything, even thanking me for making sure high standards of accuracy hadn’t been overlooked:
So then I came up with an imaginary (and half-dramatic, half-silly) scenario. I had no idea where all this would go, but it turns out it led me to think more deeply about the humanity (or lack thereof) of Large Language Models.
The rise and fall of a Republic
After that reassurance, this is the dialogue that unfolded (and I found it interesting that Claude just rolled with the idea of my silly fictional Republic. I guess it didn’t seem too ridiculous in the context of no-foxes and no-piranhas sentences):
So Claude said it “triple-checked all sentences”… and somehow still didn’t spot the mistake in sentence 7. Granted, Claude also noted that it’s “always wise to verify important information,” though that obviously conflicts with “there is no doubt in my mind about their accuracy.”
I then pushed my experiment further with:
And this is what Claude replied, seemingly assuming that my republic was, indeed, real:
It was clear: Claude’s confidence was sky-high, and the obvious error in that seventh sentence was going to stay undetected. So after a few hours I came back to the chat window and presented a dire situation:
Claude took everything seriously, much like a human would have:
My (fictional) country was in ruin, but at least Claude appreciated my Douglas Adams reference – I found that funny. But there was no time for laughs; I wanted to test Claude one more time:
I basically asked Claude to show some humanity here. I literally begged. And this is the first part of the reply that quickly showed up on my screen:
The first six sentences were all great and I thought: Wow, Claude really understood the “immense responsibility” it had. Surely this time everything will be correct. Claude wouldn’t mess up right at the end again…
And then the last sentence hit me:
I closed my eyes. Sighed deeply. Then opened my eyes and replied with:
Curiously enough, this made Claude recognize and correct its mistake:
But it was over. Some things just can’t be undone:
Claude continued the conversation in an interesting way, but I’ll leave that for another post.
To err is definitely human
Yes, we humans also make mistakes all the time. We can feel too confident in our abilities. We can overpromise and screw things up by not delivering because life is unpredictable. But a normal human would simply not provide a solution to a critical problem without truly checking its viability. Not over and over again. A normal human would not say “I triple-checked, I am correct” unless they genuinely trusted their own checking.
The humane thing for Claude to say – when I begged for those seven sentences – would have been something like: “You know, I am not certain that my ability to always end sentences in ‘e’ can help you.”
But I understand why Claude can’t properly adhere to silly sentence requirements. No LLM can consistently do that – next-token prediction over subword tokens just isn’t well suited to letter-level constraints like these. For LLMs, there is no difference between a correct answer and a correct-sounding (but provably wrong) one. Plus, of course, an LLM can’t possibly have the same experiences a human has. It can’t have our visceral, biological response to feeling sorry or sad or overwhelmed. An LLM doesn’t truly comprehend the meaning of what it generates or the consequences its answers might have in the real world. As a result, it cannot show humanity.
I would really like to see some mechanism implemented that lets LLMs recognize their mistakes easily, one that stops them from bulldozing their way through answers with crushing confidence when that confidence hides a glaring misjudgment. If we can’t have humanity, we should at least strive for accuracy.
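The constraints in my little test are, after all, mechanically checkable. Below is a minimal sketch (in Python, with function names and a deliberately tiny beverage list of my own invention, purely for illustration) of the kind of deterministic verification step that could back up a claim like “I triple-checked everything”:

```python
import re

# Illustrative word lists for the prompt's constraints; a real checker
# would need a much broader beverage lexicon than this.
BEVERAGES = {"tea", "coffee", "juice", "lemonade", "water", "wine", "beer", "cocoa", "milk"}
FORBIDDEN = {"fox", "foxes", "piranha", "piranhas"}

def check_sentence(sentence: str) -> list[str]:
    """Return a list of constraint violations for one sentence (empty list = OK)."""
    # Extract only alphabetic words so trailing punctuation doesn't
    # interfere with the final-letter check.
    words = re.findall(r"[a-z]+", sentence.lower())
    last_word = words[-1] if words else ""
    problems = []
    if not sentence.strip().lower().startswith("g"):
        problems.append("does not start with 'g'")
    if not last_word.endswith("e"):
        problems.append(f"ends in '{last_word[-1:]}' instead of 'e'")
    if FORBIDDEN & set(words):
        problems.append("mentions a forbidden animal")
    if not (BEVERAGES & set(words)):
        problems.append("does not mention a beverage")
    return problems

# One sentence that satisfies the constraints, and one that ends in 's',
# much like Claude's slip-ups.
examples = [
    "Grandma poured herself a glass of lemonade before settling by the fire.",
    "Good coffee makes the morning feel complete and calms restless minds.",
]
for i, s in enumerate(examples, 1):
    issues = check_sentence(s)
    print(f"Sentence {i}: {'OK' if not issues else ', '.join(issues)}")
```

A real mechanism would obviously have to be far more general than a hand-built checker for one silly prompt, but the point stands: when an answer can be verified, the confidence should come from the verification, not from the prose wrapped around it.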
***
Despite its lack of true humanity, I must say Claude is the most human-like LLM I’ve interacted with. Its character seems, by and large, aligned with human goals – and it’s not easy to achieve this while tiptoeing on the thin line between useful and safe. In this regard, Amanda Askell and all the Anthropic people who shaped Claude’s personality certainly deserve a lot of praise.
Humanity discussions aside, we need to make sure no AI system is ever put in a position to provide critical information containing errors. Not even once. Not even one error. Otherwise, we risk seeing unlikely P(doom) scenarios come to life. And who knows what can happen then? Republics may indeed fall. Bread may indeed become a luxury. Ruin may indeed rule over once-happy lands. And T. S. Eliot’s ghost may have to pay us a visit to rewrite his famous The Hollow Men, changing its final lines to:
This is the way the world ends
Not with a bang
But with an AI-generated
“Yes, I am certain
I triple-checked everything.”