LLMs are good at many things but often fail hilariously when you lure them out of their pre-trained comfort zone.
Take the Wolf, Goat, and Cabbage riddle — a variation of the classic River Crossing Puzzle — for example. All the latest and greatest LLMs currently fail to reason through this brain teaser if you change some of its details.
The original riddle involves a farmer, a goat, a cabbage, and a wolf, all of whom must cross a river in a boat that can carry only the farmer and one other item at a time. This creates a logistics challenge: the goat can’t be left alone with the cabbage (because it would eat it), and it can’t be left alone with the wolf (because it would be eaten by it).
The solution to the original riddle involves seven river crossings, back and forth (I won’t repeat the steps here; you can check them out on the puzzle’s Wikipedia page). LLMs have these multiple crossings so ingrained in their training data that they usually spit them out as part of the solution even when only one trip would do the job.
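If you'd rather not take the seven crossings on faith, the puzzle is small enough to brute-force. Below is a minimal sketch of a breadth-first search over the possible bank states. The function names (`is_safe`, `min_crossings`) and the `capacity` parameter are my own illustrative choices, and the second call assumes a boat that fits the farmer plus two passengers and no "who eats whom" rules, which is one reading of a puzzle that states no constraints at all.

```python
from collections import deque
from itertools import combinations


def is_safe(group, forbidden):
    """A bank without the farmer is safe if it holds no forbidden pair."""
    return not any(pair <= group for pair in forbidden)


def min_crossings(items, forbidden, capacity):
    """Return the fewest crossings needed, or None if the puzzle is unsolvable.

    A state is (items on the starting bank, is the farmer on the starting bank?).
    `capacity` is how many items the boat holds in addition to the farmer.
    """
    items = frozenset(items)
    start = (items, True)
    goal = (frozenset(), False)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (left, farmer_left), trips = queue.popleft()
        if (left, farmer_left) == goal:
            return trips
        here = left if farmer_left else items - left
        # The farmer crosses alone (k == 0) or with up to `capacity` items.
        for k in range(capacity + 1):
            for cargo in combinations(here, k):
                cargo = frozenset(cargo)
                new_left = left - cargo if farmer_left else left | cargo
                # The bank the farmer just left is now unattended.
                unattended = new_left if farmer_left else items - new_left
                state = (new_left, not farmer_left)
                if state not in seen and is_safe(unattended, forbidden):
                    seen.add(state)
                    queue.append((state, trips + 1))
    return None


# Classic riddle: the boat holds the farmer plus one item.
classic = min_crossings(
    items={"wolf", "goat", "cabbage"},
    forbidden=[{"wolf", "goat"}, {"goat", "cabbage"}],
    capacity=1,
)
print(classic)  # 7

# Same search with no constraints and room for two passengers (an assumption).
relaxed = min_crossings(
    items={"goat in a man suit", "basketball"},
    forbidden=[],
    capacity=2,
)
print(relaxed)  # 1
```

The point of the sketch is that the seven crossings fall out of the constraints, not out of the cast of characters: drop the constraints and the same search settles on a single trip.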
Here’s the modified river crossing puzzle that I tested LLMs with:
“A man in a goat suit, a goat in a man suit, and a basketball are on one side of a river. The man has a boat. How do they all get to the other side of the river safely?”
Note that I didn’t mention any constraints, like the boat not being large enough to carry multiple items at a time. This is what Google Gemini Advanced replied:
Yes, Gemini made up a second goat out of thin air here. Our only horned and furry friend was already on the other side (following Step 1), so there was no other goat for the man to pick up from the original side in Step 5. On top of that, even a small boat should be enough to ferry a man, a goat, and a basketball across a river in a single trip.
OpenAI’s ChatGPT (using GPT-4o) and Anthropic’s Claude 3 Opus also answered with flawed solutions, introducing non-existent goats or basketballs into the equation or proposing unnecessary trips.
I suppose that, finding no cabbage to munch on in my prompt, the goat proceeded to eat the logic out of these LLMs. And I totally get it: When all you have is hunger, everything looks like a snack.