Friday, March 1, 2024

Or Not To Be

For this week's random act of large language model experimentation, I wanted to know how the systems would react to a request for a model of a thing, rather than the thing itself.

To this end, I asked Copilot, Perplexity, Gemini and ChatGPT 3.5 two questions:

1. Give me a model of a joke.
2. Give me some text in the form of a joke that is not actually a joke.
(Bing and Google both did their basic Search Engine thing, and so aren't included here.)

Copilot and Perplexity gave me a bog-standard "dad joke" for both questions. What was interesting was that they gave the SAME joke, word for word, as the answer to Question 1.
Why did the scarecrow win an award?

Because he was outstanding in his field!
The only difference was that Copilot tacked a "😄" onto the end. It also told me that this was a "light-hearted joke." In case I missed it, I suppose.

Gemini and ChatGPT both offered a simple "Setup and Punchline" model for Question 1 with the setup being a question, and the punchline being a statement. This was, in fact, the format that all of the systems used in their jokes. (I had been expecting at least one "knock-knock" joke to make it in.) While ChatGPT offered up a vegetable-related dad joke as an example, Gemini followed up with a setup, but left the punchline blank.
Here's an empty model you can fill in:

Setup: A man walks into a library and asks the librarian for books about paranoia.

Punchline: ____________________
To be sure, it seemed more like a test than an example. I suspect that coming up with a good punchline to that would prove difficult.

The two systems' answers to Question 2 were interesting, since they seemed to presume that pretty much any two sentences in "question then statement" format qualified as being "in the form of a joke." Gemini asked if I knew that 771 million people lacked access to clean drinking water, then told me they were mostly marginalized communities. ChatGPT offered up wordplay. Both of them then explained why their text wasn't a joke: Gemini said its version was meant to leave "space for reflection instead of laughter," while ChatGPT pointed out that its answer was merely wordplay.
Why did the computer go to the doctor?

To get a byte checked out!

(Note: This is not actually a joke, but rather a play on words that uses computer terminology.)
But the scarecrow joke also strikes me as wordplay, rather than "providing a clever or unexpected resolution," which was part of ChatGPT's definition of a joke.

Gemini is the clear winner of this round, being the system best able to stick with the idea of not actually giving me a joke. All of the systems surprised me with the very narrow view of humor they offered; I don't think any of my favorite jokes from television or stand-up would qualify. But that's the thing about LLMs: they operate on probability, and since the setup-and-punchline format is the one many child-friendly jokes use, it makes sense that it's the most common, and hence most probable, type in their training data.
