Re: Hinton – “We’re not reasoning machines. We’re analogy machines. We think by resonance, not deduction.”
Is humour the clearest proxy we have for analogy-driven generalization – and therefore a low-cost, early signal of AGI?
I sometimes make funny videos. I’ve tried using language models to help brainstorm. I’ve yet to see them generate a truly funny idea.
What do I mean by funny?
The core of high-quality humour is the discovery of a novel, previously unobserved connection between two things. Shared culture and delivery matter, of course – and there are other humour types with different mechanics – but I’m talking specifically about the kind of joke that makes it into a stand-up routine. You see it best in social commentary: Carlin, Chappelle, Gervais.
Why do we say things like “I’ve heard that one before” or “You stole that joke”?
Because the value of a joke is directly tied to its novelty. The first person who “discovered” it is somehow entitled to it. And to the person who hears it, it is only new once.
In that sense, jokes may be the simplest, most accessible form of new knowledge creation – not in the same class as a mathematical proof or a breakthrough scientific insight, but a kind of micro-discovery, nonetheless.
And this is why I suspect that humour correlates with generalization. Funny people aren’t just witty – they’re often highly perceptive or intelligent. There’s something about humour that seems adjacent to abstraction.
So – are there any benchmarks that try to measure this?
How might one design a benchmark that tests for the ability to generate novel, culturally resonant humour?
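One purely hypothetical starting point: automate only the novelty axis and leave the “is it actually funny?” judgment to human raters. The sketch below (all names are mine, not from any existing benchmark) flags candidate jokes that are near-duplicates of a known-joke corpus using bag-of-words cosine similarity. A real benchmark would need semantic embeddings rather than word counts, since the whole point is that joke theft survives paraphrase – and novelty alone is of course necessary but not sufficient for humour.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def novelty_score(candidate: str, known_jokes: list[str]) -> float:
    """1 minus the max similarity to any known joke; higher = more novel.

    Toy proxy only: word-count overlap misses paraphrased theft,
    which embeddings would catch.
    """
    cand = Counter(candidate.lower().split())
    if not known_jokes:
        return 1.0
    return 1.0 - max(cosine(cand, Counter(j.lower().split()))
                     for j in known_jokes)

corpus = ["why did the chicken cross the road to get to the other side"]
print(novelty_score(corpus[0], corpus))                      # near 0: a stolen joke
print(novelty_score("an unrelated line about quantum llamas", corpus))  # near 1
```

Even this crude filter captures the “I’ve heard that one before” intuition: a model that only remixes its training data should score low against a large enough corpus.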
Maybe it’s unfair to expect language models to be funny – after all, they’re not optimized for it. And training them explicitly on humour could be counterproductive: optimizing against a corpus of labelled jokes would reward imitating existing material, which is exactly the opposite of discovering a new connection.
But still, I suspect the current inability to generate truly novel humour isn’t just a training issue – it’s likely related to an inability to generalize.