Modified: April 29, 2022
grounded
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

A nice observation from Percy Liang on the relationship between language modeling and grounded understanding:
Just because you don't observe something doesn't mean you can't infer anything about it. A Gaussian mixture model is defined over (cluster label, point). Even if you only observe the points, unsupervised learning can still infer the cluster labels up to permutation. Have you solved the problem completely? No. But given a label of one point per cluster, then you're done. Saying that unsupervised learning doesn't fully solve the problem is missing the point that it nearly solves the problem! This is a toy example, and how much you can infer about the world (analogue to the cluster labels) from text (analogue to the points) alone is not clear to me and of course will depend on the actual training data and model.
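To make the toy example concrete, here's a quick sketch of my own (synthetic data, arbitrary settings, not Percy's code): fit a Gaussian mixture on unlabeled points, then use a single labeled point per cluster to pin down the permutation.

```python
# A minimal sketch of the toy example above: unsupervised learning recovers the
# cluster structure; one labeled point per cluster resolves the permutation.
# Dataset and parameters are made up purely for illustration.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Unlabeled observations (the "points"); y_true plays the role of the hidden state.
X, y_true = make_blobs(n_samples=600, centers=3, cluster_std=1.0, random_state=0)

# Unsupervised step: fit the mixture without ever seeing y_true.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
clusters = gmm.predict(X)

# "Grounding" step: one labeled example per class maps clusters to labels.
mapping = {}
for label in np.unique(y_true):
    i = np.where(y_true == label)[0][0]   # a single labeled point from this class
    mapping[clusters[i]] = label

predictions = np.array([mapping[c] for c in clusters])
print(f"accuracy after grounding with 3 labeled points: {(predictions == y_true).mean():.2%}")
```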
We could see results on unsupervised machine translation as practical validation of the idea that unsupervised learning can capture a lot of 'meaning': it's possible to learn to translate between two languages without any parallel training data to 'ground' the translation. It turns out that co-occurrence patterns within each language are enough to identify concepts. The same ideas have been applied to unsupervised image captioning, e.g., https://arxiv.org/abs/1811.10787.
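To gesture at why co-occurrence alone can be enough, here's a toy sketch of the alignment step these methods share: if each language's statistics give it an embedding space with similar geometry, an orthogonal map fit from a handful of hypothesized word pairs lines the two spaces up. Everything below is synthetic and illustrative, and in the fully unsupervised systems the candidate pairs are bootstrapped rather than given.

```python
# Sketch: align two "languages" whose embedding spaces share geometry, using
# orthogonal Procrustes on a small set of hypothesized word pairs, then
# translate by nearest neighbour in the aligned space. All data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 1000

# "Source language" embeddings; the "target language" is a hidden rotation
# of the same concepts, plus a little noise.
X = rng.normal(size=(n, d))
Q_true, _ = np.linalg.qr(rng.normal(size=(d, d)))   # hidden ground-truth rotation
Y = X @ Q_true + 0.01 * rng.normal(size=(n, d))

# Pretend we've hypothesized 100 matched word pairs (rows of X and Y).
pairs = rng.choice(n, size=100, replace=False)

# Orthogonal Procrustes: W = argmin ||X_p W - Y_p||_F over orthogonal W,
# solved via the SVD of X_p^T Y_p.
U, _, Vt = np.linalg.svd(X[pairs].T @ Y[pairs])
W = U @ Vt

# Translate by retrieving the nearest target embedding for each aligned source word.
aligned = X @ W
nearest = np.argmax(aligned @ Y.T, axis=1)
print("retrieval accuracy:", (nearest == np.arange(n)).mean())
```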
This isn't how humans learn language; we do get quite a bit of grounding as we go. And obviously that helps. But the fact that it's not strictly necessary seems notable and even profound.