Human-like object concept representations emerge naturally in multimodal large language models | Nature Machine Intelligence
www.nature.com

Understanding how humans conceptualize and categorize natural objects offers critical insights into perception and cognition. With the advent of large language models (LLMs), a key question arises: can these models develop human-like object representations from linguistic and multimodal data? Here we combined behavioural and neuroimaging analyses to explore the relationship between object concept representations in LLMs and human cognition. We collected 4.7 million triplet judgements from LLMs and multimodal LLMs to derive low-dimensional embeddings that capture the similarity structure of 1,854 natural objects. The resulting 66-dimensional embeddings were stable, predictive and exhibited semantic clustering similar to human mental representations. Remarkably, the dimensions underlying these embeddings were interpretable, suggesting that LLMs and multimodal LLMs develop human-like conceptual representations of objects. Further analysis showed strong alignment between model embeddings and neural activity patterns in brain regions such as the extrastriate body area, parahippocampal place area, retrosplenial cortex and fusiform face area. This provides compelling evidence that the object representations in LLMs, although not identical to human ones, share fundamental similarities that reflect key aspects of human conceptual knowledge. Our findings advance the understanding of machine intelligence and inform the development of more human-like artificial cognitive systems.

Multimodal large language models are shown to develop object concept representations similar to those of humans. These representations closely align with neural activity in brain regions involved in object recognition, revealing similarities between artificial intelligence and human cognition.
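For context on the method behind those numbers: the embeddings are learned from triplet odd-one-out judgements using a representation learning approach originally built for human participants (something in the spirit of SPoSE, a sparse positive similarity embedding). Below is a minimal sketch of that idea, not the authors' code; the object count and dimensionality follow the abstract, while the sparsity weight, optimizer settings and the randomly generated triplets are purely illustrative.

```python
import torch

n_objects = 1854   # number of natural objects in the abstract's stimulus set
n_dims = 66        # embedding dimensionality reported in the abstract
l1_weight = 0.01   # sparsity penalty (illustrative, not the paper's value)

# Non-negative embedding matrix: one row per object, one column per dimension.
X = torch.rand(n_objects, n_dims, requires_grad=True)
opt = torch.optim.Adam([X], lr=0.001)

def triplet_nll(X, triplets):
    """Negative log-likelihood that pair (i, j) is the most similar pair in each triplet."""
    i, j, k = triplets.T               # index tensors; (i, j) is the chosen pair, k the odd one out
    sim_ij = (X[i] * X[j]).sum(dim=1)  # dot-product similarities
    sim_ik = (X[i] * X[k]).sum(dim=1)
    sim_jk = (X[j] * X[k]).sum(dim=1)
    logits = torch.stack([sim_ij, sim_ik, sim_jk], dim=1)
    return -torch.log_softmax(logits, dim=1)[:, 0].mean()

# Toy training loop over random triplets, standing in for the 4.7 million
# LLM odd-one-out judgements described in the abstract.
triplets = torch.randint(0, n_objects, (10_000, 3))
for step in range(100):
    opt.zero_grad()
    loss = triplet_nll(X, triplets) + l1_weight * X.abs().mean()
    loss.backward()
    opt.step()
    with torch.no_grad():
        X.clamp_(min=0)                # keep dimensions non-negative and sparse
```

The non-negativity and sparsity constraints are what tend to make each learned dimension individually interpretable, which is the property the abstract highlights.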
I’m not disputing this, but I also don’t see why that’s important. It’s a representation of the world encoded in a human format. We’re basically skipping a step of evolving a way to encode this data.
Did you actually read through the paper?
What’s important is the use of “natural” here, because it implies something fundamental about language and material reality, rather than this just being a reflection of the human data fed into the model. You did it yourself when you said:
And we just don’t know this, and this paper doesn’t demonstrate this because (as I’ve said) we aren’t feeding the LLMs raw data from the environment. We’re feeding them inputs from humans and then they’re displaying human-like outputs.
From the paper:
But their training still relies on a data set picked by humans, given textual descriptions written by humans, and then probed with a representation learning method previously designed for human participants. That’s not “natural”, that’s human.
A more accurate conclusion would be: human-like object concept representations emerge when fed data collected by humans, curated by humans, annotated by humans, and then tested by representation learning methods designed for humans.
human in ➡️ human out
Again, I’m not disputing this point, but to be honest I don’t see why it’s significant. As I’ve noted, human representation of the world is not arbitrary. We evolved to create efficient models that allow us to interact with the world in an effective way. We’re now seeing that artificial neural networks are able to create similar types of internal representations, ones that allow them to interact meaningfully with data organized in a way that’s natural for humans.
I’m not suggesting that human-style representation of the world is the one true way to build a world model, or that other efficient representations aren’t possible. However, that in no way detracts from the fact that LLMs can create a useful representation of the world that’s similar to our own.
Ultimately, the end goal of this technology is to be able to interact with humans, to navigate human environments, and to accomplish tasks that humans want to accomplish.
LLMs create a useful representation of the world that is similar to our own when we feed them our human-created + human-curated + human-annotated data. This doesn’t tell us much about the nature of large language models or the nature of object concept representations; what it tells us is that human inputs result in human-like outputs.
Claims about “nature” are much broader than the findings warrant. We’d need to see LLMs fed entirely non-human datasets (no human creation, no human curation, no human annotation) before we could make claims about what emerges naturally.
You continue to ignore my point that human representations are themselves not arbitrary. Our brains have emerged naturally, and that’s what makes the representations humans make natural. You could evolve a representation of the world from scratch by hooking up a neural network to raw sensory inputs, and its topology would eventually become tuned to model those inputs. I don’t see what would be fundamentally more natural about that, though.
If we define human inputs as “natural” then the word basically ceases to mean anything.
It’s the equivalent of saying that paintings and sculptures emerge naturally because artists are human and humans are natural.
Are you saying that humans are not a product of nature?
I’m saying that the terms “natural” and “artificial” are in a dialectical relationship: they define each other by their contradictions. Those words don’t mean anything once you include everything humans do as natural; you’ve effectively defined “artificial” out of existence, and as a result also defined “natural” out of existence.
I haven’t defined artificial out of existence at all. My definition of artificial is a system that was consciously engineered by humans. The human mind is a product of natural evolutionary processes. Therefore, the way we perceive and interpret the world is inherently a natural process. I don’t see how it makes sense to say that human representation of the world is not natural.
An example of something that’s artificial would be taking a neural network we designed and having it build a novel representation of the world, unbiased by us, from raw inputs. It would be a designed system, as opposed to one that evolved naturally, with its own artificial representation of the world.