Max Kozlov

Nature
A non-invasive imaging technique can translate scenes in your head into sentences. It could help to reveal how the brain interprets the world.

Functional magnetic resonance imaging is a non-invasive way to explore brain activity. Credit: National Institute of Mental Health/National Institutes of Health/SPL

Reading a person’s mind using a recording of their brain activity sounds futuristic, but it’s now one step closer to reality. A technique called ‘mind captioning’ generates descriptive sentences of what a person is seeing or picturing in their mind using a read-out of their brain activity, with impressive accuracy.

The technique, described in a paper published today in Science Advances¹, also offers clues about how the brain represents the world before thoughts are put into words. And it might be able to help people with language difficulties, such as those caused by strokes, to better communicate.

The model predicts what a person is looking at “with a lot of detail”, says Alex Huth, a computational neuroscientist at the University of California, Berkeley. “This is hard to do. It’s surprising you can get that much detail.”

Scan and predict

Researchers have been able to accurately predict what a person is seeing or hearing using their brain activity for more than a decade. But decoding the brain’s interpretation of complex content, such as short videos or abstract shapes, has proved more difficult.

Previous attempts have identified only key words that describe what a person saw rather than the complete context, which might include the subject of a video and actions that occur in it, says Tomoyasu Horikawa, a computational neuroscientist at NTT Communication Science Laboratories in Kanagawa, Japan. Other attempts have used artificial intelligence (AI) models that can create sentence structure themselves, making it difficult to know whether the description was actually represented in the brain, he adds.

Horikawa’s method first used a deep language AI model to analyse the text captions of more than 2,000 videos, turning each one into a unique numerical ‘meaning signature’. A separate AI tool was then trained on six participants’ brain scans and learnt to find the brain-activity patterns that matched each meaning signature while the participants watched the videos.

Once trained, this brain decoder could read a new brain scan from a person watching a video and predict its meaning signature. A different AI text generator would then search for the sentence that came closest to the meaning signature decoded from the individual’s brain.
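
To make that pipeline concrete, here is a minimal sketch in Python. It is not the study’s code: random vectors stand in for the meaning signatures a deep language model would produce, scikit-learn’s ridge regression stands in for the trained brain decoder, and the final step simply picks whichever of a few hand-written candidate sentences lies closest to the decoded signature.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical sizes: ~2,000 training videos, a few thousand voxels per scan,
# and a fixed-length embedding for each caption's 'meaning signature'.
n_videos, n_voxels, emb_dim = 2000, 4000, 512

caption_embeddings = rng.normal(size=(n_videos, emb_dim))  # stand-in for a language model's caption embeddings
brain_activity = rng.normal(size=(n_videos, n_voxels))     # stand-in for per-video fMRI activity patterns

# Step 1: learn a linear map from brain activity to meaning signatures.
decoder = Ridge(alpha=1.0)
decoder.fit(brain_activity, caption_embeddings)

# Step 2: decode a new scan into a predicted meaning signature.
new_scan = rng.normal(size=(1, n_voxels))
predicted = decoder.predict(new_scan)[0]

# Step 3: pick the candidate description whose embedding lies closest to the prediction.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

candidates = [
    "a person jumps from the top of a waterfall",
    "a dog runs along a beach",
    "two people talk in a kitchen",
]
candidate_embeddings = rng.normal(size=(len(candidates), emb_dim))  # would come from the same language model

best = max(range(len(candidates)), key=lambda i: cosine(predicted, candidate_embeddings[i]))
print("decoded description:", candidates[best])
```

The real system generates and refines its descriptions rather than choosing from a fixed list, but the decode-then-compare structure is the same.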

For example, a participant watched a short video of a person jumping from the top of a waterfall. Using their brain activity, the AI model guessed strings of words, starting with ‘spring flow’, progressing to ‘above rapid falling water fall’ on the tenth guess and arriving at ‘a person jumps over a deep water fall on a mountain ridge’ on the 100th guess.
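
That progression can be pictured as an iterative search: start from a rough word string, propose small edits and keep only those that move the sentence’s meaning closer to the decoded signature. The toy loop below illustrates the principle with a made-up vocabulary, random word vectors and a hypothetical random-edit scheme; the study’s generator differs in its details, but the improving guesses described above reflect this kind of progressive refinement.

```python
import numpy as np

rng = np.random.default_rng(1)
emb_dim = 512

# Hypothetical decoded meaning signature (in a real pipeline this would come
# from the brain decoder, as in the previous sketch).
target = rng.normal(size=emb_dim)

# A tiny made-up vocabulary with a random vector per word; a real system would
# reuse the language model that produced the meaning signatures.
vocabulary = ["spring", "flow", "person", "jumps", "over", "deep",
              "water", "fall", "mountain", "ridge", "above", "rapid"]
word_vecs = {w: rng.normal(size=emb_dim) for w in vocabulary}

def embed(words):
    # Average the word vectors to get a crude sentence embedding.
    return np.mean([word_vecs[w] for w in words], axis=0)

def score(words):
    # Cosine similarity between the candidate sentence and the decoded signature.
    e = embed(words)
    return float(target @ e / (np.linalg.norm(target) * np.linalg.norm(e)))

sentence = ["spring", "flow"]  # rough initial guess
for _ in range(100):
    candidate = list(sentence)
    if len(candidate) > 2 and rng.random() < 0.5:
        # Swap one word for a random alternative...
        candidate[rng.integers(len(candidate))] = vocabulary[rng.integers(len(vocabulary))]
    else:
        # ...or append a new word.
        candidate.append(vocabulary[rng.integers(len(vocabulary))])
    if score(candidate) > score(sentence):
        sentence = candidate  # keep only edits that move closer to the target

print("guess after 100 iterations:", " ".join(sentence))
```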

The researchers also asked participants to recall video clips that they had seen. The AI models successfully generated descriptions of these recollections, demonstrating that the brain seems to use a similar representation for both viewing and remembering.

Reading the future

This technique, which uses non-invasive functional magnetic resonance imaging, could help to improve the process by which implanted brain–computer interfaces translate people’s non-verbal mental representations directly into text. “If we can do that using these artificial systems, maybe we can help out these people with communication difficulties,” says Huth, who, with his colleagues, developed a similar model in 2023 that decodes language from non-invasive brain recordings².

These findings raise concerns about mental privacy, Huth says, as researchers come closer to revealing intimate thoughts, emotions and health conditions that could, in theory, be used for surveillance, manipulation or discrimination. Neither Huth’s model nor Horikawa’s crosses a line, they both say, because these techniques require participants’ consent and the models cannot discern private thoughts. “Nobody has shown you can do that, yet,” says Huth.

doi: https://doi.org/10.1038/d41586-025-03624-1

References
  1. Horikawa, T. Sci. Adv. 11, eadw1464 (2025).

  2. Tang, J., LeBel, A., Jain, S. & Huth, A. G. Nature Neurosci. 26, 858–866 (2023).
