Fellow's Machine-learning Auditory Model Cited
A Department of Energy Computational Science Graduate Fellowship (DOE CSGF) recipient has developed a machine-learning system that performs as well as humans at recognizing words in recordings of speech and at identifying the genres of music clips.
Fellow Alex Kell's research also compared the model's responses to functional magnetic resonance imaging (fMRI) brain scans. The correspondence suggests the human auditory cortex, the brain area that deciphers sound, follows a hierarchical organization much like the visual cortex.
A Massachusetts Institute of Technology release highlighted Kell’s research, which was published in the April 19 issue of the journal Neuron. Kell is a fourth-year fellow studying computational neuroscience with Josh McDermott.
The release outlines the machine-learning code, called a neural network, that Kell and McDermott used for the project. Information passes through the network's interconnected layers of thousands of relatively simple units. The layers transform the input, culminating in a label, for example a written word identifying a spoken one. If the system works, the label is correct.
Training neural networks consumes computer power and time. Kell funneled millions of pieces of labeled data to each model so it learned patterns it could use to classify unknown input.
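In rough outline, such a system can be sketched in a few lines of code. The example below, written in PyTorch purely for illustration, is not the network Kell and McDermott built; it simply shows stacked layers of simple units turning a preprocessed clip into a label score, plus one training step that nudges the layers toward the correct label.

```python
# Illustrative sketch only -- not the authors' architecture. Assumes PyTorch
# and clips already preprocessed into fixed-length feature vectors.
import torch
import torch.nn as nn

class TinyAudioClassifier(nn.Module):
    """Layers of simple units that transform an input clip into label scores."""
    def __init__(self, n_inputs=4000, n_classes=50):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_inputs, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, n_classes),   # one score per candidate label
        )

    def forward(self, x):
        return self.layers(x)

model = TinyAudioClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a batch of labeled clips: the network's guesses are
# compared with the correct labels, and the layers are adjusted to do better.
clips = torch.randn(32, 4000)          # stand-in for 32 preprocessed clips
labels = torch.randint(0, 50, (32,))   # stand-in for their correct labels
loss = loss_fn(model(clips), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```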
Kell and his colleagues trained the network to perform two auditory tasks. The first was to identify the word in the middle of a two-second recording of a person talking; the second was to identify the genre of a two-second music excerpt.
To train the model, the researchers fed it thousands of music and speech examples. To make the task more realistic and more difficult, the clips were embedded in background noise.
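Embedding a clip in background noise is commonly done by scaling the noise to a chosen signal-to-noise ratio before adding it. The NumPy sketch below illustrates that general idea with placeholder arrays; the noise sources, levels, and sample rate used in the actual study may differ.

```python
# Hedged illustration of mixing a two-second clip with background noise at a
# chosen signal-to-noise ratio. Assumes NumPy and mono audio arrays.
import numpy as np

def mix_with_noise(clip, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR, then add it."""
    clip_power = np.mean(clip ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = clip_power / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return clip + scaled_noise

sr = 16000                                   # assumed sample rate
clip = np.random.randn(2 * sr) * 0.1         # stand-in for a 2-second excerpt
noise = np.random.randn(2 * sr) * 0.1        # stand-in for background noise
noisy_clip = mix_with_noise(clip, noise, snr_db=3.0)
```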
After training, the model performed the tasks just as accurately as human listeners, and it even tended to err on the same clips that gave humans the most trouble.
The researchers then compared how their model and the human brain process auditory information. Using fMRI, the team played natural sounds to volunteers and recorded how their auditory cortices responded. When the team presented the same sounds to the neural network, its layers responded similarly: the model's middle stages corresponded best to activity in the primary auditory cortex, and later stages corresponded best to activity outside the primary cortex, MIT reported. This provides evidence that the auditory cortex might be arranged hierarchically, similar to the visual cortex, the researchers say.
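One common way to make this kind of layer-by-brain-region comparison is to ask how well each network layer's responses predict each fMRI voxel's response to the same sounds. The sketch below illustrates that general approach with random placeholder data and scikit-learn's ridge regression; it is not the paper's exact analysis.

```python
# Minimal sketch of the comparison idea: for each layer, predict every voxel's
# response from that layer's features and see which layer predicts it best.
# Assumes scikit-learn and NumPy; all data here are random placeholders.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_sounds, n_voxels = 165, 200
voxel_responses = rng.standard_normal((n_sounds, n_voxels))
layer_features = {                            # layer name -> (sounds x units)
    "early": rng.standard_normal((n_sounds, 64)),
    "middle": rng.standard_normal((n_sounds, 128)),
    "late": rng.standard_normal((n_sounds, 256)),
}

for name, feats in layer_features.items():
    preds = cross_val_predict(RidgeCV(), feats, voxel_responses, cv=5)
    # Median correlation across voxels: higher means this layer's responses
    # look more like that part of auditory cortex.
    r = [np.corrcoef(preds[:, v], voxel_responses[:, v])[0, 1]
         for v in range(n_voxels)]
    print(name, round(float(np.median(r)), 3))
```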
Kell, a Boston-area native, expects to graduate in 2019 and then pursue postdoctoral research focused on recording individual neuron activity in animals.