TY - JOUR
T1 - Towards Naturalistic Speech Decoding from Intracranial Brain Data
AU - Berezutskaya, Julia
AU - Ambrogioni, Luca
AU - Ramsey, Nick F.
AU - van Gerven, Marcel A.J.
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/7/1
Y1 - 2022/7/1
N2 - Speech decoding from brain activity can enable development of brain-computer interfaces (BCIs) to restore naturalistic communication in paralyzed patients. Previous work has focused on development of decoding models from isolated speech data with a clean background and multiple repetitions of the material. In this study, we describe a novel approach to speech decoding that relies on a generative adversarial neural network (GAN) to reconstruct speech from brain data recorded during a naturalistic speech listening task (watching a movie). We compared the GAN-based approach, where reconstruction was done from the compressed latent representation of sound decoded from the brain, with several baseline models that reconstructed sound spectrogram directly. We show that the novel approach provides more accurate reconstructions compared to the baselines. These results underscore the potential of GAN models for speech decoding in naturalistic noisy environments and further advancing of BCIs for naturalistic communication. Clinical Relevance - This study presents a novel speech decoding paradigm that combines advances in deep learning, speech synthesis and neural engineering, and has the potential to advance the field of BCI for severely paralyzed individuals.
AB - Speech decoding from brain activity can enable development of brain-computer interfaces (BCIs) to restore naturalistic communication in paralyzed patients. Previous work has focused on development of decoding models from isolated speech data with a clean background and multiple repetitions of the material. In this study, we describe a novel approach to speech decoding that relies on a generative adversarial neural network (GAN) to reconstruct speech from brain data recorded during a naturalistic speech listening task (watching a movie). We compared the GAN-based approach, where reconstruction was done from the compressed latent representation of sound decoded from the brain, with several baseline models that reconstructed sound spectrogram directly. We show that the novel approach provides more accurate reconstructions compared to the baselines. These results underscore the potential of GAN models for speech decoding in naturalistic noisy environments and further advancing of BCIs for naturalistic communication. Clinical Relevance - This study presents a novel speech decoding paradigm that combines advances in deep learning, speech synthesis and neural engineering, and has the potential to advance the field of BCI for severely paralyzed individuals.
UR - http://www.scopus.com/inward/record.url?scp=85138128848&partnerID=8YFLogxK
U2 - 10.1109/EMBC48229.2022.9871301
DO - 10.1109/EMBC48229.2022.9871301
M3 - Article
C2 - 36085779
AN - SCOPUS:85138128848
VL - 2022
SP - 3100
EP - 3104
JO - Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
JF - Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
ER -