Naoyuki Kanda
Title
Cited by
Year
WavLM: Large-scale self-supervised pre-training for full stack speech processing
S Chen, C Wang, Z Chen, Y Wu, S Liu, Z Chen, J Li, N Kanda, T Yoshioka, ...
IEEE Journal of Selected Topics in Signal Processing 16 (6), 1505-1518, 2022
Cited by: 1496; Year: 2022
A review of speaker diarization: Recent advances with deep learning
TJ Park, N Kanda, D Dimitriadis, KJ Han, S Watanabe, S Narayanan
Computer Speech & Language 72, 101317, 2022
Cited by: 367; Year: 2022
CHiME-6 Challenge: Tackling multispeaker speech recognition for unsegmented recordings
S Watanabe, M Mandel, J Barker, E Vincent, A Arora, X Chang, ...
arXiv preprint arXiv:2004.09249, 2020
Cited by: 328; Year: 2020
End-to-end neural speaker diarization with self-attention
Y Fujita, N Kanda, S Horiguchi, Y Xue, K Nagamatsu, S Watanabe
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2019
Cited by: 271; Year: 2019
End-to-end neural speaker diarization with permutation-free objectives
Y Fujita, N Kanda, S Horiguchi, K Nagamatsu, S Watanabe
arXiv preprint arXiv:1909.05952, 2019
Cited by: 265; Year: 2019
Elastic spectral distortion for low resource speech recognition with deep neural networks
N Kanda, R Takeda, Y Obuchi
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on …, 2013
Cited by: 148; Year: 2013
Serialized output training for end-to-end overlapped speech recognition
N Kanda, Y Gaur, X Wang, Z Meng, T Yoshioka
arXiv preprint arXiv:2003.12687, 2020
Cited by: 121; Year: 2020
Internal language model estimation for domain-adaptive end-to-end speech recognition
Z Meng, S Parthasarathy, E Sun, Y Gaur, N Kanda, L Lu, X Chen, R Zhao, ...
2021 IEEE Spoken Language Technology Workshop (SLT), 243-250, 2021
Cited by: 108; Year: 2021
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
D Raj, P Denisov, Z Chen, H Erdogan, Z Huang, M He, S Watanabe, J Du, ...
2021 IEEE Spoken Language Technology Workshop (SLT), 897-904, 2021
Cited by: 94; Year: 2021
Joint speaker counting, speech recognition, and speaker identification for overlapped speech of any number of speakers
N Kanda, Y Gaur, X Wang, Z Meng, Z Chen, T Zhou, T Yoshioka
arXiv preprint arXiv:2006.10930, 2020
Cited by: 81; Year: 2020
Microsoft speaker diarization system for the VoxCeleb speaker recognition challenge 2020
X Xiao, N Kanda, Z Chen, T Zhou, T Yoshioka, S Chen, Y Zhao, G Liu, ...
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by: 75; Year: 2021
Guided source separation meets a strong ASR backend: Hitachi/Paderborn University joint investigation for dinner party ASR
N Kanda, C Boeddeker, J Heitkaemper, Y Fujita, S Horiguchi, ...
arXiv preprint arXiv:1905.12230, 2019
Cited by: 72; Year: 2019
A two-layer model for behavior and dialogue planning in conversational service robots
M Nakano, Y Hasegawa, K Nakadai, T Nakamura, J Takeuchi, T Torii, ...
2005 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2005
Cited by: 69; Year: 2005
Maximum a posteriori Based Decoding for CTC Acoustic Models
N Kanda, X Lu, H Kawai
Interspeech 2016, 1868-1872, 2016
Cited by: 57; Year: 2016
Multi-domain spoken dialogue system with extensibility and robustness against speech recognition errors
K Komatani, N Kanda, M Nakano, K Nakadai, H Tsujino, T Ogata, ...
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue, 9-17, 2006
Cited by: 56; Year: 2006
Streaming multi-talker ASR with token-level serialized output training
N Kanda, J Wu, Y Wu, X Xiao, Z Meng, X Wang, Y Gaur, Z Chen, J Li, ...
arXiv preprint arXiv:2202.00842, 2022
Cited by: 54; Year: 2022
The Hitachi/JHU CHiME-5 system: Advances in speech recognition for everyday home environments using multiple microphone arrays
N Kanda, R Ikeshita, S Horiguchi, Y Fujita, K Nagamatsu, X Wang, ...
Proc. CHiME-5, 6-10, 2018
Cited by: 54; Year: 2018
Internal language model training for domain-adaptive end-to-end speech recognition
Z Meng, N Kanda, Y Gaur, S Parthasarathy, E Sun, L Lu, X Chen, J Li, ...
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by: 53; Year: 2021
SpeechX: Neural codec language model as a versatile speech transformer
X Wang, M Thakker, Z Chen, N Kanda, SE Eskimez, S Chen, M Tang, ...
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
Cited by: 49; Year: 2024
Face-voice matching using cross-modal embeddings
S Horiguchi, N Kanda, K Nagamatsu
Proceedings of the 26th ACM international conference on Multimedia, 1011-1019, 2018
Cited by: 48; Year: 2018