Title | Authors | Venue | Cited by | Year |
Compute and memory efficient universal sound source separation | E Tzinis, Z Wang, X Jiang, P Smaragdis | Journal of Signal Processing Systems 94 (2), 245-259, 2022 | 50 | 2022 |
Dual-path mamba: Short and long-term bidirectional selective structured state space models for speech separation | X Jiang, C Han, N Mesgarani | arXiv preprint arXiv:2403.18257, 2024 | 37 | 2024 |
Ssamba: Self-supervised audio representation learning with mamba state space model | S Shams, SS Dindar, X Jiang, N Mesgarani | arXiv preprint arXiv:2405.11831, 2024 | 19 | 2024 |
Learning representations for new sound classes with continual self-supervised learning | Z Wang, C Subakan, X Jiang, J Wu, E Tzinis, M Ravanelli, P Smaragdis | IEEE Signal Processing Letters 29, 2607-2611, 2022 | 18 | 2022 |
Phoneme-level bert for enhanced prosody of text-to-speech with grapheme predictions | YA Li, C Han, X Jiang, N Mesgarani | ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 16 | 2023 |
Speech slytherin: Examining the performance and efficiency of mamba for speech separation, recognition, and synthesis | X Jiang, YA Li, AN Florea, C Han, N Mesgarani | arXiv preprint arXiv:2407.09732, 2024 | 7 | 2024 |
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform | YA Li, C Han, X Jiang, N Mesgarani | arXiv preprint arXiv:2309.09493, 2023 | 6 | 2023 |
Style-talker: Finetuning audio language model and style-based text-to-speech model for fast spoken dialogue generation | YA Li, X Jiang, J Darefsky, G Zhu, N Mesgarani | arXiv preprint arXiv:2408.11849, 2024 | 3 | 2024 |
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion | YA Li, X Jiang, C Han, N Mesgarani | arXiv preprint arXiv:2409.10058, 2024 | 2 | 2024 |
Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience | X Jiang, C Han, YA Li, N Mesgarani | arXiv preprint arXiv:2402.03710, 2024 | 2 | 2024 |
Exploring Self-supervised Contrastive Learning of Spatial Sound Event Representation | X Jiang, C Han, YA Li, N Mesgarani | ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 1 | 2024 |
Exploring Finetuned Audio-LLM on Heart Murmur Features | A Florea, X Jiang, N Mesgarani, X Jiang | arXiv preprint arXiv:2501.13884, 2025 | | 2025 |
Just ASR + LLM? A Study on Speech Large Language Models’ Ability to Identify And Understand Speaker in Spoken Dialogue | J Wu, X Fan, BR Lu, X Jiang, N Mesgarani, M Hasegawa-Johnson, ... | 2024 IEEE Spoken Language Technology Workshop (SLT), 1137-1143, 2024 | | 2024 |
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes | X Jiang, YA Li, N Mesgarani | arXiv preprint arXiv:2305.18441, 2023 | | 2023 |
Vector-quantized speech separation | X Jiang | | | 2021 |