[1] Z.-Q. Wang, S. Cornell, S. Choi, Y. Lee, B.-Y. Kim, and S. Watanabe, “TF-GridNet: Integrating full- and sub-band modeling for speech separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 3221-3236, 2023.
[2] J.-M. Lemercier, J. Richter, S. Welker, and T. Gerkmann, “StoRM: A diffusion-based stochastic regeneration model for speech enhancement and dereverberation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2724-2737, 2023.
[3] B. Kang, X. Zhu, Z. Zhang, Z. Ye, M. Liu, Z. Wang, Y. Zhu, G. Ma, J. Chen, L. Xiao, C. Weng, W. Xue, and L. Xie, “LLaSE-G1: Incentivizing generalization capability for LLaMA-based speech enhancement,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria, Jul. 2025, pp. 13292-13305.
[4] J. Zhang, J. Yang, Z. Fang, Y. Wang, Z. Zhang, Z. Wang, F. Fan, and Z. Wu, “AnyEnhance: A unified generative model with prompt-guidance and self-critic for voice enhancement,” IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 3085-3098, 2025.
[5] X. Rong, Q. Hu, M. Yesilbursa, K. Wojcicki, and J. Lu, “PASE: Leveraging the phonological prior of WavLM for low-hallucination generative speech enhancement,” in Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI 2026), 2026, accepted.
[6] H. Liu, X. Liu, Q. Kong, Q. Tian, Y. Zhao, D. Wang, C. Huang, and Y. Wang, “VoiceFixer: A unified framework for high-fidelity speech restoration,” in Interspeech 2022, 2022, pp. 4232-4236.