UniPASE: A Generative Model for Universal Speech Enhancement
with High Fidelity and Low Hallucinations

Xiaobin Rong1,2, Zheng Wang1,2, Yushi Wang1,2, Jun Gao1,2, Jing Lu1,2
1Key Laboratory of Modern Acoustics, Nanjing University
2NJU-Horizon Intelligent Audio Lab, Horizon Robotics
Contents
  1. Audio Demos from the DNS 2020 No-reverb Test Set
  2. Audio Demos from the DNS 2020 With-reverb Test Set
  3. Audio Demos from the PLC 2024 Validation Set
  4. Audio Demos from the VoiceFixer GSR Test Set
  5. Audio Demos from the URGENT 2025 Non-blind Test Set
  6. Audio Examples in Ablation Study
  7. References
Audio Demos from the DNS 2020 No-reverb Test Set
Example 1
Noisy Signal
TF-GridNet [1] Output
StoRM [2] Output
LLaSE-G1 [3] Output
AnyEnhance [4] Output
PASE [5] Output
UniPASE Output (Ours)
Clean
Example 2
Noisy Signal
TF-GridNet [1] Output
StoRM [2] Output
LLaSE-G1 [3] Output
AnyEnhance [4] Output
PASE [5] Output
UniPASE Output (Ours)
Clean
Audio Demos from the DNS 2020 With-reverb Test Set
Example 1
Noisy Signal
TF-GridNet [1] Output
StoRM [2] Output
LLaSE-G1 [3] Output
AnyEnhance [4] Output
PASE [5] Output
UniPASE Output (Ours)
Clean
Example 2
Noisy Signal
TF-GridNet [1] Output
StoRM [2] Output
LLaSE-G1 [3] Output
AnyEnhance [4] Output
PASE [5] Output
UniPASE Output (Ours)
Clean
Audio Demos from the PLC 2024 Validation Set
Example 1
Noisy Signal
TF-GridNet [1] Output
LLaSE-G1 [3] Output
UniPASE Output (Ours)
Clean
Example 2
Noisy Signal
TF-GridNet [1] Output
LLaSE-G1 [3] Output
UniPASE Output (Ours)
Clean
Example 3
Noisy Signal
TF-GridNet [1] Output
LLaSE-G1 [3] Output
UniPASE Output (Ours)
Clean
Example 4
Noisy Signal
TF-GridNet [1] Output
LLaSE-G1 [3] Output
UniPASE Output (Ours)
Clean
Example 5
Noisy Signal
TF-GridNet [1] Output
LLaSE-G1 [3] Output
UniPASE Output (Ours)
Clean
Example 6
Noisy Signal
TF-GridNet [1] Output
LLaSE-G1 [3] Output
UniPASE Output (Ours)
Clean
Example 7
Noisy Signal
TF-GridNet [1] Output
LLaSE-G1 [3] Output
UniPASE Output (Ours)
Clean
Audio Demos from the VoiceFixer GSR Test Set
Example 1
Noisy Signal
TF-GridNet [1] Output
VoiceFixer [6] Output
AnyEnhance [4] Output
UniPASE Output (Ours)
Clean
Example 2
Noisy Signal
TF-GridNet [1] Output
VoiceFixer [6] Output
AnyEnhance [4] Output
UniPASE Output (Ours)
Clean
Example 3
Noisy Signal
TF-GridNet [1] Output
VoiceFixer [6] Output
AnyEnhance [4] Output
UniPASE Output (Ours)
Clean
Example 4
Noisy Signal
TF-GridNet [1] Output
VoiceFixer [6] Output
AnyEnhance [4] Output
UniPASE Output (Ours)
Clean
Example 5
Noisy Signal
TF-GridNet [1] Output
VoiceFixer [6] Output
AnyEnhance [4] Output
UniPASE Output (Ours)
Clean
Example 6
Noisy Signal
TF-GridNet [1] Output
VoiceFixer [6] Output
AnyEnhance [4] Output
UniPASE Output (Ours)
Clean
Audio Demos from the URGENT 2025 Non-blind Test Set
Example 1
Noisy Signal
TF-GridNet [1] Output
UniPASE Output (Ours)
Clean
Example 2
Noisy Signal
TF-GridNet [1] Output
UniPASE Output (Ours)
Clean
Example 3
Noisy Signal
TF-GridNet [1] Output
UniPASE Output (Ours)
Clean
Example 4
Noisy Signal
TF-GridNet [1] Output
UniPASE Output (Ours)
Clean
Example 5
Noisy Signal
TF-GridNet [1] Output
UniPASE Output (Ours)
Clean
Example 6
Noisy Signal
TF-GridNet [1] Output
UniPASE Output (Ours)
Clean
Example 7
Noisy Signal
TF-GridNet [1] Output
UniPASE Output (Ours)
Clean
Example 8
Noisy Signal
TF-GridNet [1] Output
UniPASE Output (Ours)
Clean
Audio Examples in Ablation Study
Example 1
Enhanced Signal without MSRD
Enhanced Signal with MSRD
Example 2
Enhanced Signal without MSRD
Enhanced Signal with MSRD
Example 3
Enhanced Signal without MSRD
Enhanced Signal with MSRD
References

[1] Z.-Q. Wang, S. Cornell, S. Choi, Y. Lee, B.-Y. Kim, and S. Watanabe, “TF-GridNet: Integrating full- and sub-band modeling for speech separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 3221-3236, 2023.

[2] J.-M. Lemercier, J. Richter, S. Welker, and T. Gerkmann, “StoRM: A diffusion-based stochastic regeneration model for speech enhancement and dereverberation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2724-2737, 2023.

[3] B. Kang, X. Zhu, Z. Zhang, Z. Ye, M. Liu, Z. Wang, Y. Zhu, G. Ma, J. Chen, L. Xiao, C. Weng, W. Xue, and L. Xie, “LLaSE-G1: Incentivizing generalization capability for LLaMA-based speech enhancement,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria, Jul. 2025, pp. 13292-13305.

[4] J. Zhang, J. Yang, Z. Fang, Y. Wang, Z. Zhang, Z. Wang, F. Fan, and Z. Wu, “AnyEnhance: A unified generative model with prompt-guidance and self-critic for voice enhancement,” IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 3085-3098, 2025.

[5] X. Rong, Q. Hu, M. Yesilbursa, K. Wojcicki, and J. Lu, “PASE: Leveraging the phonological prior of WavLM for low-hallucination generative speech enhancement,” in Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI 2026), 2026, accepted.

[6] H. Liu, X. Liu, Q. Kong, Q. Tian, Y. Zhao, D. Wang, C. Huang, and Y. Wang, “VoiceFixer: A unified framework for high-fidelity speech restoration,” in Proceedings of Interspeech, 2022, pp. 4232-4236.