Dry Signal: Original clean speech without artificially added reflections.
Early-Reflected Signal: Clean speech convolved with the first 50 ms of a room impulse response (RIR), simulating early reflections.
Reverberant Signal: Clean speech convolved with the full RIR, including both early and late reverberation.
The RIRs are sourced from the open datasets openSLR26 and openSLR28 (https://openslr.org/resources.php).
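The three signal types above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline actually used: the function name `make_signals`, the choice to measure the 50 ms window from the first sample of the RIR (rather than from its direct-path peak), and the truncation of each output to the length of the dry signal are all assumptions.

```python
import numpy as np


def make_signals(dry, rir, sample_rate, early_ms=50.0):
    """Return (dry, early_reflected, reverberant), each len(dry) samples long.

    Assumptions (not stated in the text): the early window starts at
    sample 0 of the RIR, and outputs are truncated to the dry length.
    """
    dry = np.asarray(dry, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)

    # Number of RIR samples covered by the early window (50 ms by default).
    n_early = int(round(early_ms * 1e-3 * sample_rate))
    early_rir = rir[:n_early]

    # Early-reflected signal: convolution with the truncated RIR only.
    early = np.convolve(dry, early_rir)[: len(dry)]
    # Reverberant signal: convolution with the full RIR.
    reverberant = np.convolve(dry, rir)[: len(dry)]
    return dry, early, reverberant


# Usage with synthetic stand-ins for speech and a measured RIR.
rng = np.random.default_rng(0)
fs = 16000
speech = rng.standard_normal(fs)                # 1 s of placeholder "speech"
decay = np.exp(-np.arange(4800) / 800.0)        # 0.3 s exponential decay
toy_rir = rng.standard_normal(4800) * decay
toy_rir[0] = 1.0                                # direct path
dry_sig, early_sig, reverb_sig = make_signals(speech, toy_rir, fs)
```

With real data, `speech` and `toy_rir` would be replaced by a clean utterance and an RIR loaded at the same sampling rate; for long recordings an FFT-based convolution would be faster than `np.convolve`.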
[1] Z.-Q. Wang, S. Cornell, S. Choi, Y. Lee, B.-Y. Kim, and S. Watanabe, “TF-GridNet: Integrating full- and sub-band modeling for speech separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 3221-3236, 2023.
[2] Z. Wang, Z. Liu, X. Zhu, Y. Zhu, M. Liu, J. Chen, L. Xiao, C. Weng, and L. Xie, “FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching,” in Interspeech 2025, 2025, pp. 4858-4862.
[3] X. Rong, Q. Hu, M. Yesilbursa, K. Wojcicki, and J. Lu, “PASE: Leveraging the Phonological Prior of WavLM for Low-Hallucination Generative Speech Enhancement,” in Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI 2026), 2026, accepted.
[4] https://podcast.adobe.com/enhance
[5] X. Li, H. Xie, Z. Wang, Z. Zhang, L. Xiao, and L. Xie, “SenSE: Semantic-aware high-fidelity universal speech enhancement,” 2025. [Online]. Available: https://arxiv.org/abs/2509.24708