StuPASE Demo


Audio Examples of the Effect of Early Reflections

Dry Signal: Original clean speech without artificially added reflections.
Early-Reflected Signal: Clean speech convolved with the first 50 ms of a room impulse response (RIR), simulating early reflections.
Reverberant Signal: Clean speech convolved with the full RIR, including both early and late reverberation.
The RIRs are taken from the open datasets SLR26 and SLR28 on OpenSLR (https://openslr.org/resources.php).
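To show how the three signals relate, here is a minimal Python sketch using NumPy. It convolves a dry signal with a full RIR and with only the first 50 ms of that RIR. Several details are assumptions and do not come from the authors' actual pipeline: the function name `simulate_reflections`, the 16 kHz sample rate, the synthetic RIR, and starting the 50 ms window at sample 0 of the RIR rather than at the direct-path peak.

```python
import numpy as np

def simulate_reflections(dry, rir, sr=16000, early_ms=50.0):
    """Return (early_reflected, reverberant) versions of a dry signal.

    Hypothetical helper. Assumption: the early window starts at sample 0
    of the RIR. Some pipelines instead start it at the direct-path peak.
    """
    dry = np.asarray(dry, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)
    n_early = int(round(sr * early_ms / 1000.0))  # 50 ms at 16 kHz -> 800 samples
    early_rir = rir[:n_early]                      # keep only the early part of the RIR
    # Full linear convolution, trimmed to the dry length so all versions stay aligned
    reverberant = np.convolve(dry, rir)[: len(dry)]
    early_reflected = np.convolve(dry, early_rir)[: len(dry)]
    return early_reflected, reverberant

# Usage with synthetic stand-ins (not real speech or a real OpenSLR RIR)
rng = np.random.default_rng(0)
sr = 16000
dry = rng.standard_normal(sr)                          # 1 s placeholder "clean speech"
t = np.arange(int(0.5 * sr)) / sr
rir = rng.standard_normal(t.size) * np.exp(-t / 0.1)   # 0.5 s exponentially decaying tail
rir[0] = 1.0                                           # direct-path impulse
early, reverb = simulate_reflections(dry, rir, sr)
```

Both convolutions are truncated to the dry-signal length. This keeps the three versions time-aligned, which makes side-by-side listening easier.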

fileid_0

Reverberant Signal

Early-Reflected Signal

Dry Signal

fileid_1

Reverberant Signal

Early-Reflected Signal

Dry Signal

Audio Examples for Comparison

Noisy Signal

TF-GridNet [1] Output

FlowSE [2] Output

PASE [3] Output

PASE-R Output

AES-V2 [4] Output

SenSE [5] Output

StuPASE Output (Ours)

Clean

Noisy Signal

TF-GridNet [1] Output

FlowSE [2] Output

PASE [3] Output

PASE-R Output

AES-V2 [4] Output

SenSE [5] Output

StuPASE Output (Ours)

Clean

Noisy Signal

TF-GridNet [1] Output

FlowSE [2] Output

PASE [3] Output

PASE-R Output

AES-V2 [4] Output

SenSE [5] Output

StuPASE Output (Ours)

Clean

Noisy Signal

TF-GridNet [1] Output

FlowSE [2] Output

PASE [3] Output

PASE-R Output

AES-V2 [4] Output

SenSE [5] Output

StuPASE Output (Ours)

Clean

Noisy Signal

TF-GridNet [1] Output

FlowSE [2] Output

PASE [3] Output

PASE-R Output

AES-V2 [4] Output

SenSE [5] Output

StuPASE Output (Ours)

Clean

Noisy Signal

TF-GridNet [1] Output

FlowSE [2] Output

PASE [3] Output

PASE-R Output

AES-V2 [4] Output

SenSE [5] Output

StuPASE Output (Ours)

Clean

Noisy Signal

TF-GridNet [1] Output

FlowSE [2] Output

PASE [3] Output

PASE-R Output

AES-V2 [4] Output

SenSE [5] Output

StuPASE Output (Ours)

Clean

Noisy Signal

TF-GridNet [1] Output

FlowSE [2] Output

PASE [3] Output

PASE-R Output

AES-V2 [4] Output

SenSE [5] Output

StuPASE Output (Ours)

Clean

Noisy Signal

TF-GridNet [1] Output

FlowSE [2] Output

PASE [3] Output

PASE-R Output

AES-V2 [4] Output

SenSE [5] Output

StuPASE Output (Ours)

Clean

Noisy Signal

TF-GridNet [1] Output

FlowSE [2] Output

PASE [3] Output

PASE-R Output

AES-V2 [4] Output

SenSE [5] Output

StuPASE Output (Ours)

Clean

References

[1] Z.-Q. Wang, S. Cornell, S. Choi, Y. Lee, B.-Y. Kim, and S. Watanabe, “TF-GridNet: Integrating full- and sub-band modeling for speech separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 3221-3236, 2023.

[2] Z. Wang, Z. Liu, X. Zhu, Y. Zhu, M. Liu, J. Chen, L. Xiao, C. Weng, and L. Xie, “FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching,” in Interspeech 2025, 2025, pp. 4858-4862.

[3] X. Rong, Q. Hu, M. Yesilbursa, K. Wojcicki, and J. Lu, “PASE: Leveraging the Phonological Prior of WavLM for Low-Hallucination Generative Speech Enhancement,” in Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI 2026), 2026, accepted.

[4] Adobe Podcast, “Enhance Speech.” [Online]. Available: https://podcast.adobe.com/enhance

[5] X. Li, H. Xie, Z. Wang, Z. Zhang, L. Xiao, and L. Xie, “SenSE: Semantic-aware high-fidelity universal speech enhancement,” 2025. [Online]. Available: https://arxiv.org/abs/2509.24708