Personalized Dereverberation of Speech

Classic non-blind speech dereverberation methods produce high-quality results only when the precise impulse response is known. Learning-based blind methods, by contrast, cannot ensure adequate dereverberation in all environments. We propose an environment- and speaker-specific approach that combines the advantages of both. With a simple, one-time personalization step, our model generalizes a single measured impulse response to its spatial neighborhood. Specifically, the two-stage model performs feature-based Wiener deconvolution followed by a network-based refinement. Extensive experimental results indicate that our approach outperforms state-of-the-art methods both quantitatively and qualitatively. Additional user studies confirm that listeners overwhelmingly favor our method.
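
For intuition about the non-blind setting, the following is a minimal sketch of classic waveform-domain Wiener deconvolution with a known impulse response. It is not the feature-based variant or the refinement network of the proposed model. The function name `wiener_deconvolve`, the `snr_db` regularization parameter, and the synthetic three-tap impulse response are assumptions made purely for illustration.

```python
import numpy as np


def wiener_deconvolve(reverberant, rir, snr_db=30.0):
    """Classic frequency-domain Wiener deconvolution with a known impulse response.

    reverberant: 1-D array, the observed (reverberant) signal.
    rir:         1-D array, the measured room impulse response.
    snr_db:      assumed signal-to-noise ratio in dB; it sets the regularizer
                 (an illustrative choice, not a value from the paper).
    """
    n = len(reverberant)
    # rfft with length n zero-pads the impulse response to the signal length.
    H = np.fft.rfft(rir, n)
    Y = np.fft.rfft(reverberant, n)
    # Constant noise-to-signal power ratio used as the Wiener regularizer.
    noise_to_signal = 10.0 ** (-snr_db / 10.0)
    # Wiener filter: conj(H) / (|H|^2 + NSR).
    G = np.conj(H) / (np.abs(H) ** 2 + noise_to_signal)
    return np.fft.irfft(G * Y, n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dry = rng.standard_normal(4000)      # stand-in for a dry speech signal
    rir = np.zeros(64)                   # synthetic impulse response (assumption)
    rir[0], rir[20], rir[45] = 1.0, 0.5, 0.25
    wet = np.convolve(dry, rir)          # simulated reverberant recording
    estimate = wiener_deconvolve(wet, rir)[: len(dry)]
    rel_err = np.linalg.norm(estimate - dry) / np.linalg.norm(dry)
    print(f"relative reconstruction error: {rel_err:.4f}")
```

The sketch recovers the dry signal well only because the exact impulse response is supplied. With a mismatched response, for example one measured at a different position in the room, the same filter degrades, which is the gap between non-blind and blind methods that the abstract describes.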