Diffusion Models for Audio-Visual Speech Enhancement

119
Not scheduled
20m
Von-Melle-Park 4

Poster

Description

This poster showcases a selection of our work on diffusion models for speech enhancement. While diffusion models have proven successful in natural image generation, we adapt them to speech enhancement by introducing a task-adapted diffusion process in the complex short-time Fourier transform domain. Our results show performance competitive with strong predictive methods, along with better generalization when evaluated under a training–test mismatch. However, for very challenging inputs, the model tends to produce speech-like sounds without semantic meaning. To address this problem, we condition the diffusion model on visual input showing the speaker’s lips, which improves speech quality and intelligibility. This improvement is also reflected in a reduced word error rate of a downstream automatic speech recognition model.
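The task-adapted diffusion process can be pictured as a stochastic interpolation between the clean and noisy complex spectrograms: the forward process drifts the clean STFT toward the noisy one while injecting Gaussian noise, and a learned score network reverses it. A minimal NumPy sketch, assuming an Ornstein–Uhlenbeck-style forward SDE simulated with Euler–Maruyama; the parameter values (`gamma`, `sigma_min`, `sigma_max`) and the exact noise schedule are illustrative, not taken from the poster:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_step(x, y, t, dt, gamma=1.5, sigma_min=0.05, sigma_max=0.5):
    """One Euler-Maruyama step of an illustrative forward SDE: drift the
    current state x toward the noisy STFT y while adding complex Gaussian
    noise whose scale g(t) grows exponentially over the process."""
    g = sigma_min * (sigma_max / sigma_min) ** t  # exponential noise schedule
    drift = gamma * (y - x) * dt
    noise = g * np.sqrt(dt) * (rng.standard_normal(x.shape)
                               + 1j * rng.standard_normal(x.shape))
    return x + drift + noise

# Toy complex "STFT" frames standing in for clean and noisy speech.
x0 = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))      # clean
y = x0 + 0.3 * (rng.standard_normal((4, 8))
                + 1j * rng.standard_normal((4, 8)))                       # noisy

x = x0.copy()
n_steps = 100
for k in range(n_steps):
    x = forward_step(x, y, k / n_steps, 1.0 / n_steps)
# x now lies near the noisy spectrogram plus Gaussian noise; enhancement
# runs the learned reverse process from such a state back toward x0.
```

Working in the complex STFT domain means both magnitude and phase are modeled, which is one motivation for the generative formulation over mask-based predictive methods.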

Keywords

Diffusion models
Speech Enhancement
Audio-Visual
Generative Models
