Two SpeechBrain speech enhancement models
Quick comparison of two models
The Colab notebook (with outputs) is here; the models are on the Hugging Face Hub: speechbrain/mtl-mimic-voicebank and speechbrain/metricgan-plus-voicebank.
The first twenty seconds of the mtl-mimic-voicebank output aren't great (though that stretch is also quieter in the original recording); the rest is fantastic. The output from metricgan-plus-voicebank is bad from start to finish.
%%capture
!pip install torchaudio speechbrain
!wget http://assets.doegen.ie/sound/MP3_versions/aud_Ul1-LA_1202d1u1.mp3
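# Import the submodule explicitly so IPython.display is always available
import IPython.display
# Listen to the original recording
IPython.display.Audio('aud_Ul1-LA_1202d1u1.mp3')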
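import torchaudio
# Newer SpeechBrain releases (1.0+) also expose this class in speechbrain.inference.enhancement
from speechbrain.pretrained import SpectralMaskEnhancement
# First model: download the pretrained mtl-mimic-voicebank weights from the Hub
enhance_model = SpectralMaskEnhancement.from_hparams(
    source="speechbrain/mtl-mimic-voicebank",
    savedir="pretrained_models/mtl-mimic-voicebank",
)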
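# enhance_file loads the recording and returns the enhanced waveform
enhanced = enhance_model.enhance_file("aud_Ul1-LA_1202d1u1.mp3")
# Saving enhanced signal on disk (add a channel dimension, move to CPU; the model works at 16 kHz)
torchaudio.save('enhanced.wav', enhanced.unsqueeze(0).cpu(), 16000)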
IPython.display.Audio('enhanced.wav')
import torch
# Second model: metricgan-plus-voicebank, loaded the same way
enhance_model = SpectralMaskEnhancement.from_hparams(
    source="speechbrain/metricgan-plus-voicebank",
    savedir="pretrained_models/metricgan-plus-voicebank",
)
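# Load the recording and add a batch dimension
noisy = enhance_model.load_audio("aud_Ul1-LA_1202d1u1.mp3").unsqueeze(0)
# Add relative length tensor (1.0 means the whole signal is valid)
enhanced = enhance_model.enhance_batch(noisy, lengths=torch.tensor([1.]))
# Saving enhanced signal on disk (already shaped [batch, time]; move to CPU first)
torchaudio.save('enhanced2.wav', enhanced.cpu(), 16000)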
IPython.display.Audio('enhanced2.wav')
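The remark that the first twenty seconds are quieter can be checked roughly by comparing RMS levels before and after the twenty-second mark. The following is a minimal sketch, not part of the original notebook: the helper names `rms_dbfs` and `split_levels` are hypothetical, and it assumes samples are floats in the range -1 to 1 (which is what `torchaudio.load` returns by default).

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples (range -1..1) in dB relative to full scale."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples to measure")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def split_levels(samples, sample_rate, split_seconds=20.0):
    """Return (level before the split, level after the split) in dBFS.

    Raises ValueError if the recording is not longer than split_seconds.
    """
    samples = list(samples)
    cut = int(split_seconds * sample_rate)
    return rms_dbfs(samples[:cut]), rms_dbfs(samples[cut:])


# Hypothetical usage in the notebook (left commented out because it needs the files above):
# waveform, sample_rate = torchaudio.load('enhanced.wav')
# first_20s, rest = split_levels(waveform[0].tolist(), sample_rate)
# print(f"first 20 s: {first_20s:.1f} dBFS, rest: {rest:.1f} dBFS")
```

A lower (more negative) figure for the first segment would be consistent with that stretch being quieter. RMS level says nothing about enhancement quality, so this is only a sanity check alongside listening.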