The Colab notebook (with outputs) is here; the models are on the Hugging Face Hub: speechbrain/mtl-mimic-voicebank and speechbrain/metricgan-plus-voicebank.

The first twenty seconds of the mtl-mimic-voicebank output aren't great (though that stretch of the original recording is also quieter); the rest is fantastic. The output from metricgan-plus-voicebank is bad from start to finish.

%%capture
!pip install torchaudio speechbrain

!wget http://assets.doegen.ie/sound/MP3_versions/aud_Ul1-LA_1202d1u1.mp3

# Import the submodule explicitly so IPython.display resolves outside a notebook too
import IPython.display
IPython.display.Audio('aud_Ul1-LA_1202d1u1.mp3')

import torchaudio
# Newer SpeechBrain releases (1.0+) expose this class from speechbrain.inference.enhancement
from speechbrain.pretrained import SpectralMaskEnhancement

enhance_model = SpectralMaskEnhancement.from_hparams(
    source="speechbrain/mtl-mimic-voicebank",
    savedir="pretrained_models/mtl-mimic-voicebank",
)
enhanced = enhance_model.enhance_file("aud_Ul1-LA_1202d1u1.mp3")

# Save the enhanced signal to disk; unsqueeze(0) adds the channel dimension torchaudio.save expects
torchaudio.save('enhanced.wav', enhanced.unsqueeze(0), 16000)

IPython.display.Audio('enhanced.wav')
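The remark that the first twenty seconds of the recording are quieter can be checked numerically rather than by ear. The following is a minimal sketch, not part of the original notebook: `rms_dbfs` and `compare_levels` are hypothetical helper names, and the code assumes mono samples given as plain floats scaled to the range -1.0 to 1.0.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level of a sequence of floats, in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    # Floor the value so digital silence does not hit log10(0)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def compare_levels(samples, sample_rate, split_seconds=20.0):
    """Return (level before the split, level after the split), both in dBFS."""
    split = int(split_seconds * sample_rate)
    return rms_dbfs(samples[:split]), rms_dbfs(samples[split:])
```

In the notebook, the samples could presumably come from `enhance_model.load_audio("aud_Ul1-LA_1202d1u1.mp3").tolist()` with `sample_rate=16000`, assuming `load_audio` returns a one-dimensional mono tensor at the model's sample rate. A noticeably lower first value would support the observation that the opening section is quieter. The helper raises `ValueError` if the recording is shorter than the split point.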

import torch
enhance_model = SpectralMaskEnhancement.from_hparams(
    source="speechbrain/metricgan-plus-voicebank",
    savedir="pretrained_models/metricgan-plus-voicebank",
)

noisy = enhance_model.load_audio("aud_Ul1-LA_1202d1u1.mp3").unsqueeze(0)

# Relative lengths: one entry per batch item, as a fraction of the longest item (1.0 for a single file)
enhanced = enhance_model.enhance_batch(noisy, lengths=torch.tensor([1.]))

# Save the enhanced signal to disk; the batch dimension doubles as the channel dimension here
torchaudio.save('enhanced2.wav', enhanced, 16000)

IPython.display.Audio('enhanced2.wav')
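The `lengths=torch.tensor([1.])` argument passed to `enhance_batch` follows SpeechBrain's convention of expressing each item's length as a fraction of the longest item in the padded batch, which is why a single recording gets `1.0`. The sketch below illustrates how those fractions would be derived for several clips of different lengths. `pad_batch` is a hypothetical helper, not a SpeechBrain function, and it uses plain Python lists so it runs without any dependencies.

```python
def pad_batch(signals, pad_value=0.0):
    """Zero-pad variable-length signals to a common length.

    Returns (padded, relative_lengths), where each relative length is the
    original sample count divided by the longest sample count in the batch.
    """
    if not signals or any(len(s) == 0 for s in signals):
        raise ValueError("need at least one non-empty signal")
    longest = max(len(s) for s in signals)
    # Right-pad every signal so all rows have the same number of samples
    padded = [list(s) + [pad_value] * (longest - len(s)) for s in signals]
    relative_lengths = [len(s) / longest for s in signals]
    return padded, relative_lengths
```

With two clips of 4 and 2 samples, the relative lengths come out as 1.0 and 0.5. Wrapping both return values in `torch.tensor(...)` should give inputs of the shape `enhance_batch` takes in the cell above, though batching several recordings this way was not tried in the original notebook.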