Interesting links, 26/9/2021
Misc. interesting things.
A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations
yistLin/universal-vocoder; paper: Towards achieving robust universal neural vocoding
cywang97/unispeech; paper: UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
microsoft/unilm — UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities
Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment; CZWin32768/XLM-Align
Fine-tuning XLSR-Wav2Vec2 for WOLOF ASR with 🤗
from transformers import Wav2Vec2ForCTC

# `processor` is the Wav2Vec2Processor built earlier in the tutorial
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    attention_dropout=0.1,
    hidden_dropout=0.1,
    feat_proj_dropout=0.0,
    mask_time_prob=0.05,          # SpecAugment-style time masking probability
    layerdrop=0.1,
    gradient_checkpointing=True,  # trade compute for memory
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
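As a side note on what the CTC head above produces: at inference time the model emits one label per audio frame (plus a blank token), and greedy decoding collapses consecutive repeats and drops blanks. A toy pure-Python sketch of that collapse rule (an illustration only, not the transformers implementation):

```python
BLANK = "_"  # stand-in for the CTC blank token

def ctc_greedy_collapse(frame_labels):
    """Collapse consecutive repeated labels, then remove blanks."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev:       # collapse consecutive repeats
            if label != BLANK:  # drop blank tokens
                out.append(label)
        prev = label
    return "".join(out)

# Frame-wise argmax output "__hh_e_ll_llo__" decodes to "hello":
# the blank between the two "l" runs is what lets CTC emit a double letter.
print(ctc_greedy_collapse(list("__hh_e_ll_llo__")))
```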
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xlsr-WOLOF",
    group_by_length=True,            # bucket samples of similar length to reduce padding
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size of 32 per device
    evaluation_strategy="steps",
    num_train_epochs=40,
    fp16=True,
    save_steps=500,
    eval_steps=500,
    logging_steps=500,
    learning_rate=3e-4,
    warmup_steps=1000,
    save_total_limit=2,              # keep only the two most recent checkpoints
)
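On `warmup_steps=1000` and `learning_rate=3e-4`: the Trainer's default scheduler (`lr_scheduler_type="linear"`) ramps the learning rate linearly from 0 to the peak over the warmup steps, then decays it linearly back to 0. A minimal sketch of that schedule, assuming a hypothetical `total_steps` for illustration:

```python
def linear_schedule_lr(step, total_steps, peak_lr=3e-4, warmup_steps=1000):
    """Learning rate at `step` under linear warmup + linear decay
    (the transformers Trainer default schedule)."""
    if step < warmup_steps:
        # ramp up from 0 to peak_lr over the first warmup_steps steps
        return peak_lr * step / warmup_steps
    # then decay linearly from peak_lr to 0 over the remaining steps
    remaining = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / remaining)
```

For example, halfway through warmup the rate is half the peak, and it reaches 0 at the final step.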
Few-shot Intent Classification and Slot Filling with Retrieved Examples
Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model
Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces