howard1337/S2VC; paper: S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations

yistLin/universal-vocoder; paper: Towards achieving robust universal neural vocoding

cywang97/unispeech; paper: UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

microsoft/unilm; UniLM AI: Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

Continual-wav2vec2: an Application of Continual Learning for Self-Supervised Automatic Speech Recognition

Interactive demo: LayoutLMv2

CZWin32768/XLM-Align; paper: Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

waydroid/waydroid

huseinzol05/malaya-speech

Fine-tuning XLSR-Wav2Vec2 for Wolof ASR with 🤗

from transformers import Wav2Vec2ForCTC

# Load the pretrained XLSR-53 checkpoint and configure it for CTC fine-tuning.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    attention_dropout=0.1,
    hidden_dropout=0.1,
    feat_proj_dropout=0.0,
    mask_time_prob=0.05,
    layerdrop=0.1,
    gradient_checkpointing=True,
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,  # pad with the tokenizer's pad token
    vocab_size=len(processor.tokenizer),            # output layer sized to the Wolof vocabulary
)

from transformers import TrainingArguments

# Training configuration: evaluate, log, and checkpoint every 500 steps.
training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xlsr-WOLOF",
    group_by_length=True,             # batch samples of similar length to reduce padding
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,    # effective batch size of 32 per device
    evaluation_strategy="steps",
    num_train_epochs=40,
    fp16=True,
    save_steps=500,
    eval_steps=500,
    logging_steps=500,
    learning_rate=3e-4,
    warmup_steps=1000,
    save_total_limit=2,               # keep only the two most recent checkpoints
)
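
For completeness, a minimal sketch of how the model and training arguments above would be passed to a Trainer, assuming a data_collator, a compute_metrics function, and prepared common_voice_train/common_voice_test datasets as in the usual 🤗 XLSR-Wav2Vec2 fine-tuning recipe (those names are illustrative and not part of the original snippet):

from transformers import Trainer

# Sketch only: data_collator, compute_metrics, common_voice_train and
# common_voice_test are assumed to be defined elsewhere in the recipe.
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    train_dataset=common_voice_train,
    eval_dataset=common_voice_test,
    tokenizer=processor.feature_extractor,
)

trainer.train()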

run_spleeter.py

Few-shot Intent Classification and Slot Filling with Retrieved Examples

Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model

Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces