Towards End-to-end Unsupervised Speech Recognition

@misc{liu2022towards,
  doi = {10.48550/arXiv.2204.02492},
  url = {https://arxiv.org/abs/2204.02492},
  author = {Liu, Alexander H. and Hsu, Wei-Ning and Auli, Michael and Baevski, Alexei},
  title = {Towards End-to-end Unsupervised Speech Recognition},
  year = {2022},
}

Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection

@misc{wang2018segmental,
  doi = {10.48550/arXiv.1808.02228},
  url = {https://arxiv.org/abs/1808.02228},
  author = {Wang, Yu-Hsuan and Lee, Hung-yi and Lee, Lin-shan},
  title = {Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection},
  year = {2018},
}

zhenghuatan/rVADfast

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing, microsoft/SpeechT5

@misc{ao2021speecht5,
  doi = {10.48550/arXiv.2110.07205},
  url = {https://arxiv.org/abs/2110.07205},
  author = {Ao, Junyi and Wang, Rui and Zhou, Long and Wang, Chengyi and Ren, Shuo and Wu, Yu and Liu, Shujie and Ko, Tom and Li, Qing and Zhang, Yu and Wei, Zhihua and Qian, Yao and Li, Jinyu and Wei, Furu},
  title = {SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing},
  year = {2021},
}

How to load pretrained models in PyTorch

Multilingual and Multimodal Learning for Brazilian Portuguese

RoomReader: A Multimodal Corpus of Online Multiparty Conversational Interactions

Investigating Independence vs. Control: Agenda-Setting in Russian News Coverage on Social Media

Diachronic Parsing of Pre-Standard Irish

probabilisticai/probai-2022, videos


Using AI to decode speech from brain activity


Add wav2vec2_alignment

Add fairseq FastSpeech2

Add Emformer

data2vec-vision ONNX ready-made configuration

Add a TF in-graph tokenizer for BERT

Add MobileNetV2 model

Adding Omnivore Model to HF

LayoutLMv2 tesseract_config

pyannote/embedding

ASR chunking


LITHME

CLARIN Annual Conference 2022


google/lyra — A Very Low-Bitrate Codec for Speech Compression

salesforce/awd-lstm-lm

MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models

Transflower: probabilistic autoregressive dance generation with multimodal attention, code

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

An investigation of phone-based subword units for end-to-end speech recognition

Sequence-to-sequence learning with Transducers

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

Online ASR with Emformer RNN-T, code

Recordings Database

spaces/k2-fsa/automatic-speech-recognition

csukuangfj/optimized_transducer

Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping

Integrating Lattice-Free MMI into End-to-End Speech Recognition

clarin-eric/parla-clarin

clarin-eric/ParlaMint

MASC-MEG

But what is the Fourier Transform? A visual introduction.
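The video above builds intuition; as a quick companion note, here is a minimal pure-Python sketch of the discrete Fourier transform it visualizes (naive O(N²), for illustration only; the function name and test signal are my own, not from the video):

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

# A pure sine completing 2 cycles over N samples concentrates its energy
# in bin k=2 (and the mirror bin k=N-2), with magnitude N/2.
N = 8
signal = [math.sin(2 * math.pi * 2 * n / N) for n in range(N)]
spectrum = dft(signal)
peak_bin = max(range(N // 2), key=lambda k: abs(spectrum[k]))
```

Here `peak_bin` recovers the sine's frequency bin (2), which is the "winding frequency" idea the video animates.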

AudioLM: a Language Modeling Approach to Audio Generation


Layer-wise analysis of a self-supervised speech representation


L2-ARCTIC