Interesting links, 16/03/2022

Do massive multilingual models that recognize language-specific units (e.g. words, phonemes) work on *all* speech?

IMO, no! Multilingual =/= Universal.

Check out our #INTERSPEECH2021 paper, Differentiable Allophone Graphs for Language-Universal ASR! https://t.co/haRflRUGSh
— Brian Yan (@brianyan918) July 29, 2021

Differentiable Allophone Graphs for Language-Universal Speech Recognition

NLP Seminar 220216 - Omar Sanseviero (Hugging Face)

11L – Speech recognition and Graph Transformer Networks

How can I get duration of all video files in a folder containing multiple subfolders?

exiftool -n -q -p '${Duration;our $sum;$_=ConvertDuration($sum+=$_)}' ./*.mp4 | tail -n1
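
The `./*.mp4` glob in the one-liner only matches files in the current directory, so it does not descend into subfolders by itself. A minimal Python sketch of a recursive variant follows. It assumes `exiftool` is on the PATH and uses its `-r` (recurse), `-ext` (extension filter), `-n` (numeric values) and `-j` (JSON output) options; the function names `read_durations` and `format_duration` are made up for illustration and are not part of any library.

```python
import json
import subprocess


def read_durations(root, ext="mp4"):
    """Return the Duration (in seconds) of every *.ext file under root.

    Assumption: exiftool is installed and prints a JSON array when given -j.
    """
    result = subprocess.run(
        ["exiftool", "-n", "-j", "-r", "-ext", ext, "-Duration", str(root)],
        capture_output=True,
        text=True,
    )
    # exiftool may print nothing on stdout when no file matches.
    records = json.loads(result.stdout) if result.stdout.strip() else []
    # Files without a Duration tag simply lack the key, so skip them.
    return [float(r["Duration"]) for r in records if "Duration" in r]


def format_duration(seconds):
    """Format a number of seconds as H:MM:SS, rounded to the nearest second."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

Calling `format_duration(sum(read_durations(".")))` from the top-level folder would then give a single total for all matching files below it. Summing in Python rather than inside the `-p` format string keeps the exiftool call simple, at the cost of an extra JSON parsing step.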

Breathing and Speech Planning in Spontaneous Speech Synthesis

mchaput/whoosh

Yann LeCun: “Energy-Based Self-Supervised Learning”

Pseudo-Labeling for Massively Multilingual Speech Recognition

Implicit Language Model in LSTM for OCR

Exploring neural transducers for end-to-end speech recognition

Advancing Connectionist Temporal Classification with Attention Modeling

Advancing Acoustic-to-Word CTC Model

Direct Acoustics-to-Word Models for English Conversational Speech Recognition

Do End-to-End Speech Recognition Models Care About Context?

A study on effects of implicit and explicit language model information for DBLSTM-CTC based handwriting recognition

microsoft/mutransformers

How to Train a Joint Embedding using Pytorch

adefossez/julius — Fast PyTorch based DSP for audio and 1D signals

Julius Orion Smith III Home Page

ageron/handson-ml2

asteroid-team/Libri_VAD

microsoft/DNS-Challenge — This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.

The Norwegian Parliamentary Speech Corpus

Who Takes the Parliamentary Floor? The Role of Gender in Speech-making in the Swedish Riksdag