Transformer Memory as a Differentiable Search Index

It’s Raw! Audio Generation with State-Space Models, code

Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval, code

Hierarchical Perceiver

mSLAM: Massively multilingual joint pre-training for speech and text

Who spoke when! How to Build your own Speaker Diarization Module, code

Differentiable Allophone Graphs for Language-Universal Speech Recognition

NLP Seminar 220216 - Omar Sanseviero (Hugging Face)

11L – Speech recognition and Graph Transformer Networks

How can I get duration of all video files in a folder containing multiple subfolders?

exiftool -n -q -p '${Duration;our $sum;$_=ConvertDuration($sum+=$_)}' ./*.mp4 | tail -n1
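
The one-liner above only globs `./*.mp4` in the current directory, while the question asks about subfolders. A minimal sketch of one way to cover them, assuming durations arrive as plain seconds, one per line: the summing and formatting is split into a small helper, `sum_durations` (a hypothetical name, not part of exiftool), and exiftool does the recursive walk via its `-r` and `-ext` options. The exiftool call is left as a comment because it depends on the tool being installed and on the files carrying a `Duration` tag.

```shell
# sum_durations: read one duration in seconds per line on stdin,
# print the total as H:MM:SS (fractional seconds are truncated).
# Hypothetical helper name, used here for illustration only.
sum_durations() {
  awk '{ s += $1 } END { printf "%d:%02d:%02d\n", s / 3600, (s % 3600) / 60, s % 60 }'
}

# Assumed usage, recursing from the current directory:
#   exiftool -r -ext mp4 -n -q -p '$Duration' . | sum_durations
```

Keeping the arithmetic outside exiftool means the same helper should work with any tool that can print a duration in seconds per file.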

Breathing and Speech Planning in Spontaneous Speech Synthesis

mchaput/whoosh

Yann LeCun: “Energy-Based Self-Supervised Learning”

Pseudo-Labeling for Massively Multilingual Speech Recognition

Implicit Language Model in LSTM for OCR

Exploring neural transducers for end-to-end speech recognition

Advancing Connectionist Temporal Classification with Attention Modeling

Advancing Acoustic-to-Word CTC Model

Direct Acoustics-to-Word Models for English Conversational Speech Recognition

Do End-to-End Speech Recognition Models Care About Context?

A study on effects of implicit and explicit language model information for DBLSTM-CTC based handwriting recognition

microsoft/mutransformers

How to Train a Joint Embedding using Pytorch

adefossez/julius — Fast PyTorch based DSP for audio and 1D signals

Julius Orion Smith III Home Page

ageron/handson-ml2

asteroid-team/Libri_VAD

microsoft/DNS-Challenge — This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.

The Norwegian Parliamentary Speech Corpus

Who Takes the Parliamentary Floor? The Role of Gender in Speech-making in the Swedish Riksdag