It has also been shown that adding pronunciation variants to the dictionary has a point of diminishing returns, as over-generated pronunciations can lead to ambiguity in the decoder and degrade its performance.
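A toy illustration of the ambiguity effect, as a minimal sketch: the lexicon entries, the ARPAbet-style phone strings, and the helper `confusable_pairs` are all hypothetical and not taken from any real dictionary or toolkit. It counts word pairs that share at least one pronunciation, which is one rough proxy for the confusability a decoder has to resolve from context alone.

```python
from collections import defaultdict
from itertools import combinations


def confusable_pairs(lexicon):
    """Return the set of word pairs that share at least one pronunciation.

    `lexicon` maps each word to a list of pronunciation strings.
    """
    words_by_pron = defaultdict(set)
    for word, prons in lexicon.items():
        for pron in prons:
            words_by_pron[pron].add(word)

    pairs = set()
    for words in words_by_pron.values():
        # Every pair of words under the same pronunciation is acoustically
        # indistinguishable for that variant.
        pairs.update(combinations(sorted(words), 2))
    return pairs


# Hypothetical baseline lexicon: one canonical pronunciation per word.
base = {
    "pen": ["P EH N"],
    "pin": ["P IH N"],
    "pan": ["P AE N"],
}

# Hypothetical expanded lexicon: accent variants added for two words.
expanded = {
    "pen": ["P EH N", "P IH N"],
    "pin": ["P IH N"],
    "pan": ["P AE N", "P EH N"],
}

print(len(confusable_pairs(base)))      # no shared pronunciations
print(sorted(confusable_pairs(expanded)))
```

In this toy case two added variants create two new homophone pairs where the baseline had none. Real systems usually weight or prune variants rather than adding them all, but that step is outside what this sketch shows.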

Adaptation techniques to improve ASR performance on accented speakers

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation

google-research-datasets/cvss — CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus

Cymru-Breizh-Agile-Cymru-Project/vosk-cymraeg

From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data

Flow Matching Guide and Code

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS

Distilling an End-to-End Voice Assistant Without Instruction Training Data

From the Forests — LibriVox volunteers bring you 18 recordings of From the Forests by Henry Kendall. This was the Fortnightly Poetry project for March 29, 2020.

xbpeng/MimicKit — Suite of motion imitation methods for training motion controllers.

VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency, code, model, space

Vyvo/VyvoTTS-v0-Qwen3-0.6B

Special issue on finite-state methods in natural language processing and mathematics of language

nv-tlabs/vipe — ViPE: Video Pose Engine for Geometric 3D Perception

A First Course on Data Structures in Python

newton-physics/newton — An open-source, GPU-accelerated physics simulation engine built upon NVIDIA Warp, specifically targeting roboticists and simulation researchers.

Korpus Dawnych Polskich Tekstów Dramatycznych

H2IOSC

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention, code (empty)

Spoken corpora of parliamentary debates ParlaSpeech 3.0

vosen/ZLUDA — CUDA on non-NVIDIA GPUs

MCG-NJU/MotionRAG — [NeurIPS 2025] MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation

SMM2026

SvarDOS/edrdos

SvarDOS - an open-source DOS distribution

Continual-Intelligence/SEAL — Self-Adapting Language Models

Diffusion Transformers with Representation Autoencoders, code

yukara-ikemiya/Open-Miipher-2 — PyTorch implementation of Miipher-2 [2025] which is a speech restoration model by Google DeepMind