Interesting links, 3/10/2025
Misc. interesting things.
Sámi
NbAiLab/salmon-whisper-large-smj-lr7e-5
Does multilingual and multi-speaker modeling improve low-resource TTS? Experiments on Sámi languages
aalto-speech/northern-sami-asr
GetmanY1/wav2vec2-xls-r-300m-sami-parl-ext-ft
GetmanY1/wav2vec2-base-sami-22k
ParlaSpeech
Spoken corpora of parliamentary debates ParlaSpeech 3.0
Misc
Correct MT output with word alignments
lucidrains/vector-quantize-pytorch
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
MambaInst: Lightweight State Space Model for Real-Time Instance Segmentation
NeMo/blob/main/docker/Dockerfile.speech
sparkfish/augraphy — Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Speech-to-Retrieval, dataset, code
A History of Large Language Models
neuphonic/neucodec, code, space
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform, code
deepsearch-ai/deepsearch — A multimodal RAG application that enables semantic search on multimedia sources like audio, video and images
Introduction to University Mathematics
Conversational image segmentation with Gemini 2.5
Open Whisper-style Speech Models
raminnakhli/GMM-HMM-from-scratch
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation, code
Phi-4-multimodal-korean-finetuning
The best open source OCR models
Continuous Speech Tokenizer in Text To Speech, code
DDT: Decoupled Diffusion Transformer
SesameAILabs/csm — A Conversational Speech Generation Model, model
Phoneme Segmentation Using Self-Supervised Speech Models, code
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations, code
SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation
Theomat/sbsur — Stochastic Beam Search + Unique Randomizer
RF5/transfusion-asr — Transcribing Speech with Multinomial Diffusion, training code and models.
It has also been shown that adding pronunciation variants to the dictionary has a point of diminishing returns, as over-generated pronunciations can lead to ambiguity in the decoder and degrade its performance
Adaptation techniques to improve ASR performance on accented speakers
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
google-research-datasets/cvss — CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
Cymru-Breizh-Agile-Cymru-Project/vosk-cymraeg
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Distilling an End-to-End Voice Assistant Without Instruction Training Data
From the Forests — LibriVox volunteers bring you 18 recordings of From the Forests by Henry Kendall. This was the Fortnightly Poetry project for March 29, 2020.
xbpeng/MimicKit — Suite of motion imitation methods for training motion controllers.
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency, code, model, space
Special issue on finite-state methods in natural language processing and mathematics of language
nv-tlabs/vipe — ViPE: Video Pose Engine for Geometric 3D Perception
A First Course on Data Structures in Python
newton-physics/newton — An open-source, GPU-accelerated physics simulation engine built upon NVIDIA Warp, specifically targeting roboticists and simulation researchers.
Korpus Dawnych Polskich Tekstów Dramatycznych
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention, code (empty)
vosen/ZLUDA — CUDA on non-NVIDIA GPUs
MCG-NJU/MotionRAG — [NeurIPS 2025] MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
SvarDOS - an open-source DOS distribution
Continual-Intelligence/SEAL — Self-Adapting Language Models
Diffusion Transformers with Representation Autoencoders, code
yukara-ikemiya/Open-Miipher-2 — PyTorch implementation of Miipher-2 [2025] which is a speech restoration model by Google DeepMind
DeCodec: Rethinking Audio Codecs as Universal Disentangled Representation Learners
Latent Diffusion Model without Variational Autoencoder