# Sámi

divvun/lang-sme-ml-speech

divvun/lang-sme-ml-swe

NbAiLab/whisper-large-sme

NbAiLab/salmon-whisper-large-smj-lr7e-5

NbAiLab/f5-tts-north-sami

divvun-tts/multi-sami

divvun-tts/6L-TTS

Does multilingual and multi-speaker modeling improve low-resource TTS? Experiments on Sámi languages

aalto-speech/northern-sami-asr

GetmanY1/wav2vec2-xls-r-300m-sami-parl-ext-ft

GetmanY1/wav2vec2-base-sami-22k

giellalt/speech-sme

Julev Sámi IPA

# ParlaSpeech

Spoken corpora of parliamentary debates ParlaSpeech 3.0

Welcome to ParlaSpeech

# Misc

Correct MT output with word alignments

robertostling/eflomal

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

lucidrains/vector-quantize-pytorch

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio

MambaInst: Lightweight State Space Model for Real-Time Instance Segmentation

nvidia/canary-1b-flash

spaces/nvidia/canary-1b-v2

NeMo/blob/main/docker/Dockerfile.speech

nvidia/nemo-megatron-t5-3B

sparkfish/augraphy — Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

bytedance/MegaTTS3

BiT S09E08

Speech-to-Retrieval, dataset, code

A History of Large Language Models

neuphonic/neucodec, code, space

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform, code

deepsearch-ai/deepsearch — A multimodal RAG application that enables semantic search on multimedia sources like audio, video and images

Introduction to University Mathematics

Conversational image segmentation with Gemini 2.5

deepreinforce-ai/CRINN

Open Whisper-style Speech Models

raminnakhli/GMM-HMM-from-scratch

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation, code

Phi-4-multimodal-korean-finetuning

The best open source OCR models

Continuous Speech Tokenizer in Text To Speech, code

Orpheus 3B

DDT: Decoupled Diffusion Transformer

SesameAILabs/csm — A Conversational Speech Generation Model, model

langchain-ai/rag-from-scratch

Zettlr/Zettlr

Phoneme Segmentation Using Self-Supervised Speech Models, code

Never Mind the Buzzcocks

Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations, code

SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation

Theomat/sbsur — Stochastic Beam Search + Unique Randomizer

RF5/transfusion-asr — Transcribing Speech with Multinomial Diffusion, training code and models.

r/AudioBookBay

It has also been shown that adding pronunciation variants to the dictionary has a point of diminishing returns, as over-generated pronunciations can lead to ambiguity in the decoder and degrade its performance.

Adaptation techniques to improve ASR performance on accented speakers

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation

google-research-datasets/cvss — CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus

Cymru-Breizh-Agile-Cymru-Project/vosk-cymraeg

From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data

Flow Matching Guide and Code

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS

Distilling an End-to-End Voice Assistant Without Instruction Training Data

From the Forests — LibriVox volunteers bring you 18 recordings of From the Forests by Henry Kendall. This was the Fortnightly Poetry project for March 29, 2020.

xbpeng/MimicKit — Suite of motion imitation methods for training motion controllers.

VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency, code, model, space

Vyvo/VyvoTTS-v0-Qwen3-0.6B

Special issue on finite-state methods in natural language processing and mathematics of language

nv-tlabs/vipe — ViPE: Video Pose Engine for Geometric 3D Perception

A First Course on Data Structures in Python

newton-physics/newton — An open-source, GPU-accelerated physics simulation engine built upon NVIDIA Warp, specifically targeting roboticists and simulation researchers.

Korpus Dawnych Polskich Tekstów Dramatycznych

H2IOSC

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention, code (empty)

vosen/ZLUDA — CUDA on non-NVIDIA GPUs

MCG-NJU/MotionRAG — [NeurIPS 2025] MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation

SMM2026

SvarDOS/edrdos

SvarDOS - an open-source DOS distribution

Continual-Intelligence/SEAL — Self-Adapting Language Models

Diffusion Transformers with Representation Autoencoders, code

yukara-ikemiya/Open-Miipher-2 — PyTorch implementation of Miipher-2 [2025] which is a speech restoration model by Google DeepMind

DeCodec: Rethinking Audio Codecs as Universal Disentangled Representation Learners

MiMo Audio, code, report

Latent Diffusion Model without Variational Autoencoder

Ken Thompson Recalls Unix’s Rowdy, Lock-Picking Origins

Svenska språknämndens uttalsordbok