Integrating Lattice-Free MMI Into End-to-End Speech Recognition

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

icicle-emu/icicle-emu — Icicle is an experimental fuzzing-specific, multi-architecture emulation framework.

language-based-audio-retrieval

FLAM: Frame-Wise Language-Audio Modeling

TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining, data

Italian-Ligurian Machine Translation in Its Cultural Context, data

xieh97/language-based-audio-retrieval

Tandem Long-Short Duration-based Modeling for Automatic Speech Recognition

BEA-Base: A Benchmark for ASR of Spontaneous Hungarian, arXiv

Philippine Languages Database: A Multilingual Speech Corpora for Developing Systems for Low-Resource Languages

Bi-dialectal ASR of Armenian from Naturalistic and Read Speech

Qwen3-TTS Technical Report

Byte Latent Transformer: Patches Scale Better Than Tokens

Hilbert - Foundations of Geometry

Calculus Made Easy

Zseni Leszek

I was using italki wrong

How to REALLY Learn a Language in 2026

Duome HU

DepEdit

clarinsi/conllu-diff

pocketsphinx, clarinstudio, files

strob/gentle, page

tdnn

EPICS OF THE HUNGARIAN PLAIN

deepseek-ai/Engram

kerrickstaley/genanki

folkscanomy_defense

MiMo-Audio: Audio Language Models are Few-Shot Learners, code, Tokenizer, 7B-Base, 7B-Instruct, demo

Kelly

Evaluation of speech and speech synthesis

  • Submission deadline: 30 June 2026
  • Submission portal: https://www.editorialmanager.com/ycsla/default.aspx
  • Guide for Authors

prefix-dev/pixi

sardin

Keyword Mamba: Spoken keyword spotting with state space models

Is self-supervised learning enough to fill in the gap? A study on speech inpainting

Enhanced audio-visual speech enhancement with posterior sampling methods in recurrent variational autoencoders

Mispronunciation detection and diagnosis based on large language models

SemanticAudio: Audio Generation and Editing in Semantic Space

A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models

Position-invariant Fine-tuning of Speech Enhancement Models with Self-supervised Speech Representations

The Principles of Diffusion Models

Omnilingual ASR, code, dataset, arXiv

yodas2

Poor WER when trying to fine-tune Parakeet v2 TDT to other dataset than English, bug

finetuning-parakeet-on-hindi-dataset

WEB-derived pronunciations

HiMo: High-Speed Objects Motion Compensation in Point Clouds, code, dataset

Survey of end-to-end multi-speaker automatic speech recognition for monaural audio

Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception

Disentangling Prosody Representations With Unsupervised Speech Reconstruction

Decoupling Speaker-Independent Emotions for Voice Conversion via Source-Filter Networks

Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment, arXiv

csb-adecl

Kashubian Frequency list