Interesting links, 19/01/2026

Integrating Lattice-Free MMI Into End-to-End Speech Recognition

Google discovers emergent temporal abstractions in autoregressive models

These models learn linearly controllable action representations in their residual streams—activating them executes long-horizon behaviors. This enables Internal RL to solve sparse-reward hierarchical tasks…
— DailyPapers (@HuggingPapers) December 26, 2025

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

icicle-emu/icicle-emu — Icicle is an experimental fuzzing-specific, multi-architecture emulation framework.

language-based-audio-retrieval

FLAM: Frame-Wise Language-Audio Modeling

TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining, data

Italian-Ligurian Machine Translation in Its Cultural Context, data

xieh97/language-based-audio-retrieval

Tandem Long-Short Duration-based Modeling for Automatic Speech Recognition

BEA-Base: A Benchmark for ASR of Spontaneous Hungarian, arXiv

Philippine Languages Database: A Multilingual Speech Corpora for Developing Systems for Low-Resource Languages

Bi-dialectal ASR of Armenian from Naturalistic and Read Speech

Qwen3-TTS Technical Report

Byte Latent Transformer: Patches Scale Better Than Tokens

Hilbert - Foundations of Geometry

librivox
gutenberg (LaTeX)

Calculus Made Easy

Zseni Leszek

I was using italki wrong

How to REALLY Learn a Language in 2026

Duome HU

DepEdit

clarinsi/conllu-diff

pocketsphinx, clarinstudio, files

strob/gentle, page

tdnn

Epics of the Hungarian Plain

Deep Learning with PyTorch...

1) Cheat sheet [PDF]: https://t.co/TcRqfgqFOK

2) Learn fundamentals with hands-on coding [PDF]: https://t.co/IsXFjwVAhk

3) #GenerativeAI with Python and PyTorch: https://t.co/hfbERRk99u book v/ @PacktDataML
— Kirk Borne (@KirkDBorne) January 26, 2026

deepseek-ai/Engram

kerrickstaley/genanki

folkscanomy_defense

MiMo-Audio: Audio Language Models are Few-Shot Learners, code, Tokenizer, 7B-Base, 7B-Instruct, demo

Kelly

Evaluation of speech and speech synthesis

Submission deadline: 30 June 2026
Submission portal: https://www.editorialmanager.com/ycsla/default.aspx
Guide for Authors

prefix-dev/pixi

sardin

Keyword Mamba: Spoken keyword spotting with state space models

Is self-supervised learning enough to fill in the gap? A study on speech inpainting

Enhanced audio-visual speech enhancement with posterior sampling methods in recurrent variational autoencoders

Mispronunciation detection and diagnosis based on large language models

SemanticAudio: Audio Generation and Editing in Semantic Space

A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models

Position-invariant Fine-tuning of Speech Enhancement Models with Self-supervised Speech Representations

You can now run 70B LLMs on a 4GB GPU.

AirLLM just killed the "you need expensive hardware" excuse.

It runs 70B models on 4GB VRAM.
It loads models one layer at a time, runs 405B Llama 3.1 on 8GB VRAM.

→ No quantization needed by default
→ Run Llama, Qwen, Mistral, Mixtral…
— Hasan Toor (@hasantoxr) January 31, 2026

The Principles of Diffusion Models

Omnilingual ASR, code, dataset, arXiv

yodas2

Poor WER when trying to fine-tune Parakeet v2 TDT to other dataset than English, bug

finetuning-parakeet-on-hindi-dataset

WEB-derived pronunciations

HiMo: High-Speed Objects Motion Compensation in Point Clouds, code, dataset

Survey of end-to-end multi-speaker automatic speech recognition for monaural audio

Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception

Disentangling Prosody Representations With Unsupervised Speech Reconstruction

Decoupling Speaker-Independent Emotions for Voice Conversion via Source-Filter Networks

Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment, arXiv