Interesting links, 19/01/2026

Misc. interesting things.

Jan 19, 2026 • 2 min read

Add pocketsphinx_to_textgrid utility for Praat TextGrid conversion

ReadAlong Studio

Sails Of Charon - Scorpions (Cover & TAB)

The Roadmap of Mathematics for Machine Learning

echodroff/praat_scripts

UM-Text: A Unified Multimodal Model for Image Understanding

VoCodec: An Efficient Lightweight Low-Bitrate Speech Codec

Super Monotonic Alignment Search, code

Michael Fitzgerald presentation at Thurles Library

Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR, code

MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts, code

FEX-Emu/FEX — A fast usermode x86 and x86-64 emulator for Arm64 Linux

DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion

AndreRH/hangover — Hangover runs Win64 and Win32 applications on arm64 Linux

facebook/map-anything-benchmarking — Apache 2.0

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing, code

Acoustic Features and Auditory Impressions of Death Growl and Screaming Voice

kba/jsonld-rapper — Convert between RDF and JSON-LD using rapper

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

A Fast Bytecode VM for Arithmetic - The Compiler

Writing Speed-of-Light Flash Attention for 5090 in CUDA C++

CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing, code

Italian blasphemy and German ingenuity - how swear words differ around the world

deepseek-ai/DeepSeek-OCR

Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition, code

The Free Transformer

SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain

Data-Centric Lessons To Improve Speech-Language Pretraining

zserge/grayskull — A tiny, dependency-free computer vision library in C for embedded systems, drones, and robotics.

vivekkalyanarangan30/llm_from_scratch

LLMs from Scratch – Practical Engineering from Base Model to PPO RLHF

Qwen/Qwen3-0.6B

WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance

MapAnything: Universal Feed-Forward Metric 3D Reconstruction, code, space

danny-avila/LibreChat — Enhanced ChatGPT Clone

blinkospace/blinko — An open-source, self-hosted personal AI note tool prioritizing privacy, built using TypeScript.

KiCad — A Cross Platform and Open Source PCB Design Suite

Alibaba-NLP/Tongyi-DeepResearch-30B-A3B

DeCodec: Rethinking Audio Codecs as Universal Disentangled Representation Learners

Europe’s Poverty Rates by Country in 2024

Covariant spatio-temporal receptive fields for spiking neural networks

Chatterbox Turbo — Ultra-Fast, Open-Source Text-to-Speech for Real-Time Voice AI

Xiaomi Released MiMo-Audio

XiaomiMiMo/MiMo-Audio, XiaomiMiMo/MiMo-Audio-Tokenizer, XiaomiMiMo/MiMo-Audio-7B-Base

Qwen/Qwen3-Omni-30B-A3B-Captioner

The Illustrated GPT-OSS

baidu/ERNIE-4.5-VL-28B-A3B-Thinking

kitty — The fast, feature-rich, GPU based terminal emulator.

facebook/pe-av-large — PE-AV is a state-of-the-art multimodal model that embeds audio, video, audio-video, and text into a joint embedding space

Matrix Methods in Data Analysis, Signal Processing, and Machine Learning

Improved Subword Modeling for WFST-Based Speech Recognition, code

Exploring adaptation techniques of large speech foundation models for low-resource ASR: a case study on Northern Sámi

New Baseline in Automatic Speech Recognition for Northern Sámi, code, models

Katarzyna Skrzynecka as Andrea Bocelli & Sarah Brightman - Twoja Twarz Brzmi Znajomo

8 Martial Artists Try to Disarm a Gun