Interesting links, 19/01/2026
Misc. interesting things.
Add pocketsphinx_to_textgrid utility for Praat TextGrid conversion
Sails Of Charon - Scorpions (Cover & TAB)
The Roadmap of Mathematics for Machine Learning
UM-Text: A Unified Multimodal Model for Image Understanding
VoCodec: An Efficient Lightweight Low-Bitrate Speech Codec
Super Monotonic Alignment Search, code
Michael Fitzgerald presentation at Thurles Library
Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR, code
MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts, code
FEX-Emu/FEX — A fast usermode x86 and x86-64 emulator for Arm64 Linux
AndreRH/hangover — Hangover runs Win64 and Win32 applications on arm64 Linux
facebook/map-anything-benchmarking — Apache 2.0
SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing, code
Acoustic Features and Auditory Impressions of Death Growl and Screaming Voice
kba/jsonld-rapper — Convert between RDF and JSON-LD using rapper
Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training
A Fast Bytecode VM for Arithmetic - The Compiler
Writing Speed-of-Light Flash Attention for 5090 in CUDA C++
CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing, code
Italian blasphemy and German ingenuity - how swear words differ around the world
Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition, code
SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain
Data-Centric Lessons To Improve Speech-Language Pretraining
zserge/grayskull — A tiny, dependency-free computer vision library in C for embedded systems, drones, and robotics.
vivekkalyanarangan30/llm_from_scratch
LLMs from Scratch – Practical Engineering from Base Model to PPO RLHF
WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
MapAnything: Universal Feed-Forward Metric 3D Reconstruction, code, space
danny-avila/LibreChat — Enhanced ChatGPT Clone
blinkospace/blinko — An open-source, self-hosted personal AI note tool prioritizing privacy, built using TypeScript.
KiCad — A Cross Platform and Open Source PCB Design Suite
Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
DeCodec: Rethinking Audio Codecs as Universal Disentangled Representation Learners
Europe’s Poverty Rates by Country in 2024
Covariant spatio-temporal receptive fields for spiking neural networks
Chatterbox Turbo — Ultra-Fast, Open-Source Text-to-Speech for Real-Time Voice AI
XiaomiMiMo/MiMo-Audio, XiaomiMiMo/MiMo-Audio-Tokenizer, XiaomiMiMo/MiMo-Audio-7B-Base
Qwen/Qwen3-Omni-30B-A3B-Captioner
baidu/ERNIE-4.5-VL-28B-A3B-Thinking
kitty — The fast, feature-rich, GPU based terminal emulator.
facebook/pe-av-large — PE-AV is a state-of-the-art multimodal model that embeds audio, video, audio-video, and text into a joint embedding space
Matrix Methods in Data Analysis, Signal Processing, and Machine Learning
Improved Subword Modeling for WFST-Based Speech Recognition, code
New Baseline in Automatic Speech Recognition for Northern Sámi, code, models
Katarzyna Skrzynecka as Andrea Bocelli & Sarah Brightman - Twoja Twarz Brzmi Znajomo