Interesting links, 12/08/2025
Misc. interesting things.
langextract — A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
AI system discovers visual categories while adapting to new contexts: Open Ad-hoc Categorization with Contextualized Feature Learning — no code.
A Deeper Dive into Apache Iceberg V3
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search, code
serengil/deepface — A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
Reinforcement Learning: An Overview
liquidslr/leetcode-company-wise-problems — Lists of company wise questions available on leetcode premium. Every csv file in the companies directory corresponds to a list of questions on leetcode for a specific company based on the leetcode company tags. Updated as of 20 June, 2025
liquidslr/system-design-notes — Notes of the book System Design Interview - An Insider’s Guide
Fine-Tuning SAM 2 on a Custom Dataset, notebook
An acoustic analysis of sentence level prominence in Pakistani English speech
Optimizing Whisper models for Amazigh ASR: a comparative analysis
Def2Vec: you shall know a word by its definition
Correction: Automatic hate speech detection in audio using machine learning algorithms
Hybrid RMDL-CNN for speech recognition from unclear speech signal
Continuous Speech Tokenizer in Text To Speech
International Journal of Speech Technology - articles
Journal of Speech, Language, and Hearing Research
Wav2Vec 2.0 Large (LV-60 + CV + SWBD + FSH), fine-tuning split: 300 hours Switchboard; pretraining data: Libri-Light + CommonVoice + Switchboard + Fisher; download
black-forest-labs/flux — The schnell model is open.
jermp/tongrams — A C++ library providing fast language model queries in compressed space.
How to write numbers in text: a summary (in Polish; original title: Jak zapisywać liczby w tekstach – podsumowanie)
WavChat: A Survey of Spoken Dialogue Models
Razer Blade 15” (2020) Charging Port Replacement
Public defence of doctoral thesis
ByteDance-Seed/Seed-OSS-36B-Instruct
Enhancing In-the-Wild Speech Emotion Conversion with Resynthesis-based Duration Modeling
Representing Speech Through Autoregressive Prediction of Cochlear Tokens
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
Pretrained Conformers for Audio Fingerprinting and Retrieval, notebook
RF5/simple-speaker-embedding — A speaker embedding network in PyTorch that is very quick to set up and use for whatever purposes.
Qwen2-Audio: Chat with Your Voice
flexthink/librig2p-nostress-space-cmu
Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework, Kaldi script directory
Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment
@inproceedings{rousso24_interspeech,
title = {Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment},
author = {Rotem Rousso and Eyal Cohen and Joseph Keshet and Eleanor Chodroff},
year = {2024},
booktitle = {Interspeech 2024},
pages = {1525--1529},
doi = {10.21437/Interspeech.2024-429},
issn = {2958-1796},
}
Orange-OpenSource/conllueditor
Joyce — Amstrad PCW emulator
Docker Containers on the Desktop
fadams/docker-gui — The code repository for a book providing a detailed step-by-step guide to packaging and running GUI applications as Docker containers.
Accelerating Neural Network Training with Semi-Structured Sparsity
microsoft/Phi-4-multimodal-instruct, sample_finetune_speech.py
seastar105/Phi-4-mm-inst-zeroth-kor — Korean fine tune script
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training, demo
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention, code, demo, space
The Windows Subsystem for Linux Is Now Open Source
microsoft/WSL, microsoft/WSL2-Linux-Kernel
Microsoft Open Sources OpenHCL, a Linux-Based ‘Paravisor’
ZeroSep: Separate Anything in Audio with Zero Training, code
rllm-org/rllm — Democratizing Reinforcement Learning for LLMs
agentica-org/DeepCoder-14B-Preview
Google’s language resources repo has a Dockerfile for Merlin here; the image is available as langtech/base-merlin:v1_1
Terrible things happen in life – but it is possible to recover from them
Lifting Motion to the 3D World via 2D Diffusion
A Tutorial on Extracting Formants in Praat
phihung/ipyform — Extension to render Google Colab Form on regular Jupyter Notebooks
autc04/executor — A modern fork of the classic Mac emulator
PureSwift/Cacao — Pure Swift Cross-platform UIKit (Cocoa Touch) implementation (Supports Linux)
cjwl/cocotron — The Cocotron.
darlinghq/darling — Darwin/macOS emulation layer for Linux
vosen/ZLUDA — CUDA on non-NVIDIA GPUs
How to Learn ANYTHING Faster Than Everyone
qemu-bsd-user/qemu-bsd-user — qemu bsd userland on Linux
hubertsiuzdak/snac — Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
MiSTer-devel/Amstrad-PCW_MiSTer — Amstrad PCW MiSTer core
sorgelig/ZX_Spectrum-128K_MIST
CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing, code
rtrussell/BBCSDL — BBC BASIC for SDL 2.0: for Windows, Linux (x86), macOS, Raspberry Pi, Android and iOS.
Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
By all means, tread on those people
Linux-RISC/Reanimator — Reanimator allows Silicon Graphics IRIX network installation using a Raspberry Pi or VirtualBox
momo5502/sogen — 🪅 Windows User Space Emulator
unicorn-engine/unicorn — Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, PowerPC, RiscV, S390x, TriCore, X86)
source-solutions/sebasic4 — SE BASIC - A free BASIC interpreter written in Z80 assembly language
Introduction to Computational Graphs
kyutai-labs/unmute — Make text LLMs listen and speak
Atcold/Energy-Book — Deep Learning, an Energy Approach
fluxions-ai/vui — Small 100M Conversational speech models that can run on device