Interesting links, 12/08/2025

langextract — A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

serengil/deepface — A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python

Reinforcement Learning: An Overview

liquidslr/leetcode-company-wise-problems — Lists of company-wise questions available on LeetCode Premium. Every CSV file in the companies directory corresponds to a list of questions on LeetCode for a specific company, based on the LeetCode company tags. Updated as of 20 June 2025.

liquidslr/system-design-notes — Notes on the book System Design Interview: An Insider’s Guide

Fine-Tuning SAM 2 on a Custom Dataset, notebook

An acoustic analysis of sentence level prominence in Pakistani English speech

Whisper for L2 speech scoring

Optimizing Whisper models for Amazigh ASR: a comparative analysis

Def2Vec: you shall know a word by its definition

Evaluation of phone posterior probabilities for pathology detection in speech data using deep learning models

Correction: Automatic hate speech detection in audio using machine learning algorithms

Parkinson’s disease detection from speech using combination of empirical wavelet transform and Hilbert transform

Hybrid RMDL-CNN for speech recognition from unclear speech signal

bytedance-seed/BAGEL, model

Continuous Speech Tokenizer in Text To Speech

International Journal of Speech Technology - articles

Journal of Speech, Language, and Hearing Research

Wav2Vec 2.0 Large (LV-60 + CV + SWBD + FSH), 300 hours Switchboard, Libri-Light + CommonVoice + Switchboard + Fisher, download

SAM2 fine tuning

black-forest-labs/flux — The schnell model is open.

Unsloth Whisper notebook

jermp/tongrams — A C++ library providing fast language model queries in compressed space.

Monitor Polski Nr. 14

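Jak zapisywać liczby w tekstach – podsumowanie (How to write numbers in texts: a summary)

Odmiana liczebników (Inflection of numerals)

Jak poprawnie zapisywać daty? (How to write dates correctly?)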

WavChat: A Survey of Spoken Dialogue Models

Qwen/Qwen-Image

IRSvideosASL

The 2025 LRAC Challenge

Razer Blade 15" (2020) Charging Port Replacement

Public defence of doctoral thesis

ByteDance-Seed/Seed-OSS-36B-Instruct

Enhancing In-the-Wild Speech Emotion Conversion with Resynthesis-based Duration Modeling

Representing Speech Through Autoregressive Prediction of Cochlear Tokens

Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style

Pretrained Conformers for Audio Fingerprinting and Retrieval, notebook

RF5/simple-speaker-embedding — A speaker embedding network in PyTorch that is very quick to set up and use for whatever purposes.

Tasks.md

Qwen2-Audio: Chat with Your Voice

SpeechBrain LibriSpeech G2P

Librispeech Alignments

flexthink/librig2p-nostress-space-cmu

Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework, Kaldi script directory

Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment

@inproceedings{rousso24_interspeech,
  title     = {Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment},
  author    = {Rotem Rousso and Eyal Cohen and Joseph Keshet and Eleanor Chodroff},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {1525--1529},
  doi       = {10.21437/Interspeech.2024-429},
  issn      = {2958-1796},
}

Orange-OpenSource/conllueditor

rapidfuzz/RapidFuzz

Joyce — Amstrad PCW emulator

sphinx4-clarinpl

Docker Containers on the Desktop

fadams/docker-gui — The code repository for a book providing a detailed step-by-step guide to packaging and running GUI applications as Docker containers.

Accelerating Neural Network Training with Semi-Structured Sparsity

microsoft/Phi-4-multimodal-instruct, sample_finetune_speech.py

seastar105/Phi-4-mm-inst-zeroth-kor — Korean fine tune script

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training, demo

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention, code, demo, space

The Windows Subsystem for Linux Is Now Open Source

microsoft/WSL, microsoft/WSL2-Linux-Kernel

Microsoft Open Sources OpenHCL, a Linux-Based ‘Paravisor’

ZeroSep: Separate Anything in Audio with Zero Training, code

rllm-org/rllm — Democratizing Reinforcement Learning for LLMs

agentica-org/DeepCoder-14B-Preview

Google’s language resources repo has a Dockerfile for Merlin here; it is available as langtech/base-merlin:v1_1

Terrible things happen in life – but it is possible to recover from them

Lifting Motion to the 3D World via 2D Diffusion

Chatterbox TTS

A Tutorial on Extracting Formants in Praat

Tesseract for Hawaiian

Prosodic characteristics of deceptive picture descriptions in Finnish: Acoustics, beliefs, self-evaluations, and deception theories

phihung/ipyform — Extension to render Google Colab Form on regular Jupyter Notebooks

autc04/executor — A modern fork of the classic Mac emulator

PureSwift/Cacao — Pure Swift Cross-platform UIKit (Cocoa Touch) implementation (Supports Linux)

cjwl/cocotron — The Cocotron.

darlinghq/darling — Darwin/macOS emulation layer for Linux

vosen/ZLUDA — CUDA on non-NVIDIA GPUs

The jank programming language

How to Learn ANYTHING Faster Than Everyone

Sapling

Mosh: the mobile shell

qemu-bsd-user/qemu-bsd-user — qemu bsd userland on Linux

hubertsiuzdak/snac — Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

MiSTer-devel/Amstrad-PCW_MiSTer — Amstrad PCW MiSTer core

sorgelig/ZX_Spectrum-128K_MIST

EffortNet: A Deep Learning Framework for Objective Assessment of Speech Enhancement Technologies Using EEG-Based Alpha Oscillations

CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing, code

rtrussell/BBCSDL — BBC BASIC for SDL 2.0: for Windows, Linux (x86), MacOS, Raspberry Pi, Android and iOS.

Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder

By all means, tread on those people

Linux-RISC/Reanimator — Reanimator allows Silicon Graphics IRIX network installation using a Raspberry Pi or VirtualBox

momo5502/sogen — 🪅 Windows User Space Emulator

unicorn-engine/unicorn — Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, PowerPC, RiscV, S390x, TriCore, X86)

source-solutions/sebasic4 — SE BASIC - A free BASIC interpreter written in Z80 assembly language

MACE

Introduction to Computational Graphs

kyutai-labs/unmute — Make text LLMs listen and speak

Atcold/Energy-Book — Deep Learning, an Energy Approach

fluxions-ai/vui — Small 100M Conversational speech models that can run on device

Interactive Linear Algebra