Interesting links, 16/06/2025

Misc. interesting things.

Jun 16, 2025

MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages, code

SLURP: A Spoken Language Understanding Resource Package, data — text is open, audio is not

ZeroSep: Separate Anything in Audio with Zero Training, arXiv, code

There are 6 forms of depression, study shows. Here’s how they’re different.

Marconi Union - Weightless — supposed to help with anxiety

Translation-Inspired OCR

Gramatika kaszëbsczégò jãzëka

Kashubian through Polish

Bajki Kaszubskie

Słownik Polsko-Kaszubski

Dependency Parsing Evaluation for Low-resource Spontaneous Speech

The Swedish Parliament Corpus

Upper Sorbian UD

Insert OCRed text and annotations in DjVu

TaxoLLaMA: WordNet-based Model for Solving Multiple Lexical Semantic Tasks, code

Anthropic wins key US ruling on AI training in authors’ copyright lawsuit

CILI: the Collaborative Interlingual Index

omwn/omw-data — This packages up data for the Open Multilingual Wordnet

docker image save — Save one or more images to a tar archive

Introducing the V-JEPA 2 world model and new benchmarks for physical reasoning

facebook/vjepa2-vitl-fpc64-256 — actually open source.

October 13-15, 2025
Deadline: June 30, 2025 (23:59, anywhere on Earth)
Overleaf
EasyChair

Dev Containers tutorial

Terrible things happen in life – but it is possible to recover from them

InteractAnything: Zero-shot Human Object-Interaction Synthesis via LLM Feedback and Object Affordance Parsing, arXiv

szgabsz91/jdk-ocamorph-pyphen

Magyar népmesék sorozat, Hungarian Folk Tales

kornai/MoLHandbook

Universal Dependencies

Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm

allenai/Molmo-7B-D-0924

Conference: October 19-22, 2025
Paper submission (5-6 pages including references, IEEE format): July 7, 2025
OpenReview
Overleaf

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

huspacy/huspacy

Neural Network Activation Functions pic.twitter.com/WYKOER1Ldz
— Dan Kornas (@DanKornas) July 2, 2025

kyutai/tts-1.6b-en_fr

The single biggest argument about statistics: is probability frequentist or Bayesian?

It's neither, and I'll explain why.

Buckle up. Deep-dive explanation incoming. pic.twitter.com/PYlvOAGyB6
— Tivadar Danka (@TivadarDanka) July 3, 2025

timtadh/zhang-shasha — Tree edit distance using the Zhang Shasha algorithm

The Unbelievable Truth

Tool - Back to the beginning

The ParlaMint corpora of parliamentary proceedings

Spoken Spanish PoS tagging: gold standard dataset

ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark, code, dataset

Self-supervised learning of speech representations with Dutch archival data

hitachi-speech/EEND — EEND (End-to-End Neural Diarization) is a neural-network-based speaker diarization method.

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

myshell-ai/MeloTTS — High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

Self-supervised Learning (SSL) vs Contrastive Language-Image (CLIP) models is a never-ending battle

What about using both? This Google paper does exactly that, and results are really good on many different tasks

TIPS is a model trained with a CLIP loss and 2 SSL losses [1/9] pic.twitter.com/4tKDfM152W
— Gabriele Berton (@gabriberton) July 4, 2025

TIPS: Text-Image Pretraining with Spatial awareness, code

Binary Latent Diffusion, ZeWang95/BinaryLatentDiffusion

espnet/owsm_ctc_v4_1B

Phone-Level Pronunciation Scoring for L1 Using Weighted-Dynamic Time Warping

An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems, models

The SIWIS French Speech Synthesis Database

Towards Distributed Neural Architectures

kb-labb/post-ocr-correction

https://github.com/openai/whisper/commit/31243bad24cc746f07d4c8bfdd2d974872cb1803 — Add option to carry initial_prompt with the sliding window

Voice Conversion With Just Nearest Neighbors, code

atong01/conditional-flow-matching — TorchCFM: a Conditional Flow Matching library

THUDM/GLM-4-9B-0414

einspace: Searching for Neural Architectures from Fundamental Operations

cadia-lvl/althingi-asr

From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data

Flow Matching Guide and Code

Voxtral, mistralai/Voxtral-Mini-3B-2507

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training, code

Real-Time Textless Dialogue Generation

Prosody Labeling with Phoneme-BERT and Speech Foundation Models

facebookresearch/fairchem — FAIR Chemistry’s library of machine learning methods for chemistry

clement-pages/gryannote — Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.