MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages, code

SLURP: A Spoken Language Understanding Resource Package, data — text is open, audio is not

ZeroSep: Separate Anything in Audio with Zero Training, arXiv, code

There are 6 forms of depression, study shows. Here’s how they’re different.

Marconi Union - Weightless — supposed to help with anxiety

Translation-Inspired OCR


Gramatika kaszëbsczégò jãzëka

Kashubian through Polish

Najô Ùczba

Bajki Kaszubskie

Słownik Polsko-Kaszubski


Dependency Parsing Evaluation for Low-resource Spontaneous Speech

The Swedish Parliament Corpus

Upper Sorbian UD

Insert OCRed text and annotations in DjVu

TaxoLLaMA: WordNet-based Model for Solving Multiple Lexical Semantic Tasks, code

Anthropic wins key US ruling on AI training in authors’ copyright lawsuit

CILI: the Collaborative Interlingual Index

omwn/omw-data — This packages up data for the Open Multilingual Wordnet

docker image save — Save one or more images to a tar archive

Introducing the V-JEPA 2 world model and new benchmarks for physical reasoning

facebook/vjepa2-vitl-fpc64-256, actually open source.

SPECOM 2025:

  • October 13-15, 2025
  • deadline: June 30, 2025 (23:59, anywhere on Earth)
  • Overleaf
  • EasyChair

Dev Containers tutorial

Terrible things happen in life – but it is possible to recover from them

InteractAnything: Zero-shot Human Object-Interaction Synthesis via LLM Feedback and Object Affordance Parsing, arXiv

szgabsz91/jdk-ocamorph-pyphen

Sorbian course

Sorbian radio

hunpars

hunmorph

hunmorph-foma

Magyar népmesék sorozat, Hungarian Folk Tales

EUSIPCO 2025

JoFrhwld/FAVE

kornai/MoLHandbook

Universal Dependencies

Finnish-NLP/wav2vec2-xlsr-300m-finnish-lm

CREPE

allenai/Molmo-7B-D-0924

SpeD 2025

  • Conference: October 19-22, 2025
  • Paper submission (5-6 pages including references, IEEE format): July 7, 2025
  • OpenReview
  • Overleaf

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

huspacy/huspacy

kyutai/tts-1.6b-en_fr

timtadh/zhang-shasha — Tree edit distance using the Zhang Shasha algorithm

The Unbelievable Truth

Tool - Back to the beginning

The ParlaMint corpora of parliamentary proceedings

Spoken Spanish PoS tagging: gold standard dataset

ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark, code, dataset

Self-supervised learning of speech representations with Dutch archival data

hitachi-speech/EEND — EEND (End-to-End Neural Diarization) is a neural-network-based speaker diarization method.

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

myshell-ai/MeloTTS — High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

TIPS: Text-Image Pretraining with Spatial awareness, code

Binary Latent Diffusion, ZeWang95/BinaryLatentDiffusion

espnet/owsm_ctc_v4_1B

Phone-Level Pronunciation Scoring for L1 Using Weighted-Dynamic Time Warping

An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems, models

The SIWIS French Speech Synthesis Database

Towards Distributed Neural Architectures

kb-labb/post-ocr-correction

https://github.com/openai/whisper/commit/31243bad24cc746f07d4c8bfdd2d974872cb1803 — Add option to carry initial_prompt with the sliding window

Voice Conversion With Just Nearest Neighbors, code

atong01/conditional-flow-matching — TorchCFM: a Conditional Flow Matching library

THUDM/GLM-4-9B-0414

einspace: Searching for Neural Architectures from Fundamental Operations

cadia-lvl/althingi-asr

From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data

Flow Matching Guide and Code

Voxtral, mistralai/Voxtral-Mini-3B-2507

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training, code

Real-Time Textless Dialogue Generation

Prosody Labeling with Phoneme-BERT and Speech Foundation Models