Disordered speech

Misc

Not long ago, @nickfloats introduced a #prompt structure for creating 360° images. Today, I've decided to show you another equally effective structure. Follow this thread for a complete workflow! 🔥🔥 pic.twitter.com/etrr5izMuG
— Pierrick Chevallier | IA (@CharaspowerAI) January 24, 2024

Are you using models to study human speech processing? 🗣️🤖🧠👶

Consider submitting your work to our special session "Computational models of human language acquisition, perception, and production" at @ISCAInterspeech 2024

organized by @ojrasanen, @thomashueber, and myself!
— Marvin Lavechin (@LavechinMarvin) January 24, 2024

Averaging Weights Leads to Wider Optima and Better Generalization

DreaMoving: A Human Video Generation Framework based on Diffusion Models, no code yet

VCoder: Versatile Vision Encoders for Multimodal Large Language Models, code

m-bain/whisperX — WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

roboflow/supervision

FaceStudio: Put Your Face Everywhere in Seconds, no code yet

3D-GPT: Procedural 3D Modeling with Large Language Models, no code yet

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis, no code yet

GAIA: Zero-shot Talking Avatar Generation

Speaker and Language Change Detection using Wav2vec2 and Whisper

DmitryRyumin/INTERSPEECH-2023-Papers

wenet-e2e/wespeaker — Research and Production Oriented Speaker Recognition Toolkit

alibaba-damo-academy/3D-Speaker — A repository for single- and multi-modal speaker verification, speaker recognition and speaker diarization.

vjeronymo2/mColBERT

Fine-Tune W2V2-Bert for low-resource ASR with 🤗 Transformers

lucidrains/voicebox-pytorch — Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch