Disordered speech

Hybrid CNN-LSTM model with efficient hyperparameter tuning for prediction of Parkinson’s disease

A machine learning method to process voice samples for identification of Parkinson’s disease

Analysis of Parkinson’s Disease Using an Imbalanced-Speech Dataset by Employing Decision Tree Ensemble Methods

Improving Parkinson’s disease recognition through voice analysis using deep learning

Misc

Averaging Weights Leads to Wider Optima and Better Generalization

DreaMoving: A Human Video Generation Framework based on Diffusion Models, no code yet

VCoder: Versatile Vision Encoders for Multimodal Large Language Models, code

m-bain/whisperX — WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

roboflow/supervision

FaceStudio: Put Your Face Everywhere in Seconds, no code yet

3D-GPT: Procedural 3D Modeling with Large Language Models, no code yet

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis, no code yet

GAIA: Zero-shot Talking Avatar Generation

Speaker and Language Change Detection using Wav2vec2 and Whisper

DmitryRyumin/INTERSPEECH-2023-Papers

wenet-e2e/wespeaker — Research and Production Oriented Speaker Recognition Toolkit

alibaba-damo-academy/3D-Speaker — A repository for single- and multi-modal speaker verification, speaker recognition and speaker diarization.

vjeronymo2/mColBERT

Fine-Tune W2V2-Bert for low-resource ASR with 🤗 Transformers

lucidrains/voicebox-pytorch — Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in PyTorch