A Visual Guide to Quantization

NetEase-FuXi/EETQ — Easy and Efficient Quantization for Transformers

mmBERT: ModernBERT goes Multilingual

jhu-clsp/mmBERT

Finetuning ByT5 for GED

NeMo T5

spring-media/DeepPhonemizer

honzas83/t5s

PolyIPA – Multilingual Phoneme-to-Grapheme Conversion Model

Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

@inproceedings{yoon23d_interspeech,
  title     = {Mitigating the Exposure Bias in Sentence-Level {G}rapheme-to-{P}honeme ({G2P}) Transduction},
  author    = {Eunseop Yoon and Hee Suk Yoon and Dhananjaya Gowda and SooHwan Eom and Daehyeok Kim and John Harvill and Heting Gao and Mark Hasegawa-Johnson and Chanwoo Kim and Chang D. Yoo},
  year      = {2023},
  booktitle = {Interspeech 2023},
  pages     = {2028--2032},
  doi       = {10.21437/Interspeech.2023-2336},
  issn      = {2958-1796},
}

T5G2P: Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion

@inproceedings{rezackova21_interspeech,
  title     = {T5G2P: Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion},
  author    = {Markéta Řezáčková and Jan Švec and Daniel Tihelka},
  year      = {2021},
  booktitle = {Interspeech 2021},
  pages     = {6--10},
  doi       = {10.21437/Interspeech.2021-546},
  issn      = {2958-1796},
}

T5G2P: Text-to-Text Transfer Transformer Based Grapheme-to-Phoneme Conversion

@ARTICLE{10592637,
  author={Řezáčková, Markéta and Tihelka, Daniel and Matoušek, Jindřich},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title={T5G2P: Text-to-Text Transfer Transformer Based Grapheme-to-Phoneme Conversion},
  year={2024},
  volume={32},
  pages={3466--3476},
  doi={10.1109/TASLP.2024.3426332}
}

NeMo Grapheme-to-Phoneme Models

NeMo G2P YAML

The Oxford-BBC Lip Reading Sentences 2

1adrianb/face-alignment — 2D and 3D face alignment library built using PyTorch

MimicTalk: Mimicking a personalized and expressive 3D talking face in few minutes, code

yerfor/Real3DPortrait

sprakradet/swedia_test_set

ByteDance-Seed/Seed-OSS-36B-Instruct

Identity-Preserving Talking Face Generation with Landmark and Appearance Priors, code

OmniGen2: Exploration to Advanced Multimodal Generation, code, model

GrapheneOS

Playing with binary formats

Kyutai STT, kyutai/stt-1b-en_fr, kyutai/tts-1.6b-en_fr, code, kyutai/mimi

The State of Freedom Around the World in 2025

baidu/ERNIE-4.5-300B-A47B-Base-PT

Add recipe for Qwen2-Audio-7B-Chat on Dynamic-SUPERB ASR task #6194

Add Harvest algorithm as an option for F0 extraction #6083

Create an ESPnet bootcamp recipe for proyecto-nahuatl-asr #6066

Fine-tune an image model

TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis, code

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis, code

yenchenlin/nerf-pytorch

XTXMarkets/ternfs

Qwen3-Omni: Natively Omni-Modal Foundation Models!, code, collection, Captioner, demo