Interesting links, 19/09/2025
Misc. interesting things.
A Visual Guide to Quantization
NetEase-FuXi/EETQ — Easy and Efficient Quantization for Transformers
mmBERT: ModernBERT goes Multilingual
PolyIPA – Multilingual Phoneme-to-Grapheme Conversion Model
Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction
@inproceedings{yoon23d_interspeech,
title = {Mitigating the Exposure Bias in Sentence-Level {G}rapheme-to-{P}honeme ({G2P}) Transduction},
author = {Eunseop Yoon and Hee Suk Yoon and Dhananjaya Gowda and SooHwan Eom and Daehyeok Kim and John Harvill and Heting Gao and Mark Hasegawa-Johnson and Chanwoo Kim and Chang D. Yoo},
year = {2023},
booktitle = {Interspeech 2023},
pages = {2028--2032},
doi = {10.21437/Interspeech.2023-2336},
issn = {2958-1796},
}
T5G2P: Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion
@inproceedings{rezackova21_interspeech,
title = {T5G2P: Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion},
author = {Markéta Řezáčková and Jan Švec and Daniel Tihelka},
year = {2021},
booktitle = {Interspeech 2021},
pages = {6--10},
doi = {10.21437/Interspeech.2021-546},
issn = {2958-1796},
}
T5G2P: Text-to-Text Transfer Transformer Based Grapheme-to-Phoneme Conversion
@article{10592637,
title = {T5G2P: Text-to-Text Transfer Transformer Based Grapheme-to-Phoneme Conversion},
author = {Řezáčková, Markéta and Tihelka, Daniel and Matoušek, Jindřich},
year = {2024},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
volume = {32},
pages = {3466--3476},
doi = {10.1109/TASLP.2024.3426332},
}
NeMo Grapheme-to-Phoneme Models
The Oxford-BBC Lip Reading Sentences 2
1adrianb/face-alignment — 2D and 3D face alignment library built using PyTorch
MimicTalk: Mimicking a personalized and expressive 3D talking face in few minutes, code
ByteDance-Seed/Seed-OSS-36B-Instruct
Identity-Preserving Talking Face Generation with Landmark and Appearance Priors, code
OmniGen2: Exploration to Advanced Multimodal Generation, code, model
Kyutai STT, kyutai/stt-1b-en_fr, kyutai/tts-1.6b-en_fr, code, kyutai/mimi
The State of Freedom Around the World in 2025
baidu/ERNIE-4.5-300B-A47B-Base-PT
Add recipe for Qwen2-Audio-7B-Chat on Dynamic-SUPERB ASR task #6194
Add Harvest algorithm as an option for F0 extraction #6083
Create a ESPnet bootcamp recipe for proyecto-nahuatl-asr #6066
TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis, code
AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis, code
Qwen3-Omni: Natively Omni-Modal Foundation Models!, code, collection, Captioner, demo