Interesting links, 03/07/2023
Misc. interesting things.
Affordances from Human Videos as a Versatile Representation for Robotics, project page
Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding
@misc{wang2023speechtotext,
title={Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding},
author={Mingqiu Wang and Izhak Shafran and Hagen Soltau and Wei Han and Yuan Cao and Dian Yu and Laurent El Shafey},
year={2023},
eprint={2306.07944},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
1adrianb/2D-and-3D-face-alignment
Self-Supervised Accent Learning for Under-Resourced Accents Using Native Language Data
@INPROCEEDINGS{10096854,
author={Kumar, Mehul and Kim, Jiyeon and Gowda, Dhananjaya and Garg, Abhinav and Kim, Chanwoo},
booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Self-Supervised Accent Learning for Under-Resourced Accents Using Native Language Data},
year={2023},
pages={1--5},
doi={10.1109/ICASSP49357.2023.10096854}
}
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations, project page
Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target
McGill-NLP/length-generalization
google-research/fast-soft-sort
jbeskow/tuben – Tube model of vocal tract - resonance frequency estimation
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors, project page, code (currently empty)
Guy who bought $37k in stolen human organs literally put "braiiiiins." in the memo line on PayPal. https://t.co/Lz6uE8VfST pic.twitter.com/llLTT70k6n
— Nick Bax.eth (@bax1337) June 14, 2023