Affordances from Human Videos as a Versatile Representation for Robotics, project page

Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding

@misc{wang2023speechtotext,
      title={Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding}, 
      author={Mingqiu Wang and Izhak Shafran and Hagen Soltau and Wei Han and Yuan Cao and Dian Yu and Laurent El Shafey},
      year={2023},
      eprint={2306.07944},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

leimao/DeepLab-V3

1adrianb/2D-and-3D-face-alignment

budzianowski/multiwoz

giakou4/pyfeats

Self-Supervised Accent Learning for Under-Resourced Accents Using Native Language Data

@INPROCEEDINGS{10096854,
  author={Kumar, Mehul and Kim, Jiyeon and Gowda, Dhananjaya and Garg, Abhinav and Kim, Chanwoo},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Self-Supervised Accent Learning for Under-Resourced Accents Using Native Language Data},
  year={2023},
  pages={1--5},
  doi={10.1109/ICASSP49357.2023.10096854}
}

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations, project page

facebookresearch/encodec

Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target

McGill-NLP/length-generalization

xiph/LPCNet

google-research/mozolm

google-research/fast-soft-sort

google-research/swirl-lm

googlecolab/colab-widgets

christos-c/bible-corpus

jbeskow/tuben: Tube model of vocal tract, resonance frequency estimation

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors, project page, code (currently empty)

facebookresearch/dino