Interesting links, 16/08/2023
Misc. interesting things.
TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition
@misc{xue2023tranusr,
title={TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition},
author={Hongfei Xue and Qijie Shao and Peikun Chen and Pengcheng Guo and Lei Xie and Jie Liu},
year={2023},
eprint={2305.13629},
archivePrefix={arXiv},
}
On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications
@misc{chemudupati2023transferability,
title={On the Transferability of Whisper-based Representations for ``In-the-Wild'' Cross-Task Downstream Speech Applications},
author={Vamsikrishna Chemudupati and Marzieh Tahaei and Heitor Guimaraes and Arthur Pimentel and Anderson Avila and Mehdi Rezagholizadeh and Boxing Chen and Tiago Falk},
year={2023},
eprint={2305.14546},
archivePrefix={arXiv},
}
CASA-ASR: Context-Aware Speaker-Attributed ASR
@misc{shi2023casaasr,
title={CASA-ASR: Context-Aware Speaker-Attributed ASR},
author={Mohan Shi and Zhihao Du and Qian Chen and Fan Yu and Yangze Li and Shiliang Zhang and Jie Zhang and Li-Rong Dai},
year={2023},
eprint={2305.12459},
archivePrefix={arXiv},
}
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training, samples
@misc{ye2023clapspeech,
title={CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training},
author={Zhenhui Ye and Rongjie Huang and Yi Ren and Ziyue Jiang and Jinglin Liu and Jinzheng He and Xiang Yin and Zhou Zhao},
year={2023},
eprint={2305.10763},
archivePrefix={arXiv},
}
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
@misc{rekesh2023fast,
title={Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition},
author={Dima Rekesh and Samuel Kriman and Somshubra Majumdar and Vahid Noroozi and He Huang and Oleksii Hrinchuk and Ankur Kumar and Boris Ginsburg},
year={2023},
eprint={2305.05084},
archivePrefix={arXiv},
}
The HARPY Speech Recognition System
Multi-Task and Transfer Learning in Low-Resource Speech Recognition
PaLI: Scaling Language-Image Learning in 100+ Languages
google/matcha-chart2text-pew — This model is the MatCha model, fine-tuned on the Chart2text-pew dataset. This fine-tuned checkpoint might be better suited for chart summarization tasks.
google/matcha-plotqa-v2 — This model is the MatCha model, fine-tuned on the plotQA-v2 dataset. This fine-tuned checkpoint might be better suited for plot question answering tasks.
FLEURS Irish — all of the speakers are non-native, from what I’ve checked.
budzianowski/multiwoz — Source code for end-to-end dialogue model from the MultiWOZ paper
salesforce/DialogStudio — DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection and Instruction-Aware Models for Conversational AI
Implementation of the Branchformer
facebookresearch/Ego4d — Ego4d dataset repository. Download the dataset, visualize, extract features & example usage of the dataset. (Data has an awful licence).
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
@misc{han2020contextnet,
title={ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context},
author={Wei Han and Zhengdong Zhang and Yu Zhang and Jiahui Yu and Chung-Cheng Chiu and James Qin and Anmol Gulati and Ruoming Pang and Yonghui Wu},
year={2020},
eprint={2005.03191},
archivePrefix={arXiv},
}
JOIST: A Joint Speech and Text Streaming Model For ASR
@misc{sainath2022joist,
title={JOIST: A Joint Speech and Text Streaming Model For ASR},
author={Tara N. Sainath and Rohit Prabhavalkar and Ankur Bapna and Yu Zhang and Zhouyuan Huo and Zhehuai Chen and Bo Li and Weiran Wang and Trevor Strohman},
year={2022},
eprint={2210.07353},
archivePrefix={arXiv},
}
Improving Joint Speech-Text Representations Without Alignment
@misc{peyser2023improving,
title={Improving Joint Speech-Text Representations Without Alignment},
author={Cal Peyser and Zhong Meng and Ke Hu and Rohit Prabhavalkar and Andrew Rosenberg and Tara N. Sainath and Michael Picheny and Kyunghyun Cho},
year={2023},
eprint={2308.06125},
archivePrefix={arXiv},
}