Interesting links, 7/10/2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation, code

@misc{yang2023uniaudio,
      title={UniAudio: An Audio Foundation Model Toward Universal Audio Generation}, 
      author={Dongchao Yang and Jinchuan Tian and Xu Tan and Rongjie Huang and Songxiang Liu and Xuankai Chang and Jiatong Shi and Sheng Zhao and Jiang Bian and Xixin Wu and Zhou Zhao and Helen Meng},
      year={2023},
      eprint={2310.00704},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

If you'd asked me a year ago, superposition would have been by far the reason I was most worried that mechanistic interpretability would hit a dead end.

I'm now very optimistic. I'd go as far as saying it's now primarily an engineering problem -- hard, but less fundamental risk. https://t.co/bhhHObbAOK
— Chris Olah (@ch402) October 5, 2023

It’s MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk

@misc{bertsch2023its,
      title={It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk}, 
      author={Amanda Bertsch and Alex Xie and Graham Neubig and Matthew R. Gormley},
      year={2023},
      eprint={2310.01387},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning, code

@misc{yang2023fasthubert,
      title={Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning}, 
      author={Guanrou Yang and Ziyang Ma and Zhisheng Zheng and Yakun Song and Zhikang Niu and Xie Chen},
      year={2023},
      eprint={2309.13860},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

SLM: Bridge the thin gap between speech and text foundation models

@misc{wang2023slm,
      title={SLM: Bridge the thin gap between speech and text foundation models}, 
      author={Mingqiu Wang and Wei Han and Izhak Shafran and Zelin Wu and Chung-Cheng Chiu and Yuan Cao and Yongqiang Wang and Nanxin Chen and Yu Zhang and Hagen Soltau and Paul Rubenstein and Lukas Zilka and Dian Yu and Zhong Meng and Golan Pundak and Nikhil Siddhartha and Johan Schalkwyk and Yonghui Wu},
      year={2023},
      eprint={2310.00230},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Joint Audio and Speech Understanding

@misc{gong2023joint,
      title={Joint Audio and Speech Understanding}, 
      author={Yuan Gong and Alexander H. Liu and Hongyin Luo and Leonid Karlinsky and James Glass},
      year={2023},
      eprint={2309.14405},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios

@misc{srivastava2023effuse,
      title={EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios}, 
      author={Tejes Srivastava and Jiatong Shi and William Chen and Shinji Watanabe},
      year={2023},
      eprint={2310.03938},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

A Token-Wise Beam Search Algorithm for RNN-T

@misc{keren2023tokenwise,
      title={A Token-Wise Beam Search Algorithm for RNN-T}, 
      author={Gil Keren},
      year={2023},
      eprint={2302.14357},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning, code

@misc{liao2023zeroshot,
      title={Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning}, 
      author={Feng-Ting Liao and Yung-Chieh Chan and Yi-Chang Chen and Chan-Jan Hsu and Da-shan Shiu},
      year={2023},
      eprint={2307.10274},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

faramuci, Mi az a faramuci? (What is "faramuci"?)

seggrepacsi