Interesting links, 25/11/2021
Misc. interesting things.
ELS-RD/transformer-deploy — Deploy optimized transformer based models in production
Fine-tuning XLS-R for Multi-Lingual ASR with 🤗 Transformers, fairseq, Facebook AI blog
CoVoST 2 and Massively Multilingual Speech-to-Text Translation
jusText 3 — jusText is a tool for removing boilerplate content
Onion — onion (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts.
rsling/texrex — texrex web page cleaning & ClaraX random walk crawler
Representation Learning with Contrastive Predictive Coding, facebookresearch/CPC_audio
menelik3/cmudict-ipa — The CMU Pronouncing Dictionary converted to IPA
A cross-linguistic database of phonetic transcription systems
glottobank/potential-of-cognate-detection — Source code and data accompanying the paper “The Potential of Automatic Word Comparison for Historical Linguistics”
glottobank/tukano — Repository for computer-guided reconstruction with Jena wordlist standard for Tukano language data
flashlight/flashlight/app/asr/tools/alignment
wav2letter/recipes/lexicon_free
CMU Advanced NLP 2021 Prompting + Sequence-to-sequence Pre-training
ming024/FastSpeech2 — An implementation of Microsoft’s “FastSpeech 2: Fast and High-Quality End-to-End Text to Speech”
[Phrase Retrieval and Beyond | Princeton NLP Group](https://princeton-nlp.github.io/phrase-retrieval-and-beyond/)
princeton-nlp/PURE A Frustratingly Easy Approach for Entity and Relation Extraction
princeton-nlp/LM-BFF LM-BFF: Better Few-shot Fine-tuning of Language Models
camelot-dev/camelot — A Python library to extract tabular data from PDFs
neural-network-and-data-loading.ipynb
jina-ai/finetuner — Finetuning any DNN for better embedding on neural search tasks
jina-ai/jina — Cloud-native neural search framework for 𝙖𝙣𝙮 kind of data
nnmnkwii_gallery/01-DNN-based statistical speech synthesis (en).ipynb
Character-level Convolutional Networks for Text Classification
Todo
Die araner mundart/Wörterbuch/æ ȧ – Wikisource
L’Accent dans le gaëlique du Munster - Wikisource
patrickvonplaten/Wav2Vec2_PyCTCDecode
kensho-technologies/pyctcdecode
kaldi/run_segmentation_long_utts.sh
kaldi/egs/wsj/s5/steps/cleanup
kaldi/clean_and_segment_data.sh
[OSCAR 21.09 | OSCAR](https://oscar-corpus.com/post/oscar-v21-09/)