Interesting links, 02/03/2022
Misc. interesting things.
TorchStudio Features — Looks interesting, doesn’t seem to run on ARM Mac though
Fast Development of ASR in African Languages using Self Supervised Speech Representation Learning
The Effects of Automatic Speech Recognition Quality on Human Transcription Latency: “Our studies with 160 participants recruited on Amazon’s Mechanical Turk indicate that starting with the ASR output is worse unless it is sufficiently accurate (Word Error Rate (WER) is under 30%)”
Differentiable Allophone Graphs for Language-Universal Speech Recognition, tweet
Boosting Wav2Vec2 with n-grams in 🤗 Transformers
Neural Instrument Cloning from very few samples
chinedufn/swift-bridge — swift-bridge facilitates Rust and Swift interop.
qarmin/czkawka — Multi functional app to find duplicates, empty folders, similar images etc.
PyO3/pyo3 — Rust bindings for the Python interpreter
N-gram Language Model with NLTK
speechbrain.lm.counting module
NbAiLab/NPSC — Norwegian Parliamentary Speech Corpus
FNet: Mixing Tokens with Fourier Transforms, code, HF
HF: wav2vec update for tiny audio
Adding vs. concatenating positional embeddings & Learned positional encodings
Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, OpenReview, code
02 – Neural nets: rotation and squashing
Audio augmentation
asteroid-team/torch-audiomentations
Spijkervet/torchaudio-augmentations
Phonetics
FM-modulation unit for tape-recording
Voice fundamental frequency tracking
Detection of voicing and Automatic segmentation schemes
Evaluation of spectrographic data sampling techniques
Structural classification of Swedish phonemes