Interesting links, 15/12/2023
Misc. interesting things.
A Cookbook of Self-Supervised Learning
Automatic Quality Estimation for ASR System Combination
Choose Your Weapon: Survival Strategies for Depressed AI Academics
Attention as a guide for Simultaneous Speech Translation
accent in CUBE — Current British English
Morse wavelet transform-based features for voice liveness detection
Towards inclusive automatic speech recognition
colmap/colmap — COLMAP - Structure-from-Motion and Multi-View Stereo
SuLvXiangXin/zipnerf-pytorch — Unofficial implementation of ZipNeRF
Introduction to 3D Gaussian Splatting
How 🤗 Accelerate runs very large models thanks to PyTorch
google-deepmind/graphcast — GraphCast: Learning skillful medium-range global weather forecasting
vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention
FasterDecoding/REST — Retrieval-Based Speculative Decoding
HazyResearch/hyena-dna — Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena
On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition
Vaibhavs10/insanely-fast-whisper
Italian Parkinson’s voice and speech
facebookincubator/AITemplate — AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations. Horrible licence.
BrainGPT - A step towards the future of human-AI merger
BrainGPT is capable of thought-to-text translation and connects a multitask EEG encoder with LLMs to decode coherent and readable sentences from EEG signals.
This means that thoughts, measured by wearing a cap with… pic.twitter.com/2Vp58Uev3L
— Bindu Reddy (@bindureddy) December 17, 2023
google-research/robotics_transformer
UniversalDependencies/UD_Swedish_Sign_Language-SSLC
This is a baby GPT with two tokens 0/1 and context length of 3, viewing it as a finite state markov chain. It was trained on the sequence "111101111011110" for 50 iterations. The parameters and the architecture of the Transformer modifies the probabilities on the arrows.
E.g. we… pic.twitter.com/vj10nZEXlH
— Andrej Karpathy (@karpathy) April 9, 2023
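To make the Markov-chain view in the tweet above concrete, here is a minimal sketch that treats each 3-token context as a state and estimates the arrow probabilities by plain counting over the same training sequence. This is a count-based stand-in, not a Transformer and not the code behind the tweet. The names `empirical_transitions` and `next_state` are made up for illustration, and a trained model need not reproduce these counts exactly.

```python
from collections import Counter, defaultdict

def empirical_transitions(sequence: str, context_length: int = 3) -> dict:
    """Estimate P(next token | context) by counting over `sequence`.

    Each context of `context_length` tokens is one state of the chain.
    Contexts that never occur in the sequence get no entry at all.
    """
    counts = defaultdict(Counter)
    for i in range(len(sequence) - context_length):
        context = sequence[i:i + context_length]
        counts[context][sequence[i + context_length]] += 1

    # Normalise the raw counts into probabilities per state.
    return {
        context: {tok: n / sum(ctr.values()) for tok, n in ctr.items()}
        for context, ctr in counts.items()
    }

def next_state(context: str, token: str) -> str:
    """Emitting `token` slides the window: drop the oldest token, append the new one."""
    return context[1:] + token

probs = empirical_transitions("111101111011110")
for state in sorted(probs):
    for token, p in sorted(probs[state].items()):
        print(f"{state} --{token} (p={p:.2f})--> {next_state(state, token)}")
```

Only four of the eight possible states (011, 101, 110, 111) appear in the training sequence, so counting says nothing about the other four. A trained model still has to assign them some distribution.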
Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields
Nile Making Things That Gods Detest Documentary
FLAP: Fast Language-Audio Pre-training
pnnl/HyperNetX — Python package for hypergraph analysis and visualization.
Finite state transducers and Pynini
cobcom/wlalign — An implementation of the WL-align algorithm, a graph-alignment routine based on the generalization of the Weisfeiler-Lehman algorithm.
eth-sri/astarix — AStarix: Fast and Optimal Sequence-to-Graph Aligner
OpenScene: 3D Scene Understanding with Open Vocabularies, demo, code
I’ve resigned from my role leading the Audio team at Stability AI, because I don’t agree with the company’s opinion that training generative AI models on copyrighted works is ‘fair use’.
First off, I want to say that there are lots of people at Stability who are deeply…
— Ed Newton-Rex (@ednewtonrex) November 15, 2023
ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
MahmoudAshraf97/whisper-diarization
drlukeparry/pyslm — PySLM: A Python Library for 3D Printing and Additive Manufacturing
3D Printing for Vocal-Tract Models
ochen1/insanely-fast-whisper-cli — The fastest Whisper optimization for automatic speech recognition as a command-line interface
geekodour/wscribe — ez audio transcription tool with flexible processing and post-processing options
SYSTRAN/fuzzy-match — Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target.
SYSTRAN/similarity — Bilingual sentence similarity classifier using Tensorflow