Interesting links, 15/12/2023

facebookincubator/AITemplate — AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations. Horrible licence.

rasbt/LLMs-from-scratch

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

BrainGPT - A step towards the future of human-AI merger

BrainGPT is capable of thought-to-text translation and connects a multitask EEG encoder with LLMs to decode coherent and readable sentences from EEG signals.

This means that thoughts, measured by wearing a cap with… pic.twitter.com/2Vp58Uev3L
— Bindu Reddy (@bindureddy) December 17, 2023

dylanebert/gsplat.js

migtissera/Tess-Coder-v1.0

google-research/robotics_transformer

huggingface/trl

UniversalDependencies/UD_Swedish_Sign_Language-SSLC

Numbers from 100-1 Million

Uimhreacha

Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models

This is a baby GPT with two tokens 0/1 and context length of 3, viewing it as a finite-state Markov chain. It was trained on the sequence "111101111011110" for 50 iterations. The parameters and the architecture of the Transformer modify the probabilities on the arrows.

E.g. we… pic.twitter.com/vj10nZEXlH
— Andrej Karpathy (@karpathy) April 9, 2023
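To make the Markov-chain view concrete, here is a minimal sketch that builds the chain directly from the quoted training sequence by counting. This is not the trained Transformer from the tweet: it is only the empirical, count-based chain over 3-token contexts, which is roughly what such a model would be pushed towards. The function names `transition_counts` and `transition_probs` are made up for illustration.

```python
from collections import Counter, defaultdict

SEQ = "111101111011110"  # training sequence quoted in the tweet
CONTEXT = 3              # context length quoted in the tweet


def transition_counts(seq, context):
    """Count which token follows each context window in the sequence."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - context):
        state = seq[i:i + context]   # current state: the last `context` tokens
        next_token = seq[i + context]
        counts[state][next_token] += 1
    return counts


def transition_probs(counts):
    """Turn the counts into state-to-state transition probabilities."""
    probs = {}
    for state, token_counts in counts.items():
        total = sum(token_counts.values())
        # Emitting a token shifts the window: drop the oldest token, append the new one.
        probs[state] = {
            state[1:] + token: n / total for token, n in token_counts.items()
        }
    return probs


counts = transition_counts(SEQ, CONTEXT)
probs = transition_probs(counts)

for state in sorted(probs):
    print(state, "->", probs[state])
```

Only four of the eight possible states ever occur in this sequence, so the count-based chain says nothing about the other four. A trained model still assigns them some probabilities.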

Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields

OpenNMT/CTranslate2

Nile Making Things That Gods Detest Documentary

GestureDiffuCLIP demo

FLAP: Fast Language-Audio Pre-training

Add Bayes Risk CTC

pnnl/HyperNetX — Python package for hypergraph analysis and visualization.

Finite state transducers and Pynini

cobcom/wlalign — An implementation of the WL-align algorithm, a graph-alignment routine based on the generalization of the Weisfeiler-Lehman algorithm.

eth-sri/astarix — AStarix: Fast and Optimal Sequence-to-Graph Aligner

OpenScene: 3D Scene Understanding with Open Vocabularies, demo, code

I’ve resigned from my role leading the Audio team at Stability AI, because I don’t agree with the company’s opinion that training generative AI models on copyrighted works is ‘fair use’.

First off, I want to say that there are lots of people at Stability who are deeply…
— Ed Newton-Rex (@ednewtonrex) November 15, 2023

Mixtral of Experts

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

MahmoudAshraf97/whisper-diarization

drlukeparry/pyslm — PySLM: A Python Library for 3D Printing and Additive Manufacturing

3D Printing for Vocal-Tract Models

open-mmlab/mmpose

ochen1/insanely-fast-whisper-cli — The fastest Whisper optimization for automatic speech recognition as a command-line interface

geekodour/wscribe — ez audio transcription tool with flexible processing and post-processing options

SYSTRAN/fuzzy-match — Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target.

SYSTRAN/similarity — Bilingual sentence similarity classifier using TensorFlow