A Cookbook of Self-Supervised Learning

Automatic Quality Estimation for ASR System Combination

Choose Your Weapon: Survival Strategies for Depressed AI Academics

ggerganov/whisper.cpp

Attention as a guide for Simultaneous Speech Translation

accent in CUBE — Current British English

Morse wavelet transform-based features for voice liveness detection

Towards inclusive automatic speech recognition

colmap/colmap — COLMAP - Structure-from-Motion and Multi-View Stereo

SuLvXiangXin/zipnerf-pytorch — Unofficial implementation of ZipNeRF

Tanks and Temples

Introduction to 3D Gaussian Splatting

LLM Visualization

How 🤗 Accelerate runs very large models thanks to PyTorch

Cló Gaelach agus Transkribus (Gaelic type and Transkribus)

google-deepmind/graphcast — GraphCast: Learning skillful medium-range global weather forecasting

vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention

FasterDecoding/REST — Retrieval-Based Speculative Decoding

HyenaDNA Models

HazyResearch/hyena-dna — Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena

On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition

Vaibhavs10/insanely-fast-whisper

Instruction Tuning Vol. 1

Italian Parkinson’s voice and speech

neonbjb/tortoise-tts

NVIDIA/TensorRT-LLM

facebookincubator/AITemplate — AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations. Horrible licence.

rasbt/LLMs-from-scratch

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

dylanebert/gsplat.js

migtissera/Tess-Coder-v1.0

google-research/robotics_transformer

huggingface/trl

UniversalDependencies/UD_Swedish_Sign_Language-SSLC

Numbers from 100-1 Million

Uimhreacha (Numbers)

Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models

Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields

OpenNMT/CTranslate2

Nile Making Things That Gods Detest Documentary

GestureDiffuCLIP demo

FLAP: Fast Language-Audio Pre-training

Add Bayes Risk CTC

pnnl/HyperNetX — Python package for hypergraph analysis and visualization.

Finite state transducers and Pynini

cobcom/wlalign — An implementation of the WL-align algorithm, a graph-alignment routine based on the generalization of the Weisfeiler-Lehman algorithm.

eth-sri/astarix — AStarix: Fast and Optimal Sequence-to-Graph Aligner

OpenScene: 3D Scene Understanding with Open Vocabularies (demo, code)

Mixtral of Experts

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

MahmoudAshraf97/whisper-diarization

drlukeparry/pyslm — PySLM: A Python Library for 3D Printing and Additive Manufacturing

3D Printing for Vocal-Tract Models

open-mmlab/mmpose

ochen1/insanely-fast-whisper-cli — The fastest Whisper optimization for automatic speech recognition as a command-line interface

geekodour/wscribe — ez audio transcription tool with flexible processing and post-processing options

SYSTRAN/fuzzy-match — Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target.

SYSTRAN/similarity — Bilingual sentence similarity classifier using Tensorflow