Dao-AILab/flash-attention — Fast and memory-efficient exact attention

facebookincubator/velox — A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.

How 🤗 Accelerate runs very large models thanks to PyTorch

A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform

karkirowle/relative_phoneme_analysis — Repository for phoneme analysis on word-level Kaldi/ESPNet ASR transcripts

Irish Gaelic/seanchló print

prajdabre/yanmtt — Yet Another Neural Machine Translation Toolkit

google-research-datasets/TextNormalizationCoveringGrammars — Covering grammars for English and Russian text normalization

Language Model Inversion

WavJourney: Compositional Audio Creation with Large Language Models

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

thuhcsi/VAENAR-TTS — The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

Automatic Generation of Subtitles for Videos of the Government of La Rioja

The Properly Illustrated Transformer

The Illustrated Transformer

The Annotated Transformer

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

Instant3D: Instant Text-to-3D Generation

LRM: Large Reconstruction Model for Single Image to 3D

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

‘Dair’ (Live @ Urban Assault 2018)

Train T5 Model From Scratch

lingjzhu/CharsiuG2P — Multilingual G2P in 100 languages

kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels, code

Speculative Decoding for 2x Faster Whisper Inference

SHI-Labs/VCoder — VCoder: Versatile Vision Encoders for Multimodal Large Language Models, arXiv 2023

ConvNets Match Vision Transformers at Scale

SD-HuBERT: Self-Distillation Induces Syllabic Organization in HuBERT

An Introduction to Transformers

MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones

Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer

Training Distil-Whisper

wellecks/ntptutorial — Tutorial on neural theorem proving

THE LITTLE BOOK OF DEEP LEARNING

open-mmlab/Amphion — Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

yangdongchao/AcademiCodec

SpeechAct: Towards Generating Whole-body Motion from Speech

Fine-tuning Whisper for Dutch Language: The Crucial Role of Size

Introduction to Speech Processing

LibriSpeech Alignments

OML-Team/open-metric-learning — Library for metric learning pipelines and models.

haotian-liu/LLaVA — [NeurIPS’23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Flip - Glass Animals

Simplifying Transformer Blocks

Advanced RAG Techniques: an Illustrated Overview

nvidia/parakeet-rnnt-1.1b

bclavie/RAGatouille

colbert-ir/colbertv2.0

Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models

LLM Augmented LLMs: Expanding Capabilities through Composition

Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory

This AI Paper from Meta Introduces Hyper-VolTran: A Novel Neural Network for Transformative 3D Reconstruction and Rendering, paper

Phi-2: The surprising power of small language models

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

MotionScript: Natural Language Descriptions for Expressive 3D Human Motions

pjyazdian/Gesture2Vec — This is an official PyTorch implementation of “Gesture2Vec: Clustering Gestures using Representation Learning Methods for Co-speech Gesture Generation” (IROS 2022).

neuromorphs/NIR — Neuromorphic Intermediate Representation reference implementation

Better Explained

PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques

What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs

100 tiny changes to transform your life: from the one-minute rule to pyjama yoga

This Paper from MIT and Microsoft Introduces ‘LASER’: A Novel Machine Learning Approach that can Simultaneously Enhance an LLM’s Task Performance and Reduce its Size with no Additional Training

The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

LiteLlama: Reduced-Scale Llama — An open-source reproduction of Meta AI’s LLaMa 2 with significantly reduced model sizes. LiteLlama-460M-1T has 460M parameters and was trained on 1T tokens.

Token 1.3: What is Retrieval-Augmented Generation (RAG)?

VikParuchuri/surya — Accurate line-level text detection and recognition (OCR) in any language

gchrupala/neurospoken — Neural models of spoken language - LOT Winter school 2024

My AI Timelines Have Sped Up (Again)

NeMo - ASR with Transducers

Mixtral 8x7B is currently the best open-source LLM, surpassing GPT-3.5

Foundations of Vector Retrieval

GARField: Group Anything with Radiance Fields

AlphaGeometry: An Olympiad-level AI system for geometry, code

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

Tuning Language Models by Proxy