Interesting links, 20/02/2024
Misc. interesting things.
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
karpathy/minbpe — Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
MAGVIT: Masked Generative Video Transformer, code
DiffiT: Diffusion Vision Transformers for Image Generation
A Novel Sampling Scheme for Text- and Image-Conditional Image Synthesis in Quantized Latent Spaces
How to Train Data-Efficient LLMs
Fine-tuning Large Language Models for Adaptive Machine Translation
Robust agents learn causal world models
open-mmlab/Amphion — Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
The effects of automatic speech recognition quality on human transcription latency — “We present results from 2 studies which indicate that starting with the ASR output is worse unless it is sufficiently accurate (Word Error Rate of under 30%).”
Lexicographical data/Statistics/Counts of various things by language
OpenAccess-AI-Collective/axolotl — Go ahead and axolotl questions
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models
microsoft/torchscale — Foundation Architecture for (M)LLMs
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models, code