TMUX — Sharing terminal between Users

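# first user: start a tmux session on a shared socket
tmux -S /tmp/socket
# make the socket accessible to other users (777 opens it to everyone on the machine)
chmod 777 /tmp/socket
# other user attaches to the same session:
tmux -S /tmp/socket attach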

Bosco visits the lions at Dublin zoo

DepthFM: Fast Monocular Depth Estimation with Flow Matching

SakanaAI/evolutionary-model-merge

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

yl4579/PL-BERT — Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

MagyarorszagON, BudapestEN, SzlovakiaBAN, PozsonyBAN

Why Tool’s Danny Carey Is Your Drummer’s Favorite Drummer

A Corpus-based Pronunciation Teaching Model: A Conceptual Paper

Nigerian English pronunciation preferences: A corpus-based survey of pronunciation variants

Survey of American pronunciation preferences – a preliminary report

A corpus-based study of English pronunciation variations

Voice Conversion With Just Nearest Neighbors, code, demo (Go to “Dog-person conversion”!)

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging, code

L9: Cepstral analysis

Pre-emphasis

Estimating the Airspeed Velocity of an Unladen Swallow

Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts, code

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

abuccts/wikt2pron — Wikt2pron is a Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format.

UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions

Can You Really Run on Top of a Train, Like in the Movies?

Parler TTS, huggingface/parler-tts, huggingface/dataspeech, parler-tts/parler_tts_mini_v0.1

EgoPet: Egomotion and Interaction Data from an Animal’s Perspective

The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation

The London–Lund corpus of spoken English: Description and research

loeweX/Forward-Forward — Reimplementation of Geoffrey Hinton’s Forward-Forward Algorithm

alphacep/whisper-prompts

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

Phoneme Recognition with Wav2Vec2

Generating Diverse and Natural 3D Human Motions from Text, code

gaELECTRA

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models, code

Swedish Kelly-list

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation

jbeskow/tuben — Tube model of vocal tract - resonance frequency estimation

KAN: Kolmogorov-Arnold Networks, code

Hallucination of Multimodal Large Language Models: A Survey

microsoft/MS-DOS

Global Normalization for Streaming Speech Recognition in a Modular Framework

google-research/skai — SKAI is a machine learning based tool for performing automatic building damage assessments on aerial imagery of disaster sites.

bigscience-workshop/petals — Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

High dimensional, tabular deep learning with an auxiliary knowledge graph

An Embodied Generalist Agent in 3D World, code

LERF: Language Embedded Radiance Fields, code

facebookresearch/Mask2Former

Jungjee/RawNet — This repository includes implementations of speaker verification systems that input raw waveforms.

clovaai/voxceleb_trainer — This repository contains the framework for training speaker recognition models described in the paper ‘In defence of metric learning for speaker recognition’ and ‘Pushing the limits of raw waveform speaker recognition’.

CoinCheung/pytorch-loss — label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful

Video ReCap: Recursive Captioning of Hour-Long Videos, code

ar4/deepwave — Wave propagation modules for PyTorch.

google-deepmind/dm_control — Google DeepMind’s software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers