Interesting links, 16/03/2024
Misc. interesting things.
TMUX — Sharing terminal between Users
# first user starts a session on a shared socket
tmux -S /tmp/socket
# make the socket reachable by the other user (777 is world-writable; a group ACL is safer)
chmod 777 /tmp/socket
# other user attaches to the same socket:
tmux -S /tmp/socket attach
Bosco visits the lions at Dublin zoo
DepthFM: Fast Monocular Depth Estimation with Flow Matching
SakanaAI/evolutionary-model-merge
I wrote this Format dialog back on a rainy Thursday morning at Microsoft in late 1994, I think it was.
— Dave W Plummer (@davepl1968) March 24, 2024
We were porting the bajillion lines of code from the Windows95 user interface over to NT, and Format was just one of those areas where WindowsNT was different enough from… pic.twitter.com/PbrhQe0n3K
https://twitter.com/Foone/status/1302820468819288066
yl4579/PL-BERT — Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
MagyarorszagON, BudapestEN, SzlovakiaBAN, PozsonyBAN
Why Tool’s Danny Carey Is Your Drummer’s Favorite Drummer
A Corpus-based Pronunciation Teaching Model: A Conceptual Paper
Nigerian English pronunciation preferences: A corpus-based survey of pronunciation variants
Survey of American pronunciation preferences – a preliminary report
A corpus-based study of English pronunciation variations
Voice Conversion With Just Nearest Neighbors, code, demo (Go to “Dog-person conversion”!)
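The title describes the whole trick: swap each source-speech frame's features for its nearest neighbours among the target speaker's frames. A toy sketch of that matching step (my own illustration, with random vectors standing in for real speech features):

```python
import numpy as np

def knn_match(source, target, k=4):
    """For each source frame, average its k nearest target frames
    (by cosine similarity) -- the matching step of kNN voice conversion."""
    s = source / np.linalg.norm(source, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    sim = s @ t.T                          # (n_source, n_target) cosine similarities
    idx = np.argsort(-sim, axis=1)[:, :k]  # top-k target frames per source frame
    return target[idx].mean(axis=1)        # "converted" feature sequence

rng = np.random.default_rng(0)
src = rng.standard_normal((50, 16))    # stand-in for source-utterance features
tgt = rng.standard_normal((200, 16))   # stand-in for target-speaker features
converted = knn_match(src, tgt)
print(converted.shape)  # (50, 16)
```

The real system does this on self-supervised speech features and then vocodes the matched sequence back to audio.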
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging, code
Estimating the Airspeed Velocity of an Unladen Swallow
Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts, code
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
abuccts/wikt2pron — Wikt2pron is a Python toolkit for converting pronunciations in the English Wiktionary XML dump to CMUdict format.
95% of what we are doing in AI is stuff that's simple enough to explain to a child, made way over-complicated by mathematical-looking notation and unclear thinking. A short rant 🧵 using attention as an example.
— Alex Clemmer 🔥🔥🔥😅🔥🔥🔥 (@hausdorff_space) April 12, 2024
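In the spirit of the rant: scaled dot-product attention really is a few lines once the notation is stripped away. A minimal NumPy sketch (mine, not from the thread):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: weight each value by how well
    its key matches the query, then take the weighted average."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # query/key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                # weighted average of values

rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((3, 4))   # 3 tokens, dimension 4
out = attention(Q, K, V)
print(out.shape)  # (3, 4)
```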
Can You Really Run on Top of a Train, Like in the Movies?
Parler TTS, huggingface/parler-tts, huggingface/dataspeech, parler-tts/parler_tts_mini_v0.1
EgoPet: Egomotion and Interaction Data from an Animal’s Perspective
The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation
The London–Lund corpus of spoken English: Description and research
loeweX/Forward-Forward — Reimplementation of Geoffrey Hinton’s Forward-Forward Algorithm
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Phoneme Recognition with Wav2Vec2
Generating Diverse and Natural 3D Human Motions from Text, code
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models, code
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
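The idea in this title is compact: train a multilingual student so that both an English sentence and its translation land on the frozen English teacher's embedding. A toy sketch of that distillation loss (my own, with made-up 8-dim embeddings; the real work uses sentence-transformers models):

```python
import numpy as np

def distillation_loss(teacher_en, student_en, student_trans):
    """MSE pulling the student's embeddings of an English sentence AND its
    translation toward the (frozen) teacher's English embedding."""
    return (np.mean((student_en - teacher_en) ** 2)
            + np.mean((student_trans - teacher_en) ** 2))

# toy embeddings for one sentence pair, student already close to the teacher
rng = np.random.default_rng(0)
teacher = rng.standard_normal(8)
loss = distillation_loss(teacher, teacher + 0.1, teacher - 0.1)
print(round(float(loss), 4))  # 0.02
```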
jbeskow/tuben — Tube model of vocal tract - resonance frequency estimation
A new neural network just dropped from @MIT aka Kolmogorov-Arnold Network (KAN).
— Isaak Kamau🧑💻 (@aidev_isaak) May 1, 2024
Let's try to break it down!
Note: Some technical descriptions are simplified and rephrased for clarity:
🧵 👇👇 pic.twitter.com/NKReAnQE1o
KAN: Kolmogorov-Arnold Networks, code
Last week we released 🤗Diarizers, a library for fine-tuning speaker diarization models 🗣️
— Sanchit Gandhi (@sanchitgandhi99) May 1, 2024
Using a free Google Colab, it takes 10 minutes to improve multilingual performance by 30%: https://t.co/J71bpyB3FV
A major step towards democratising speaker diarization! 🌎 pic.twitter.com/MziWo9my5Y
New paper surveying multimodal LLM hallucinations: https://t.co/9PY0dllY2L
— DJ (@DuaneJRich) April 30, 2024
It creates a taxonomy of the varied ways hallucinations appear, with an intent to reveal causes and explain mitigation strategies.
It's an educational read and an admirable effort. Hallucinations,… pic.twitter.com/cb3u3U5gGd
Hallucination of Multimodal Large Language Models: A Survey
Global Normalization for Streaming Speech Recognition in a Modular Framework
google-research/skai — SKAI is a machine learning based tool for performing automatic building damage assessments on aerial imagery of disaster sites.
bigscience-workshop/petals — Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Can deep learning work on small data with far more features than samples? We present PLATO: a method that achieves the state-of-the-art on such datasets by using prior domain information! https://t.co/GdsqIHRL8a 🧵
— Camilo Ruiz (@_camiloruiz) December 13, 2023
Published in #NeurIPS2023 with @ren_hongyu @kexinhuang5 @jure pic.twitter.com/rsRya15gjO
High dimensional, tabular deep learning with an auxiliary knowledge graph
An Embodied Generalist Agent in 3D World, code
LERF: Language Embedded Radiance Fields, code
The details of OWSM’s budget was featured in @shinjiw_at_cmu’s keynote at @ASRU2023 .
— William Chen (@chenwanch1) December 18, 2023
“It would have been a disaster if the paper was rejected” pic.twitter.com/R5SK35JY77
Jungjee/RawNet — This repository includes implementations of speaker verification systems that input raw waveforms.
clovaai/voxceleb_trainer — This repository contains the framework for training speaker recognition models described in the paper ‘In defence of metric learning for speaker recognition’ and ‘Pushing the limits of raw waveform speaker recognition’.
CoinCheung/pytorch-loss — label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful
Video ReCap: Recursive Captioning of Hour-Long Videos, code
ar4/deepwave — Wave propagation modules for PyTorch.
google-deepmind/dm_control — Google DeepMind’s software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
MASSIVE idea proposed in this paper.
— Rohan Paul (@rohanpaul_ai) May 1, 2024
Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs) for approximating nonlinear functions 🤯
📌 Unlike MLPs which have fixed activation functions on nodes, KANs have learnable activation functions… pic.twitter.com/7hwn2gF6zT
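A toy reading of the tweet's point (my own sketch, not the paper's code): in an MLP the nonlinearity is fixed and the weights are scalars, while in a KAN each input-to-output edge carries its own learnable 1-D function, approximated here by learnable coefficients over a small fixed basis:

```python
import numpy as np

class ToyKANLayer:
    """Each input->output edge applies its own learnable 1-D function,
    represented as coefficients over the fixed basis [x, x^2, sin(x)]."""
    def __init__(self, n_in, n_out, rng):
        # coeffs[i, j, :] parameterise the function on edge i -> j
        self.coeffs = rng.standard_normal((n_in, n_out, 3)) * 0.1

    def __call__(self, x):
        basis = np.stack([x, x**2, np.sin(x)], axis=-1)   # (n_in, 3)
        # each output sums its edges' learned activations of the inputs
        return np.einsum('ib,iob->o', basis, self.coeffs)

rng = np.random.default_rng(0)
layer = ToyKANLayer(4, 2, rng)
y = layer(rng.standard_normal(4))
print(y.shape)  # (2,)
```

The actual paper uses B-spline parameterisations rather than this polynomial basis, but the structural contrast with an MLP is the same.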
Let's have a look at this, I am reminded so often that most people don't necessarily understand what they are looking at with this industry, so it might be quite interesting to dissect.
— Tom (@thelorryist) May 4, 2024
We call this a Bridge Strike, when you understand a bit more you can see why they happen 🧵 https://t.co/WJk0xB7mMx
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers