Interesting links, 16/03/2024
Misc. interesting things.
TMUX — Sharing terminal between Users
# first user starts a session on a shared socket
tmux -S /tmp/socket
# make the socket reachable by the other user (777 is world-writable; a group ACL is safer)
chmod 777 /tmp/socket
# other user attaches to the same socket:
tmux -S /tmp/socket attach
Bosco visits the lions at Dublin zoo
DepthFM: Fast Monocular Depth Estimation with Flow Matching
SakanaAI/evolutionary-model-merge
I wrote this Format dialog back on a rainy Thursday morning at Microsoft in late 1994, I think it was.
— Dave W Plummer (@davepl1968) March 24, 2024
We were porting the bajillion lines of code from the Windows95 user interface over to NT, and Format was just one of those areas where WindowsNT was different enough from… pic.twitter.com/PbrhQe0n3K
https://twitter.com/Foone/status/1302820468819288066
yl4579/PL-BERT — Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
MagyarorszagON, BudapestEN, SzlovakiaBAN, PozsonyBAN
Why Tool’s Danny Carey Is Your Drummer’s Favorite Drummer
A Corpus-based Pronunciation Teaching Model: A Conceptual Paper
Nigerian English pronunciation preferences: A corpus-based survey of pronunciation variants
Survey of American pronunciation preferences – a preliminary report
A corpus-based study of English pronunciation variations
Voice Conversion With Just Nearest Neighbors, code, demo (Go to “Dog-person conversion”!)
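The title describes the whole trick: swap each source-speech frame's features for its nearest neighbours among the target speaker's frames. A toy sketch of that matching step (my own illustration, with random vectors standing in for real speech features):

```python
import numpy as np

def knn_match(source, target, k=4):
    """For each source frame, average its k nearest target frames
    (by cosine similarity) -- the matching step of kNN voice conversion."""
    s = source / np.linalg.norm(source, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    sim = s @ t.T                          # (n_source, n_target) cosine similarities
    idx = np.argsort(-sim, axis=1)[:, :k]  # top-k target frames per source frame
    return target[idx].mean(axis=1)        # "converted" feature sequence

rng = np.random.default_rng(0)
src = rng.standard_normal((50, 16))    # stand-in for source-utterance features
tgt = rng.standard_normal((200, 16))   # stand-in for target-speaker features
converted = knn_match(src, tgt)
print(converted.shape)  # (50, 16)
```

The real system does this on self-supervised speech features and then vocodes the matched sequence back to audio.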
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging, code
Estimating the Airspeed Velocity of an Unladen Swallow
Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts, code
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
abuccts/wikt2pron — Wikt2pron is a Python toolkit for converting pronunciations in the English Wiktionary XML dump to CMUdict format.
95% of what we are doing in AI is stuff that's simple enough to explain to a child, made way over-complicated by mathematical-looking notation and unclear thinking. A short rant 🧵 using attention as an example.
— Alex Clemmer 🔥🔥🔥😅🔥🔥🔥 (@hausdorff_space) April 12, 2024
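In the spirit of the rant: scaled dot-product attention really is a few lines once the notation is stripped away. A minimal NumPy sketch (mine, not from the thread):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: weight each value by how well
    its key matches the query, then take the weighted average."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # query/key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                # weighted average of values

rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((3, 4))   # 3 tokens, dimension 4
out = attention(Q, K, V)
print(out.shape)  # (3, 4)
```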
Can You Really Run on Top of a Train, Like in the Movies?
Parler TTS, huggingface/parler-tts, huggingface/dataspeech, parler-tts/parler_tts_mini_v0.1
EgoPet: Egomotion and Interaction Data from an Animal’s Perspective
The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation
The London–Lund corpus of spoken English: Description and research
loeweX/Forward-Forward — Reimplementation of Geoffrey Hinton’s Forward-Forward Algorithm
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Phoneme Recognition with Wav2Vec2
Generating Diverse and Natural 3D Human Motions from Text, code
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models, code
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
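The idea in this title is compact: train a multilingual student so that both an English sentence and its translation land on the frozen English teacher's embedding. A toy sketch of that distillation loss (my own, with made-up 8-dim embeddings; the real work uses sentence-transformers models):

```python
import numpy as np

def distillation_loss(teacher_en, student_en, student_trans):
    """MSE pulling the student's embeddings of an English sentence AND its
    translation toward the (frozen) teacher's English embedding."""
    return (np.mean((student_en - teacher_en) ** 2)
            + np.mean((student_trans - teacher_en) ** 2))

# toy embeddings for one sentence pair, student already close to the teacher
rng = np.random.default_rng(0)
teacher = rng.standard_normal(8)
loss = distillation_loss(teacher, teacher + 0.1, teacher - 0.1)
print(round(float(loss), 4))  # 0.02
```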
jbeskow/tuben — Tube model of vocal tract - resonance frequency estimation
A new neural network just dropped from @MIT aka Kolmogorov-Arnold Network (KAN).
— Isaak Kamau🧑💻 (@aidev_isaak) May 1, 2024
Let's try to break it down!
Note: Some technical descriptions are simplified and rephrased for clarity:
🧵 👇👇 pic.twitter.com/NKReAnQE1o
KAN: Kolmogorov-Arnold Networks, code
Last week we released 🤗Diarizers, a library for fine-tuning speaker diarization models 🗣️
— Sanchit Gandhi (@sanchitgandhi99) May 1, 2024
Using a free Google Colab, it takes 10 minutes to improve multilingual performance by 30%: https://t.co/J71bpyB3FV
A major step towards democratising speaker diarization! 🌎 pic.twitter.com/MziWo9my5Y
New paper surveying multimodal LLM hallucinations: https://t.co/9PY0dllY2L
— DJ (@DuaneJRich) April 30, 2024
It creates a taxonomy of the varied ways hallucinations appear, with an intent to reveal causes and explain mitigation strategies.
It's an educational read and an admirable effort. Hallucinations,… pic.twitter.com/cb3u3U5gGd
Hallucination of Multimodal Large Language Models: A Survey
Global Normalization for Streaming Speech Recognition in a Modular Framework
google-research/skai — SKAI is a machine learning based tool for performing automatic building damage assessments on aerial imagery of disaster sites.
bigscience-workshop/petals — Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Can deep learning work on small data with far more features than samples? We present PLATO: a method that achieves the state-of-the-art on such datasets by using prior domain information! https://t.co/GdsqIHRL8a 🧵
— Camilo Ruiz (@_camiloruiz) December 13, 2023
Published in #NeurIPS2023 with @ren_hongyu @kexinhuang5 @jure pic.twitter.com/rsRya15gjO
High dimensional, tabular deep learning with an auxiliary knowledge graph
An Embodied Generalist Agent in 3D World, code
LERF: Language Embedded Radiance Fields, code
The details of OWSM’s budget was featured in @shinjiw_at_cmu’s keynote at @ASRU2023 .
— William Chen (@chenwanch1) December 18, 2023
“It would have been a disaster if the paper was rejected” pic.twitter.com/R5SK35JY77
Jungjee/RawNet — This repository includes implementations of speaker verification systems that input raw waveforms.
clovaai/voxceleb_trainer — This repository contains the framework for training speaker recognition models described in the paper ‘In defence of metric learning for speaker recognition’ and ‘Pushing the limits of raw waveform speaker recognition’.
CoinCheung/pytorch-loss — label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful
Video ReCap: Recursive Captioning of Hour-Long Videos, code
ar4/deepwave — Wave propagation modules for PyTorch.
google-deepmind/dm_control — Google DeepMind’s software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
MASSIVE idea proposed in this paper.
— Rohan Paul (@rohanpaul_ai) May 1, 2024
Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs) for approximating nonlinear functions 🤯
📌 Unlike MLPs which have fixed activation functions on nodes, KANs have learnable activation functions… pic.twitter.com/7hwn2gF6zT
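A toy reading of the tweet's point (my own sketch, not the paper's code): in an MLP the nonlinearity is fixed and the weights are scalars, while in a KAN each input-to-output edge carries its own learnable 1-D function, approximated here by learnable coefficients over a small fixed basis:

```python
import numpy as np

class ToyKANLayer:
    """Each input->output edge applies its own learnable 1-D function,
    represented as coefficients over the fixed basis [x, x^2, sin(x)]."""
    def __init__(self, n_in, n_out, rng):
        # coeffs[i, j, :] parameterise the function on edge i -> j
        self.coeffs = rng.standard_normal((n_in, n_out, 3)) * 0.1

    def __call__(self, x):
        basis = np.stack([x, x**2, np.sin(x)], axis=-1)   # (n_in, 3)
        # each output sums its edges' learned activations of the inputs
        return np.einsum('ib,iob->o', basis, self.coeffs)

rng = np.random.default_rng(0)
layer = ToyKANLayer(4, 2, rng)
y = layer(rng.standard_normal(4))
print(y.shape)  # (2,)
```

The actual paper uses B-spline parameterisations rather than this polynomial basis, but the structural contrast with an MLP is the same.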
Let's have a look at this, I am reminded so often that most people don't necessarily understand what they are looking at with this industry, so it might be quite interesting to dissect.
— Tom (@thelorryist) May 4, 2024
We call this a Bridge Strike, when you understand a bit more you can see why they happen 🧵 https://t.co/WJk0xB7mMx
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers