Interesting links, 27/12/2023
Misc. interesting things.
Attention is Not Only a Weight: Analyzing Transformers with Vector Norms
Learning Sparse Prototypes for Text Generation
siddk/voltron-robotics — Voltron: Language-Driven Representation Learning for Robotics
Mamba: Linear-Time Sequence Modeling with Selective State Spaces, state-spaces/mamba
johnma2006/mamba-minimal — Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
nnnoiseless: porting audio code from C to Rust, code
ZDisket/TensorVox — Desktop application for neural speech synthesis written in C++
avaneev/r8brain-free-src — High-quality pro audio resampler / sample rate converter C++ library. Very fast, for both audio resampling and time-series interpolation.
castorini/howl — Wake word detection modeling toolkit for Firefox Voice, supporting open datasets like Speech Commands and Common Voice.
stanford-oval/genie-toolkit — The Genie open source kit for voice assistant (formerly known as Almond)
stanford-oval/thingtalk — The Programming Language of Virtual Assistants
salesforce/morpheus — Code for ACL’20 paper “It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations”
Instructions for estimating the location of beats in a soundfile
LSTMs Explained: A Complete, Technically Accurate, Conceptual Guide with Keras
In his spare time, an engineer found flaws in the classic book “A Million Random Digits”
QData/TextAttack — TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP
Hear Slayer guitarist Jeff Hanneman’s ferocious unreleased demos for Reign In Blood
gtn-org/gtn — Automatic differentiation with weighted finite-state transducers.
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Latent linguistic embedding for cross-lingual text-to-speech and voice conversion
Domain Adversarial Neural Networks for Dysarthric Speech Recognition
facebookincubator/CG-SQL — CG/SQL is a compiler that converts a SQL Stored Procedure like language into C for SQLite. SQLite has no stored procedures of its own. CG/CQL can also generate other useful artifacts for testing and schema maintenance.
Knowledge Transfer in Self Supervised Learning
Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition
google/monster-mash — Sketch-Based Modeling and Animation Tool
Transformer-based Encoder-Decoder Models
RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs
Overview: State-of-the-Art Machine Learning Algorithms per Discipline & per Task
Composition-based on-the-fly rescoring for salient n-gram biasing
Spectrogram Inversion for Audio Source Separation via Consistency, Mixing, and Magnitude Constraints
getkeops/keops — KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
A Comprehensive Overview of Gaussian Splatting
stanfordnlp/dspy — Stanford DSPy: The framework for programming with foundation models
stanford-futuredata/ColBERT — ColBERT: state-of-the-art neural search (SIGIR’20, TACL’21, NeurIPS’21, NAACL’22, CIKM’22)
allenai/Holodeck — Language Guided Generation of 3D Embodied AI Environments.
Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation
MalcolmSlaney/python_auditory_toolbox
Protecting Voice-Controlled Devices against LASER Injection Attacks
Accelerating over 130,000 Hugging Face models with ONNX Runtime
The N Implementation Details of RLHF with PPO
8 Hungarian Novels You Should Read Before You Die
Deploy Embedding Models with Hugging Face Inference Endpoints
Personal Copilot: Train Your Own Coding Assistant
Language Model Beats Diffusion – Tokenizer is Key to Visual Generation
Open X-Embodiment: Robotic Learning Datasets and RT-X Models, code
marella/ctransformers — Python bindings for the Transformer models implemented in C/C++ using GGML library.
chronhib-MU/Chronhib-Website — This is the ChronHib website repository.
Plachtaa/VALL-E-X — An open source implementation of Microsoft’s VALL-E X zero-shot TTS model.
suno-ai/bark — Text-Prompted Generative Audio Model
The Project Gutenberg Open Audiobook Collection, code
Ressources for End-to-End French Text-to-Speech Blizzard challenge
Speaker-independent Speech Inversion for Estimation of Nasalance, code
Implementing Contextual Biasing in GPU Decoder for Online ASR, idiap/contextual-biasing-on-gpus
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition
BAT: Boundary aware transducer for memory-efficient and low-latency ASR
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
idiap/bob — Bob is a free signal-processing and machine learning toolbox originally developed by the Biometrics group at Idiap Research Institute, in Switzerland.
A Neural TTS System with Parallel Prosody Transfer from Unseen Speakers
Vowel reduction by Greek-speaking children: The effect of stress and word length
NeMo Forced Aligner and its application to word alignment for subtitle generation
A stimulus-organism-response model of willingness to buy from advertising speech using voice quality
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition
Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think
Regarding Topology and Variant Frame Rates for Differentiable WFST-based End-to-End ASR
Cross-lingual Prosody Transfer for Expressive Machine Dubbing
An Analysis of Goodness of Pronunciation for Child Speech
Prefix Search Decoding for RNN Transducers
ddlBoJack/MT4SSL — Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
amogh3892/Audio-classification-using-Bag-of-Frames-approach — Classification of different categories of audio clips, especially non speech sounds using Bag-of-Frames approach.
JournalismAI-2021-Quotes/quote-extraction — Quote extraction for modular journalism (JournalismAI collab 2021)
Nearest Neighbor Machine Translation
Clarifying exceptions and visualizing tensor operations in deep learning code
Translation Artifacts in Cross-lingual Transfer Learning
Improving Target-side Lexical Transfer in Multilingual Neural Machine Translation
Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge
meyda/meyda — Audio feature extraction for JavaScript.
adrianbg/kaldi.js — This is a version of Kaldi tweaked to build to WebAssembly.
LSTMs Compose (and Learn) Bottom-Up
Modern Practical Natural Language Processing
Understanding Transformers, the Programming Way
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling
Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search, code
Does my multimodal model learn cross-modal interactions? It’s harder to tell than you might think!
X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models
antonisa/unimorph_inflect — A python library for easily querying morphological inflection models trained on Unimorph
vivjay30/Cone-of-Silence — Speech Separation by Localization
kermitt2/grobid — A machine learning software for extracting information from scholarly documents
microsoft/nni — An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
microsoft/LabanotationSuite — Microsoft Applied Robotics Research Library: LabanotationSuite - open source software tools to give service robots the ability to perform human-like gestures
TezRomacH/layer-to-layer-pytorch
Cross-lingual Retrieval for Iterative Self-Supervised Training
Augmenting Transformers with KNN-Based Composite Memory for Dialogue
REALM: Retrieval-Augmented Language Model Pre-Training, code
Rethinking Attention with Performers
CS231n: Convolutional Neural Networks for Visual Recognition
Over 200 of the Best Machine Learning, NLP, and Python Tutorials — 2018 Edition
KinWaiCheuk/nnAudio — Audio processing by using pytorch 1D convolution network
Bootstrapping Relation Extractors using Syntactic Search by Examples
The Fairy Tales of the Brothers Grimm
An Introduction to Hungarian Literature in 8 books
mermaid-js/mermaid — Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown
The JavaScript library for bespoke data visualization
opal/opal — Opal is a Ruby to JavaScript source-to-source compiler.
linebender/druid — A data-first Rust-native UI design toolkit.
alphacep/vosk-api — Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
ossrs/srs — SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.
Dobiasd/frugally-deep — Header-only library for using Keras (TensorFlow) models in C++.
HazyResearch/bootleg — Self-Supervision for Named Entity Disambiguation at the Tail
google-research-datasets/RxR — Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi and Telugu, and 126k navigation following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the visual per…
automerge/automerge — A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
alexa/visitron — VISITRON: A multi-modal Transformer-based model for Cooperative Vision-and-Dialog Navigation (CVDN)
Deep Transformers with Latent Depth
Recreating Historical Streetscapes Using Deep Learning and Crowdsourcing
alexa/ramen — A software for transferring pre-trained English models to foreign languages
alexa/Topical-Chat — A dataset containing human-human knowledge-grounded open-domain conversations.
Causal Reasoning in Probability Trees
cdk8s-team/cdk8s — Define Kubernetes native apps and abstractions using object-oriented programming
ali-vilab/videocomposer — Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability
microsoft/MS-SNSD — The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
MarvinLvn/BabySLM — Behavioral probing of language acquisition models at the lexical and syntactic level
A Complete Logistic Regression Algorithm From Scratch in Python: Step by Step
microsoft/hummingbird — Hummingbird compiles trained ML models into tensor computation for faster inference.
Noisy speech database for training speech enhancement algorithms and TTS models
From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition
Converting Jupyter Notebooks into blog posts with Gatsby
Interactive spreadsheets in Jupyter
Supervised Pretraining Can Learn In-Context Reinforcement Learning
robodhruv/visualnav-transformer — Official code and checkpoint release for “ViNT: A Foundation Model for Visual Navigation”.
mfaruqui/retrofitting — Retrofitting Word Vectors to Semantic Lexicons
jart/sectorlisp — Bootstrapping LISP in a Boot Sector
blink1073/oct2py — Run M Files from Python - GNU Octave to Python bridge
scoder/lupa — Lua in Python
deepinsight/insightface — State-of-the-art 2D and 3D Face Analysis Project
The importance of fillers for text representations of speech transcripts
Learning Robust and Multilingual Speech Representations
End-to-End Speech Recognition and Disfluency Removal
The role of context in neural pitch accent detection in English
Reconstructing the brain of fruit flies
Sharing Project Amber with the mental health community
mfaruqui/morph-trans — Code for morphological transformations
higgood/incremental-word2vec — Modify word2vec such that it’s possible to “condition” on existing embeddings for some words, and induce embeddings for new words.
The Power of Scale for Parameter-Efficient Prompt Tuning
shashikg/WhisperS2T — An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
‘Less Than One’-Shot Learning: Learning N Classes From M<N Samples
Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization in TV
Speaker Embedding Extraction with Phonetic Information
On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks
TensorSpeech/TensorflowTTS — 😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
Language Model is All You Need: Natural Language Understanding as Question Answering
Building RNNs is Fun with PyTorch and Google Colab
20 free Irish language audiobooks for children
Generalized End-to-End Loss for Speaker Verification
Layout-Parser/layout-parser — A Unified Toolkit for Deep Learning Based Document Image Analysis
OCR for Endangered Language Texts, code
Google Cardboard open sourced as active development on Google VR SDK stops
KomputeProject/kompute — General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usecases. Backed by the Linux Foundation.
m3hrdadfi/wiki-summary — A Bert2Bert model which is able to summarize articles!
CAMeL-Lab/camel_tools — A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
Traditional Versus ASR-Based Pronunciation Instruction, An Empirical Study
ruffle-rs/ruffle — A Flash Player emulator written in Rust
A Speech-To-Text Practitioner’s Criticisms of Industry and Academia
ANNOYingly Simple Sentence Clustering
How JavaScript Libraries Are Training Neural Networks on Web Browsers
andrenatal/phonetisaurus-emscripten
U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection, code
Finding Syntax with Structural Probes
adefossez/julius — Fast PyTorch based DSP for audio and 1D signals
M2KD: Multi-model and Multi-level Knowledge Distillation for Incremental Learning
evcxr/evcxr — An evaluation context for Rust.
Pronunciation Variation Modeling for Dutch Automatic Speech Recognition
Score-Based Generative Modeling through Stochastic Differential Equations
MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators
Hero-Tales of Ireland by Jeremiah Curtin
MycroftAI/lingua-franca — Mycroft’s multilingual text parsing and formatting library
Acoustic event recognition using cochleagram image and convolutional neural networks
CBMM/cochleagram — Cochlear sound spectrum
Speaker-independent vowel recognition: spectrograms versus cochleagrams
A joint training framework for robust automatic speech recognition
Auditory features based on Gammatone filters for robust speech recognition
NN-512 — NN-512 is a compiler that generates C99 code for neural net inference
vakila/de-stress — Prototype German Computer-Assisted Pronunciation Training tool for lexical stress errors
guanpengchn/awesome-pronunciation
Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence
10 Ways to Optimize Text for Machine Translation
Learning from Language Explanations
Computing Receptive Fields of Convolutional Neural Networks
xinjli/allosaurus — Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
Feature Learning in Infinite-Width Neural Networks
Uncertainty Estimation in Autoregressive Structured Prediction
RNNs can generate bounded hierarchical languages with optimal memory
persephone-tools/persephone — A tool for automatic phoneme transcription
dmort27/allovera — A phoneme-allophone database for many languages
Deploying Part-of-Speech Patterns to Enhance Statistical Phrase-Based Machine Translation Resources
A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition
The Scientist and Engineer’s Guide to Digital Signal Processing
When regular is not easy: Cracking the code of Irish orthography
giakou4/pyfeats — Open source software for image feature extraction.
Affordances from Human Videos as a Versatile Representation for Robotics
ReadAlongs/Studio — Audiobook alignment for Indigenous languages
ReadAlongs/Web-Component — Suite of web packages for creating interactive ReadAlongs
roedoejet/convertextract — Extract and find/replace text based on arbitrary correspondences while preserving original file formatting. This library is a fork from the Textract library by Dean Malmgren.
markovka17/dla — Deep learning for audio processing
facebookresearch/CPC_audio — An implementation of the Contrastive Predictive Coding (CPC) method to train audio features in an unsupervised fashion.
ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading
iamjanvijay/rnnt_decoder_cuda — An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
awni/transducer — A Fast Sequence Transducer Implementation with PyTorch Bindings
iceychris/LibreASR — An On-Premises, Streaming Speech Recognition System
How to convert a pre-trained model for Kaldi to Vosk
MediaPipe Holistic — Simultaneous Face, Hand and Pose Prediction, on Device
English Dialects From the Eighth Century to the Present Day by Walter W. Skeat
Ireland, Historic and Picturesque by Charles Johnston
What’s the Matter with Ireland? by Ruth Russell
A Visit From Saint Nicholas by Clement Clarke Moore
The Most Ancient Lives of Saint Patrick by James O’Leary
Anglo-Saxon Literature by John Earle
The Reminiscences of an Irish Land Agent by Samuel Murray Hussey
Building Custom Deep Learning Based Optical Character Recognition (OCR) models
emedvedev/attention-ocr — A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.
Retentive Network: A Successor to Transformer for Large Language Models
courao/ocr.pytorch — A pure pytorch implemented ocr project including text detection and recognition
Xilinx/pytorch-ocr — Quantized LSTMs for OCR
DTolm/VkFFT — Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
DTolm/VkResample — Vulkan real-time FFT upscaling
Historical Copyright Records and Transparency
Speech-Lab-IITM/English_ASR_Challenge — English ASR Challenge organized by Speech Lab, IIT Madras
Unsupervised Cross-lingual Representation Learning for Speech Recognition
Example for Clustered Transformers
Speech Recognition with Python
TAPAS base model fine-tuned on WikiTable Questions
What is Similarity Between Sentences?
catalyst-team/dl-course — Deep Learning with Catalyst
facebookresearch/ClassyVision — An end-to-end PyTorch framework for image and video classification
Mel Frequency Cepstral Coefficient (MFCC) tutorial
Description d’un parler irlandais de Kerry/Texte
Audio samples of Ulster-Scots speakers
Ulster-Scots Education Resources
The word for this animal → 🐶 in Irish
Character Recognition and Segmentation For Custom Data Using Detectron2
da03/Attention-OCR — Visual Attention based OCR
Recent Advances in Google Translate
Narrative framing of consumer sentiment in online restaurant reviews
Training optical character recognition technology Tesseract on a new character font on MacOS
Fine-tuning Tesseract OCR for German Invoices
Training Tesseract on your custom dataset using Qt Box Editor
Add four additional special unicode characters to tesseract
zdenop/qt-box-editor — QT4 editor of tesseract-ocr box files
IfcOpenShell — The open source IFC toolkit and geometry engine
IFCjs/web-ifc-viewer — Graphics engine and toolkit for client applications.
Self-training and pre-training, understanding the wav2vec series
clovaai/deep-text-recognition-benchmark — Text recognition (optical character recognition) with deep learning methods.
apple/ml-equivariant-neural-rendering — This repo contains code to reproduce all experiments in Equivariant Neural Rendering by E. Dupont, M. A. Bautista, A. Colburn, A. Sankar, C. Guestrin, J. Susskind, Q. Shan, ICML 2020.
PrefectHQ/prefect — Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
kaituoxu/Conv-TasNet — A PyTorch implementation of Conv-TasNet described in “TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation” with Permutation Invariant Training (PIT).
Profile of the month: Jim O’Regan - Språkbanken
The historical short vowel phonology of Gaelic
The Structure of the Consonant System of the Gaelic of Torr, Co. Donegal
Collins gem Irish dictionary : English-Irish, Irish-English
Visual Speech Enhancement Without A Real Visual Stream, code
joonson/syncnet_python — Out of time: automated lip sync in the wild
High-Fidelity Audio Generation and Representation Learning With Guided Adversarial Autoencoder
The Grammar of English Grammars
karthiTox/deepnet.js — Auto-differentiation library for javascript
Familiar feud in Poland after game show calls regional language a dialect
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Google’s REALM — A Knowledge-base Augmented Language Model
apple/ml-mkqa — We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. Please refer to our paper f…
From Historical Sources to Datasets: A Preview of DataScribe, code
Reading and Writing RDF in Apache Jena
kba/jsonld-rapper — Create RDF from JSON-LD with rapper
Which Irish words do people who are fluent in the language have the most difficulty spelling?
— Eoin P. Ó Murchú 🇵🇸 (@murchadhmor) January 6, 2021
Study: Folklore structure reveals how conspiracy theories emerge, fall apart
Word-level text generation with Keras in <50 lines of code
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Continuous Active Learning Using Pretrained Transformers
Listening to native speakers
stanfordnlp/string2string — String-to-String Algorithms for Natural Language Processing
REVIEW OF 1984 By Isaac Asimov
MLCommons People’s Speech Dataset
COBE: Contextualized Object Embeddings from Narrated Instructional Video
Russian Text Normalization for STT and TTS
k-Nearest Neighbor Language Models
thu-spmi/CAT — A CRF-based ASR Toolkit
Linformer: Self-Attention with Linear Complexity
Joint Speech Recognition and Speaker Diarization via Sequence Transduction
awslabs/sockeye — Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch
How to publish a txt corpora with NIF as Linked Data
openai/CLIP — CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
Recognizing Pose Similarity in Images and Videos
opensheetmusicdisplay/opensheetmusicdisplay — OpenSheetMusicDisplay renders sheet music in MusicXML format in your web browser based on VexFlow. OSMD is brought to you by PhonicScore.com.
Spectrograms and speech processing
Why You Should Do NLP Beyond English
DingXiaoH/RepVGG — RepVGG: Making VGG-style ConvNets Great Again
kwrobel-nlp/kftt — Polish morphosyntactic tagger.
Foghraidheacht Ghaedhilge an Tuaiscirt
izuzak/noam — JavaScript library for working with automata and grammars for regular and context-free languages
google/refr — A framework for building reranking models.
usc-sail/barista — Barista is an open-source framework for concurrent speech processing.
Pronouns and Definite vs Indefinite Conjugation
labmlai/annotated_deep_learning_paper_implementations
Evaluate k-nearest neighbor language model
Denoising Diffusion Probabilistic Models (DDPM)
CS224N: Natural Language Processing with Deep Learning
OpenNLPLab/cosFormer — [ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention
web-arena-x/webarena — Code repo for “WebArena: A Realistic Web Environment for Building Autonomous Agents”
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition, code
Guidance: a cheat code for diffusion models
How to adapt a multilingual T5 model for a single language
salesforce/LAVIS — LAVIS - A One-stop Library for Language-Vision Intelligence
kscanne/gbb — NLP training/testing data
ML Olympiad - Multilingual Spell Correction
Fine-tuning the multilingual T5 model from Huggingface with Keras
Zjh-819/LLMDataHub — A quick guide (especially) for trending instruction finetuning datasets
Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer, code
Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model
Hannibal046/Awesome-LLM — Awesome-LLM: a curated list of Large Language Model
cvg/LightGlue — LightGlue: Local Feature Matching at Light Speed (ICCV 2023)
kenjihiranabe/The-Art-of-Linear-Algebra — Graphic notes on Gilbert Strang’s “Linear Algebra for Everyone”
Forced Alignment with Wav2Vec2
Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition
google-research/tensor2robot — Distributed machine learning infrastructure for large-scale robotics research
google-research/pix2seq — Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
google-research/language-table — Suite of human-collected datasets and a multi-task continuous control benchmark for open vocabulary visuolinguomotor learning.
Unit 3. Transformer architectures for audio
HomeRobot: Open Vocabulary Mobile Manipulation, code
SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
google-deepmind/dm_robotics — Libraries, tools and tasks created and used at DeepMind Robotics.
facebookresearch/LaViLa — Code release for “Learning Video Representations from Large Language Models”
facebookresearch/paco — This repo contains documentation and code needed to use PACO dataset: data loaders and training and evaluation scripts for objects, parts, and attributes prediction models, query evaluation scripts, and visualization notebooks.
lyuchenyang/Macaw-LLM — Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
YuanGongND/cav-mae — Code and Pretrained Models for ICLR 2023 Paper “Contrastive Audio-Visual Masked Autoencoder”.
Tracking Everything Everywhere All at Once, code
facebookresearch/audiocraft — Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
What city every country is most ashamed of in Europe
mkuchnik/relm — ReLM is a Regular Expression engine for Language Models
A Fast Algorithm for Computing Prefix Probabilities
Implementation of the Branchformer
HyperMixer: An MLP-based Low Cost Alternative to Transformers
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Brainformers: Trading Simplicity for Efficiency
unilight/seq2seq-vc — A sequence-to-sequence voice conversion toolkit.
shivangi-aneja/COSMOS — [AAAI 2023] COSMOS: Catching Out-of-Context Misinformation using Self Supervised Learning
DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement
Announcing AI2 OLMo, an Open Language Model Made by Scientists, for Scientists
Byte Pair Encoding is Suboptimal for Language Model Pretraining
The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue
MuJoCo, code — Multi-Joint dynamics with Contact. A general purpose physics simulator.
google-research/robopianist — [CoRL ‘23] Dexterous piano playing with deep reinforcement learning.
Perlence/PyGuitarPro — Read, write and manipulate GP3, GP4 and GP5 files.
Towards Healthy AI: Large Language Models Need Therapists Too
psst-challenge/psstbaseline — Baseline models for the Post-Stroke Speech Transcription (PSST) challenge
viktor-enzell/wav2vec2-large-voxrex-swedish-4gram
Flamingo: a Visual Language Model for Few-Shot Learning
UL2 20B: An Open Source Unified Language Learner, code
Kaldi ASR: Extending the ASpIRE model
Multimodal Chain-of-Thought Reasoning in Language Models, code
lllyasviel/ControlNet — Let us control diffusion models!
FelixOpolka/Single-Player-MCTS — Python implementation of single-player Monte-Carlo Tree Search.
google-deepmind/mctx — Monte Carlo tree search in JAX
Speech Synthesis, Recognition, and More With SpeechT5
Teaching OPT to Paraphrase through Soft Prompt Tuning
Use transfer learning for ASR in ESPnet2
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
v-iashin/SpecVQGAN — Source code for “Taming Visually Guided Sound Generation” (Oral at the BMVC 2021)
Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks
OverFlow: Putting flows on top of neural transducers for better TTS
dair-ai/Mathematics-for-ML — A collection of resources to learn mathematics for machine learning
dair-ai/ML-Notebooks — Machine Learning Notebooks
SentenceBERT — Semantically meaningful sentence embeddings the right way
krrish94/nerf-pytorch — A PyTorch re-implementation of Neural Radiance Fields
nv-tlabs/nglod — Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes (CVPR 2021 Oral)
NVIDIAGameWorks/PhysX — NVIDIA PhysX SDK
NVIDIA-Omniverse/IsaacGymEnvs — Isaac Gym Reinforcement Learning Environments
NVIDIAGameWorks/kaolin — A PyTorch Library for Accelerating 3D Deep Learning Research
Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer, code
Denys88/rl_games — RL implementations
sail-sg/envpool — C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
Evidence of a predictive coding hierarchy in the human brain listening to speech
A Distributed Systems Reading List
Lexicon Model for Ontologies: Community Report, 10 May 2016
Wapiti - A simple and fast discriminative sequence labelling toolkit, code
Towards Augmenting Lexical Resources for Slang and African American English
Notebook to run Ruby on Google Colaboratory
Finding the Words to Say: Hidden State Visualizations for Language Models
MixConv: Mixed Depthwise Convolutional Kernels
Extracting Features from an Intermediate Layer of a Pretrained ResNet Model in PyTorch
PatchBERT: Just-in-Time, Out-of-Vocabulary Patching
Fine-tuning Mozilla DeepSpeech for the Indian Accent
Indian Accent Speech Recognition
trainc — TrainC builds compact context dependency transducers for WFST-based speech recognition from acoustic training data.
A curated list of speech and natural language processing resources
benob/openlat — Toolkit for manipulating word lattices built on top of openfst
amir-zeldes/gum — Repository for the Georgetown University Multilayer Corpus (GUM)
nassosoassos/sail_align — SailAlign is an open-source software toolkit for robust long speech-text alignment implementing an adaptive, iterative speech recognition and text alignment scheme that allows for the processing of very long (and possibly noisy) audio and is robust to transcription errors.
Darby O’Gill and the Good People
The Sleeping beauty of the wood
LEAF: A Learnable Frontend for Audio Classification
Spectrogram & Oscillator, code
lucidrains/axial-positional-embedding — Axial Positional Embedding for Pytorch
CPJKU/madmom — Python audio and music signal processing library
matthew-brett/transforms3d — 3 dimensional spatial transformations
Moof-A-Day: Early Macintosh Software
arogozhnikov/einops — Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
Introducing Lamini, the LLM Platform for Rapidly Customizing Models
The Illustrated Stable Diffusion
Naoi ngábhadh an Ghiolla Dhuibh.
lucidrains/PaLM-rlhf-pytorch — Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
How Virtual Reality Can Help Those With Autism
Giskard is coming to your notebook: Python meets Java via gRPC tunnel
Illustrating Reinforcement Learning from Human Feedback
Point-E: A System for Generating 3D Point Clouds from Complex Prompts
microsoft/BlingFire — A lightning fast Finite State machine and REgular expression manipulation library.
rom1504/cc2dataset — Easily convert common crawl to a dataset of caption and document. Image/text Audio/text Video/text,
facebookresearch/barlowtwins — PyTorch implementation of Barlow Twins.
facebookresearch/vicreg — VICReg official code base
sileod/tasknet — Easy multi-task learning with HuggingFace Datasets and Trainer
Vol. 16, 1952, Contributions in Memory of Osborn Bergin
orktes/go-torch — LibTorch (PyTorch) bindings for Golang
The Gaelic dialect of Urris, Inishowen, Co. Donegal
karpathy/deep-vector-quantization — VQVAEs, GumbelSoftmaxes and friends
Animating Stereograms with Optical Flow Morphing
Transformers in Pytorch from scratch for NLP Beginners
PyTorch for TensorFlow Users - A Minimal Diff
Brain, Time, CTC blank states and streaming
Testing Facebook MMS and SeamlessM4T Word Error Rate
N-gram language model toolkits in 2020
jermp/tongrams — A C++ library providing fast language model queries in compressed space.
On latency of speech recognition
Wav2vec 2.0: Learning the structure of speech from raw audio
Generate distance matrix from features
Calamari-OCR/calamari — Line based ATR Engine based on OCRopy
kraken, mittagessen/kraken — OCR engine for all the languages
not-implemented/hocr-proofreader — Web based JavaScript GUI library for proofreading/editing hOCR
GeReV/hocr-editor-ts — A visual hOCR file editor
Introduction to Simple Neural Networks
Python Concurrency: The Tricky Bits
hocr-tools, CUSAT/hocr-tools — Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
mbartoli/tAlign — Text alignment for OCR using FFTs
DDMAL/text_alignment — Aligns correct transcripts to text images using a “messy” OCR and Needleman-Wunsch sequence alignment
Early-Modern-OCR/RETAS — Part of eMOP: the Recursive Text Alignment Tool compares OCR text results to groundtruth by character and computes a score.
cisocrgroup/ocrd_cis — OCR-D python tools
ofirpress/shortformer — Code for the Shortformer model, from the ACL 2021 paper by Ofir Press, Noah A. Smith and Mike Lewis.
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them
Gaeilge Laighean by Colm Ó Broin
TomHarte/dsk2woz — A command-line tool to convert Apple II DSK images to WOZ format.
bzotto/picturedsk — Imprint an “image” in the magnetic flux of an Apple 5.25” floppy disk
Learn About Transformers: A Recipe
Finding blocks of text in an image using Python, OpenCV and numpy
ocropus-archive/DUP-ocropy — Python-based tools for document analysis and OCR
NVIDIA/speechsquad — Conversational AI Benchmark.
Urlabhraidheacht agus graimear na gaedhilge, cuid I.
digiah/oldOCR — Optical Character Recognition of old and noisy print sources
Neural Inverse Text Normalization
JFLAP — JFLAP is software for experimenting with formal languages topics including nondeterministic finite automata, nondeterministic pushdown automata, multi-tape Turing machines, several types of grammars, parsing, and L-systems.
Chris Lattner: Revolutionizing the C++ World
Irish folklore archive inscribed into UNESCO register
GraphiteEditor/Graphite — 2D raster & vector editor that melds traditional layers & tools with a modern node-based, fully non-destructive procedural workflow.
apple/turicreate — Turi Create simplifies the development of custom machine learning models.
Comparing signals in the time domain
google-research/sofima — Scalable Optical Flow-based Image Montaging and Alignment
google-research/lingvo-lab — Demos, samples, and experimental code for Lingvo.
google-research/last — A JAX library for building lattice-based speech transducer models
ZipIt! Merging Models from Different Tasks without Training
hyunwoongko/kochat — Opensource Korean chatbot framework
Lyra: A New Very Low-Bitrate Codec for Speech Compression
neulab/nn4nlp-concepts — A repository of concepts related to neural networks for NLP
neubig/nn4nlp-code — Code Samples from Neural Networks for NLP
seungwonpark/melgan — MelGAN vocoder (compatible with NVIDIA/tacotron2)
‘Déanaim iarracht ‘rothar’ a rá in áit ‘badhsacal’ – tuairim chonspóideach do Chois Fhairrgeach…’
Interface Between Phonology and Phonetics
Unsupervised Question Answering
Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples
allenai/allennlp — An open-source NLP research library, built on PyTorch.
allenai/allennlp-semparse — A framework for building semantic parsers (including neural module networks) with AllenNLP, built by the authors of AllenNLP
DaCy: New Fast and Efficient State-of-the-Art in Danish NLP!
Nuclear accents in four Irish (Gaelic) dialects
Development of an automatic attitude recognition system: a multimodal analysis of video blogs
The phonetics and phonology of the intonation of Irish dialects
google-research/nisaba — Finite-state script normalization and processing utilities
Using alignments from Montreal Forced Aligner to train
alberto-poncelas/tesseract_postprocess
Tonal alignment in three varieties of Hiberno-English
Modelling intonation in three Irish dialects
Peak timing in two dialects of Connaught Irish
A Linguistically Motivated Computational Framework for Irish Sign Language
Maidir le Croidhe Cainnte Chiarraighe
Helsinki-NLP/Tatoeba-Challenge
zhao-shuyang/childrenize — Signal processing method to convert adult speech into child-like
Learnable latent embeddings for joint behavioural and neural analysis
in progress list for Project Gutenberg
Facebook & Google’s LazyTensor Enables Expressive Domain-Specific Compilers
The Dialects of Co. Clare, Part 1
facebookresearch/vissl — VISSL is FAIR’s library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
google-research/simclr — SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
Leveraging the Exact Likelihood of Deep Latent Variable Models
Transformers Explained Visually part 2
salesforce/WikiSQL — A large annotated semantic parsing corpus for developing natural language interfaces.
bentrevett/pytorch-seq2seq — Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Essential sources for Irish dialect study II: Doegen
The Irish Language in Rathlin Island
salesforce/apollo — An experimental multi-tenant distributed system platform
salesforce/TransmogrifAI — TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
salesforce/decaNLP — The Natural Language Decathlon: A Multitask Challenge for NLP
salesforce/TabularSemanticParsing — Translating natural language questions to a structured query language
salesforce/ai-economist — Foundation is a flexible, modular, and composable framework to model socio-economic behaviors and dynamics with both agents and governments. This framework can be used in conjunction with reinforcement learning to learn optimal economic policies, as done by the AI Economist (https://www.einstein.ai/the-ai-economist).
Books in Seanchló / Cló Gaelach
aistear — A resource site for translators, editors, and everyone who writes in Irish.
Ba mhaith liom ‘a thuiscint’ cén fáth a bhfuil an ghramadach chomh deacair sin
D2Go brings Detectron2 to mobile, facebookresearch/d2go — D2Go is a toolkit for efficient deep learning
Ropucha: fadedpage, Wikiźródła
Apes, psychos, alcos: How British cartoonists depict the Irish
Keating’s general history of Ireland
Irish Language, 1700-1999 — Selection of books and manuscripts written in Irish.
Dánta aṁráin, is caointe Ṡeaṫrúin Céitinn
Cuchulain of Muirthemne sacred texts
Parliamentary Papers, Proceedings and Departmental Papers : UK: Ireland
syegulalp/Akilang — A compiler for a simple language, built with Python and LLVM
lark-parser/lark — Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
numba/llvmlite — A lightweight LLVM python binding for writing JIT compilers
libcpu/libcpu — “libcpu” is an open source library that emulates several CPU architectures
apple/ml-qrecc — Open-Domain Question Answering Goes Conversational via Question Rewriting
bbc/bbcrd-brirs — An impulse response dataset for dynamic data-based auralisation of advanced sound systems
sofacoustics/SOFAtoolbox — SOFA Toolbox (API for Matlab, Octave)
aligner 0.1.6 — Automatically corrects subtitle timings given a second correct subtitle, github
Cochleagram Representation of Sound
SpeechColab/GigaSpeech — Large, modern dataset for speech recognition
iver56/audiomentations — A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
The Learning Rate Finder Technique: How Reliable Is It?
Unsupervised pretraining transfers well across languages
A full statement of the trial and acquittal of Aaron Burr, esq
The Irish landed gentry when Cromwell came to Ireland
Jócleabhar beag bídeach na Gaeilge
Countering the claims about Australia’s Aboriginal number systems
Gryf : pismo dla spraw kaszubskich
AIdeaLab/wav2vec2_docker — pretraining wav2vec docker for sagemaker.
cpierse/wav2vec2-large-xlsr-53-irish
ashubham/CPT — Compact prediction trees for fast sequence prediction using Machine Learning
Residual Energy-Based Models for End-to-End Speech Recognition
julien-c/DPRNNTasNet-ks16_WHAM_sepclean
Fine-tuning a model on a translation task
Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models
OCR with Keras, TensorFlow, and Deep Learning
snapthat/TF-T5-text-to-text — This repository demonstrates training T5 transformers using TensorFlow 2
PiotrDabkowski/Js2Py — JavaScript to Python Translator & JavaScript interpreter written in 100% pure Python
Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition, code
Distilling Zero Shot Classification.ipynb
amzn/xfer — Transfer Learning library for Deep Neural Networks.
asteroid-team/asteroid — The PyTorch-based audio source separation toolkit for researchers
astanin/python-tabulate — Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated from bitbucket.org/astanin/python-tabulate.
Trankit, nlp-uoregon/trankit — Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
coqui-ai/STT — STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
NVIDIA/mellotron — Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
MycroftAI/mimic2 — Text to Speech engine based on the Tacotron architecture, initially implemented by Keith Ito.
MycroftAI/lingua-franca — Mycroft’s multilingual text parsing and formatting library
MycroftAI/skill-date-time — Mycroft AI official Date and Time Skill, providing the current time, date and day of week for cities around the world.
Jaco-Assistant/Scribosermo — Train fast Speech-to-Text networks in different languages
grammarly/ua-gec — UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
grammarly/gector — Official implementation of the papers “GECToR – Grammatical Error Correction: Tag, Not Rewrite” (BEA-20) and “Text Simplification by Tagging” (BEA-21)
Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers
LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring
Digital-Umuganda/Deepspeech-Kinyarwanda — The kinyarwanda model for deepspeech
End-to-End Speaker-Attributed ASR with Transformer
Differentiable Weighted Finite-State Transducers
facebookresearch/mmf — A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
kaegi/alass — “Automatic Language-Agnostic Subtitle Synchronization”
pums974/srtsync — Automatic synchronizer of subtitles based on voice activity in the video
oseiskar/autosubsync — Automatically synchronize subtitles with audio using machine learning
tympanix/subsync — Synchronize your subtitles using machine learning
CCExtractor/Subtitle-Resync — A tool to automatically generate in-sync subtitles of different versions of the same base media (such as with edits)
sc0ty/subsync — Subtitle Speech Synchronizer
Getting Started With Embeddings
This past week I spent some time learning about SentenceTransformers (https://t.co/5ZAV7lJq7u), and I'm pretty blown away by what sentence embeddings can be used for.
If you're curious to see what researchers have been getting up to with it, here's a 🧵 with some highlights:
— Nima Boscarino (@NimaBoscarino) June 10, 2022
Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN
A Modern Self-Referential Weight Matrix That Learns to Modify Itself
High-Quality, Robust and Responsible Direct Speech-to-Speech Translation
Introducing CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
milvus-io/milvus — A cloud-native vector database, storage for next generation AI applications
Siri can’t speak Irish: Tackling the digital gaps for the Irish language
lucidrains/reformer-pytorch — Reformer, the efficient Transformer, in Pytorch
BBC reporter Phil McCann triggers ‘fill my can’ memes as he covers fuel shortage in UK
NodLabs/mlir-examples — a simple end-to-end example of taking an ML graph (TF2 / PyTorch) and running it on a device [cpu, gpu]
Machine Learning Simplified: A gentle introduction to supervised learning
De vandrande djäknarne, De vandrande djäknarne / 3
Beyond Graph Neural Networks with PyNeuraLogic
TorchStudio/torchstudio — IDE for PyTorch and its ecosystem
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework, code
clam004/intro_continual_learning — This is a tutorial to connect the fundamental mathematics to a practical implementation addressing the continual learning problem of artificial intelligence
Accent-VITS:accent transfer for end-to-end TTS
Structured Log Linear Models for Noise Robust Speech Recognition
Let the Script Find Out the ML Model that Outperforms Yours
Neural music instrument cloning from very few samples
microsoft/Swin-Transformer — This is an official implementation for “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”.
facebookresearch/xcit — Official code for the Cross-Covariance Image Transformer (XCiT)
Patches Are All You Need?, locuslab/convmixer — Implementation of ConvMixer for “Patches Are All You Need?”
Direct multimodal few-shot learning of speech and images
yoav-lavi/melody — Melody is a language that compiles to regular expressions and aims to be more readable and maintainable
Boosting Wav2Vec2 with n-grams in 🤗 Transformers
Weakly Supervised Construction of ASR Systems with Massive Video Data
Data Augmentation library for text
JAX Vs TensorFlow Vs PyTorch: A Comparative Analysis
FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR
google-deepmind/dm-haiku — JAX-based neural network library
SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training
Joint Speech Recognition and Audio Captioning, code
Turning a Google Colab Notebook into a Web App
Common Mistakes in Hyper-Parameters Tuning
A Large-Scale Study on Regularization and Normalization in GANs
Psychophysical and behavioral peripheral and central auditory tests
Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
One model for the learning of language
unfs3/unfs3 — UNFS3 is a user-space implementation of the NFSv3 server specification.
Sinkformers: Transformers with Doubly Stochastic Attention
descriptinc/lyrebird-wav2clip — Official implementation of the paper WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP
Understanding Q,K,V In Transformer
Billion-scale vector search with Vespa - part one
Scaling Vision with Sparse Mixture of Experts, google-research/vmoe
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis
Neural edit-tree lemmatization for spaCy
KristiyanVachev/Leaf-Question-Generation — Easy to use and understand multiple-choice question generation algorithm using T5 Transformers.
On visualizing phonetic data from repeated measures experiments with multiple random effects
Sohcahtoa: Sine, Cosine, Tangent
Strange and forgotten consoles
Lookup-Table Recurrent Language Models for Long Tail Speech Recognition
Explicit Alignment Objectives for Multilingual Bidirectional Encoders
UniversalDependencies/UD_Irish-IDT
bbc/peaks.js — JavaScript UI component for interacting with audio waveforms
bbc/waveform-data.js — Audio Waveform Data Manipulation API – resample, offset and segment waveform data in JavaScript.
frictionlessdata/frictionless-py — Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
smacke/ffsubsync — Automagically synchronize subtitles with video.
bbc/morty-docs — Generate a static website from markdown files
BorisChumichev/everpolate — Numerical interpolation and extrapolation lib
19 entities for 104 languages: A new era of NER with the DeepPavlov multilingual BERT
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
“Chain-linking” NLP tasks With Wav2Vec2 & Transformers
guanlongzhao/kaldi-gop — Computes the Goodness of Pronunciation (GOP). Based on Kaldi.
An Asynchronous WFST-Based Decoder For Automatic Speech Recognition
Speech and Language Processing
AdapterHub: A Framework for Adapting Transformers
Neighbours across the sea: A brief history of Anglo-Irish relations
NN-SVG — Publication-ready NN-architecture schematics
HarisIqbal88/PlotNeuralNet — Latex code for making neural networks diagrams
lutzroeder/netron — Visualizer for neural network, deep learning and machine learning models
pettarin/forced-alignment-tools — A collection of links and notes on forced alignment tools
chrisbaume/overtyper — Experiment in automatic insertion of timed transcript corrections using fuzzy phonetic matching
bbc/dialogger — Text-based media editing interface
chrisbaume/webaligner — A client-side forced aligner for speech
bbc/stt-align-node — node version of stt-align https://github.com/bbc/stt-align by Chris Baume - R&D.
bbc/react-transcript-editor — A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. - Work in progress
bbc/vc2hqencode — Optimised VC-2 HQ Profile Encoder Library
bbc/vc2_conformance — Software tools for checking the conformance of SMPTE ST 2042-1 (VC-2) professional video codec implementations.
bbc/vc2-reference — A reference encoder and decoder for SMPTE ST 2042-1 “VC-2 Video Compression”
bbc/dash.js — A reference client implementation for the playback of MPEG DASH via Javascript and compliant browsers.
bbc/storyplayer — BBC Research & Development’s Object Based Media Player
bbc/digital-paper-edit-api — Work in progress - BBC News Labs digital paper edit project - Express server API
mozilla/DSAlign — DeepSpeech based forced alignment tool
Trials and Tribulations: Using Keras on Colab and TPU
Hugging Face on PyTorch / XLA TPUs: Faster and cheaper training
Open Science in phonetics and phonology
Part 2 - Extracting Audio Features
CNNDigitReco-speakerindependent
Spanish Automatic Speech Recognition pytorch
Grid Search to find best tuning parameters
asahi417/tner — Language model fine-tuning on NER with an easy interface and cross-domain evaluation. “T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition, EACL 2021”
Classification on FSDD using Spectrograms
JaidedAI/EasyOCR — Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
A Speech-To-Text Practitioner’s Criticisms of Industry and Academia
eric-mitchell/direct-preference-optimization — Reference implementation for DPO (Direct Preference Optimization)
tiangolo/fastapi — FastAPI framework, high performance, easy to learn, fast to code, ready for production
explosion/wasabi — A lightweight console printing and formatting toolkit
lucidrains/DALLE-pytorch — Implementation / replication of DALL-E, OpenAI’s Text to Image Transformer, in Pytorch
facebookresearch/pytorchvideo — A deep learning library for video understanding research.
microg/GmsCore — Free implementation of Play Services
mortennobel/cpp-cheatsheet — Modern C++ Cheatsheet
micknoise/Maximilian — C++ Audio and Music DSP Library
taywee/args — A simple header-only C++ argument parser library. Supposed to be flexible and powerful, and attempts to be compatible with the functionality of the Python standard argparse library (though not necessarily the API).
antirez/linenoise — A small self-contained alternative to readline and libedit
p-ranav/tabulate — Table Maker for Modern C++
photonstorm/phaser — Phaser is a fun, free and fast 2D game framework for making HTML5 games for desktop and mobile web browsers, supporting Canvas and WebGL rendering.
Suyash458/WiktionaryParser — A Python Wiktionary Parser
ucb-bar/riscv-sodor — educational microarchitectures for risc-v isa
boriel/zxbasic — The Sinclair ZX Spectrum BASIC compiler!
erikrose/blessings — A thin, practical wrapper around terminal capabilities in Python
tartley/colorama — Simple cross-platform colored terminal text in Python
jzhang38/TinyLlama — The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
PrefectHQ/marvin — Build AI interfaces that spark joy
alibaba/animate-anything — Fine-Grained Open Domain Image Animation with Motion Guidance
Kanaries/pygwalker — PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis
commaai/openpilot — openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for 250+ supported car makes and models.
dvmazur/mixtral-offloading — Run Mixtral-8x7B models in Colab or consumer desktops
Textualize/rich — Rich is a Python library for rich text and beautiful formatting in the terminal.
DLYuanGod/TinyGPT-V — TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
PaddlePaddle/PaddleGAN — PaddlePaddle GAN library, including lots of interesting applications like First-Order motion transfer, Wav2Lip, picture repair, image editing, photo2cartoon, image style transfer, GPEN, and so on.
koaning/drawdata — Draw datasets from within Jupyter.
jindrapetrik/jpexs-decompiler — JPEXS Free Flash Decompiler
topjohnwu/Magisk — The Magic Mask for Android
gzc/CLRS — Solutions to Introduction to Algorithms
libcpr/cpr — C++ Requests: Curl for People, a spiritual port of Python Requests.
fffaraz/awesome-cpp — A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things. Inspired by awesome-… stuff.
martinmoene/ring-span-lite — ring-span lite - A C++yy-like ring_span type for C++98, C++11 and later in a single-file header-only library
autodiff/autodiff — automatic differentiation made easier for C++
linebender/druid — A data-first Rust-native UI design toolkit.
linebender/runebender — A font editor written in Rust.
pemistahl/grex — A command-line tool and Rust library with Python bindings for generating regular expressions from user-provided test cases
emilk/egui — egui: an easy-to-use immediate mode GUI in Rust that runs on both web and native
actix/actix-web — Actix Web is a powerful, pragmatic, and extremely fast web framework for Rust.
chipsalliance/chisel — Chisel: A Modern Hardware Design Language
ucb-bar/dsptools — A Library of Chisel3 Tools for Digital Signal Processing
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models, code
Streamlit vs. Dash vs. Shiny vs. Voila vs. Flask vs. Jupyter
First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT
Mallard BASIC: Introduction and Reference
Joyce Computer Club Public Domain - BASIC
m-wiesner/nnet_pytorch — Kaldi style neural network training in pytorch for use in place of nnet3 in Kaldi.
32-bit Apps in a 64-bit Docker Container
marytts/gradle-marytts-voicebuilding-plugin
pyparsing/pyparsing — Python library for creating PEG parsers
IS2AI/Kazakh_TTS — An expanded version of the previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus. In KazakhTTS2, the overall size has increased from 93 hours to 271 hours, the number of speakers has risen from two to five (three females and two males), and the topic coverage has been diversified.
coady/lupyne — Pythonic search engine based on PyLucene.
Nine Polish books you must read before you die
ml-tooling/opyrator — Turns your machine learning code into microservices with web API, interactive GUI, and more.
tomstitt/lupyter — A Lua Kernel for Jupyter built on ipykernel.
Automated Guitar Transcription with Deep Learning
GuitarsAI/ADSP_Tutorials — Advanced Signal Processing Notebooks and Tutorials
GuitarML/GuitarLSTM — Deep learning models for guitar amp/pedal emulation using LSTM with Keras.
GuitarML/SmartAmpPro — Guitar plugin using neural networks to capture real amps and pedals
voila-dashboards/voila — Voilà turns Jupyter notebooks into standalone web applications
jupyter-xeus/xeus-cling — Jupyter kernel for the C++ programming language
PyO3/pyo3 — Rust bindings for the Python interpreter
Interactive Rust in a REPL and Jupyter Notebook with EVCXR
AK391/spleeter — Deezer source separation library including pretrained models.
CoderLine/alphaTab — alphaTab is a cross platform music notation and guitar tablature rendering library.
dpilger26/NumCpp — C++ implementation of the Python Numpy library
faridrashidi/kaggle-solutions — Collection of Kaggle Solutions and Ideas
Introduction to Sound Event Detection
Podòdzél jistników na deklinacje. I deklinacjô
Open-Speech-EkStep/vakyansh-wav2vec2-experimentation
Machine Learning - Google for Developers
10 Jupyter Notebook Extensions Making My Life Easier
org-arl/jupyter-ieee-paper — Jupyter notebook to generate fully formatted IEEE papers
jupyterlab/jupyterlab-latex — JupyterLab extension for live editing of LaTeX documents
Fission, fission/fission — Fast and Simple Serverless Functions for Kubernetes
jik876/hifi-gan — HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
andabi/deep-voice-conversion — Deep neural networks for voice conversion (voice style transfer) in Tensorflow
chriskiehl/Gooey — Turn (almost) any Python command line program into a full GUI application with one line
googlecreativelab/quickdraw-dataset — Documentation on how to access and use the Quick, Draw! Dataset.
ARBML/klaam — Arabic speech recognition, classification and text-to-speech.
Semi-supervised Learning and Frame Rate
amperser/proselint — A linter for prose.
openstack/swift — OpenStack Storage (Swift). Mirror of code maintained at opendev.org.
Semi-Supervised Training of Deep Neural Networks for Speech Recognition
Zero-Resource Neural Machine Translation with Monolingual Pivot Data
google/python-fire — Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.
ceph/ceph — Ceph is a distributed object, block, and file storage platform
rook/rook — Storage Orchestration for Kubernetes
TigerBot: An Open Multilingual Multitask LLM
RLIF: Interactive Imitation Learning as Reinforcement Learning
Training tiny specialized language models
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
lucidrains/MEGABYTE-pytorch — Implementation of MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
1SPU: 1-step Speech Processing Unit
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture, code
sscardapane/reprodl2021 — Host repository for the “Reproducible Deep Learning” PhD course
Aligning Ground Truth Text with OCR Degraded Text
openaudible/openaudible — Audiobook Manager for Audible Users
Ki6an/fastT5 — boost inference speed of T5 models by 5x & reduce the model size by 3x.
FieldDB/AndroidLanguageLessons
Deep Implicit Attention: A Mean-Field Theory Perspective on Attention Mechanisms
Bootstrap your own latent: A new approach to self-supervised Learning
jmccrae/irish_saffron — Code related to adapting Saffron to Irish
A new open data set for multilingual speech research, OpenSLR
tugstugi/dl-colab-notebooks — Try out deep learning models online on Google Colab
FELIX: Flexible Text Editing Through Tagging and Insertion, code
openmainframeproject/cobol-programming-course — Training materials and labs for a “Getting Started” level course on COBOL
mohaEs/Train-Predict-Landmarks-by-dlib
Involution: Inverting the Inherence of Convolution for Visual Recognition, code, involution_pytorch
open-mmlab/mmocr — OpenMMLab Text Detection, Recognition and Understanding Toolbox
jakevdp/PythonDataScienceHandbook — Python Data Science Handbook: full text in Jupyter Notebooks
An ultrasound study of Connemara Irish palatalization and velarization
An Ultrasound Investigation of Irish Palatalization
wmcnally/evopose2d — EvoPose2D is a two-stage human pose estimation model that was designed using neuroevolution. It achieves state-of-the-art accuracy on COCO.
cdpierse/transformers-interpret — Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
A dictionary of the Manks language
Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences, code
ikekonglp/PAD — The PAD parser produces phrases-after-dependencies. Give it the output of a dependency parser and it will produce the optimal constrained phrase-structure parse.
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
Neural HMMs are all you need (for high-quality attention-free TTS)
Measuring Massive Multitask Language Understanding, code
qdrant/qdrant — Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Learning rule-based morpho-phonology, code
AakashKumarNain/annotated_research_papers
argilla-io/argilla — Argilla: the open-source feedback platform for LLMs
Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring
CCExtractor/ccextractor — CCExtractor is a tool used to produce subtitles for TV recordings from almost anywhere in the world. We intend to keep up with all sources and formats.
JabRef/jabref — Graphical Java application for managing BibTeX and biblatex (.bib) databases
rizinorg/rizin — UNIX-like reverse engineering framework and command-line toolset.
Synfig, code — Synfig Studio is a free and open-source 2D animation software, designed as a powerful industrial-strength solution for creating film-quality animation using vector and bitmap artwork
hamelsmu/Seq2Seq_Tutorial — Code For Medium Article “How To Create Data Products That Are Magical Using Sequence-to-Sequence Models”
hamelsmu/Docker_Tutorial — Code and helper scripts for article on Medium “How Docker Can Help You Become A More Effective Data Scientist”
How to write academic papers in Markdown
Writing academic papers in plain text with Markdown and Jupyter notebook
RasaHQ/paraphraser — Tool to generate paraphrases of sentences in many languages.
Diffusion Models Beat GANs on Image Synthesis
JoFrhwld/FAVE — A repository for maintaining the fave-align and fave-extract toolkits
vowel – Draw vowel charts for phonetic research
Cad a dhéanfaidh mé le mo fhleiscín-se? Comhairle ghramadaí… (What will I do with my hyphen? Grammar advice…)
From Notebook to Kubeflow Pipelines with MiniKF and Kale
pachyderm/pachyderm — Data-Centric Pipelines and Data Versioning
KELM: Integrating Knowledge Graphs with Language Model Pre-training Corpora
Neargye/magic_enum — Static reflection for enums (to string, from string, iteration) for modern C++, works with any enum type without any macro or boilerplate code
FNet: Mixing Tokens with Fourier Transforms, tensorflow, pytorch
Distributed Training of a Bengali ALBERT model
jgraph/drawio — draw.io is a JavaScript, client-side editor for general diagramming.
evolus/pencil — The Pencil Project’s unique mission is to build a free and open-source tool for making diagrams and GUI prototyping that everyone can use.
linkedin/greykite — A flexible, intuitive and fast forecasting library
staltz/matrixmultiplication.xyz — An interactive matrix multiplication calculator for educational purposes
trekhleb/homemade-machine-learning — Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained
launchbadge/sqlx — The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, SQLite, and MSSQL.
gfx-rs/wgpu — Cross-platform, safe, pure-Rust graphics API.
seanmonstar/warp — A super-easy, composable, web server framework for warp speeds.
Nukesor/pueue — Manage your shell commands.
pytorch/captum — Model interpretability and understanding for PyTorch
Emotion Recognition in Greek Speech Using Wav2Vec 2.0
synesthesiam/mycroft-precise-trainer — Text to speech wake word training scripts for Mycroft Precise
rhasspy/rhasspy-asr — Shared Python classes for speech to text
synesthesiam/voice2json — Command-line tools for speech and intent recognition on Linux
LARP: Language-Agent Role Play for Open-World Games
Continvvm/continuum — A clean and simple data loading library for Continual Learning
CC-100: Monolingual Datasets from Web Crawl Data
deepset-ai/haystack — LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it’s best suited for building RAG, question answering, semantic search or conversational agent chatbots
janusgraph/janusgraph — JanusGraph: an open-source, distributed graph database
Streamlit Tutorial: A Beginner’s Guide to Building Machine Learning-Based Web Applications in Python
textext/textext — Re-editable LaTeX/typst graphics for Inkscape
Searching, fast and slow, through product catalogs
How to Easily Draw Neural Network Architecture Diagrams
mlflow/mlflow — Open source platform for the machine learning lifecycle
Why use Docker containers for machine learning development?
Nine Tools I Wish I Mastered before My PhD in Machine Learning
Common Rust Lifetime Misconceptions
rhasspy/gruut — A tokenizer, text cleaner, and phonemizer for many human languages.
rhasspy/ipa2kaldi — Tool for creating Kaldi nnet3 recipes using the International Phonetic Alphabet (IPA)
rhasspy/wiktionary2dict — Tool for extracting IPA pronunciations from Wiktionary XML dump
nodejs/nan — Native Abstractions for Node.js
HuBERT: How Much Can a Bad Teacher Benefit ASR Pre-Training?
Modifying Custom Matmul CUDA Kernels
DeMoriarty/TorchPQ — Approximate nearest neighbor search with product quantization on GPU in PyTorch and CUDA
Alexander-H-Liu/NPC — Non-Autoregressive Predictive Coding
facebookresearch/CPC_audio — An implementation of the Contrastive Predictive Coding (CPC) method to train audio features in an unsupervised fashion.
as-ideas/TransformerTTS — Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.
dfm/extending-jax — Extending JAX with custom C++ and CUDA code
SE and ASR joint training #3226
voicesauce/opensauce-python — Voice analysis software (Python port of VoiceSauce)
rish-16/aft-pytorch — Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.
yoshitomo-matsubara/torchdistill — A coding-free framework built on PyTorch for reproducible deep learning studies. 🏆20 knowledge distillation methods presented at CVPR, ICLR, ECCV, NeurIPS, ICCV, etc are implemented so far. 🎁 Trained models, training logs and configurations are available for ensuring the reproducibility and benchmark.
Declassified Cold War code-breaking manual has lessons for solving ‘impossible’ puzzles
epfml/Bi-sent2vec — Robust Cross-lingual Embeddings from Parallel Sentences
MERLOT: Multimodal Neural Script Knowledge Models
AnthonyCalandra/modern-cpp-features — A cheatsheet of modern C++ language and library features.
MayaPosch/NymphCast — Audio and video casting system with support for custom applications.
How Fighter Jets Lock On (and How the Targets Know)
‘Operation Legacy’: Britain’s Destruction and Concealment of Colonial Records Worldwide
gopherdata/gophernotes — The Go kernel for Jupyter notebooks and nteract.
Python and Go : Part II - Extending Python With Go
LANDrop, code — Drop any files to any devices on your LAN.
jpalardy/vim-slime — A vim plugin to give you some slime. (Emacs)
jlevy/the-art-of-command-line — Master the command line, in one page
EbookFoundation/free-programming-books
Neural Machine Translation Using Sequence to Sequence Model
Generative Spoken Language Modeling from Raw Audio
Quechua Collection of Patricia Dreidemie
mvcisback/lstar — Python implementation of lstar automata learning algorithm.
gbossert/pylstar — An implementation of the LSTAR Grammatical Inference Algorithm
lorisdanto/symbolicautomata — Library for symbolic automata and symbolic visibly pushdown automata
awni/transducer — A Fast Sequence Transducer Implementation with PyTorch Bindings
Sequence Transduction with Recurrent Neural Networks
Sequence-to-sequence learning with Transducers
tech-srl/RNN_to_PRS_CFG — Implementation of TACAS 2021 paper, “Extrapolating CFGs from RNNs”
ByT5: Towards a token-free future with pre-trained byte-to-byte models, code
facebookresearch/AugLy — A data augmentations library for audio, image, text, and video.
PrithivirajDamodaran/Styleformer — A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
Contrastive Semi-supervised Learning for ASR
Contrastive Learning of General-Purpose Audio Representations, code
FRILL: On-Device Speech Representations using TensorFlow-Lite
parlance/ctcdecode — PyTorch CTC Decoder bindings
kensho-technologies/pyctcdecode — A fast and lightweight python-based CTC beam search decoder for speech recognition.
pariajm/awesome-disfluency-detection — A curated list of awesome disfluency detection publications along with the released code and bibliographical information
How to use the pre-trained Librispeech model in Kaldi
yandex-research/DeDLOC — Official code for “Distributed Deep Learning in Open Collaborations” (NeurIPS 2021)
Distributed Deep Learning in Open Collaborations
cross-language-cpp/djinni-generator — Command-line tool that generates gluecode from a djinni-IDL file
Calling Go Functions from Other Languages
Introduction To Golang For Python developers
jwieting/paraphrastic-representations-at-scale
python-trio/trio — Trio – a friendly Python library for async concurrency and I/O
Making Web Crawlers Using Scrapy for Python
google-research/deeplab2 — DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a unified and state-of-the-art TensorFlow codebase for dense pixel labeling tasks.
hugapi/hug — Embrace the APIs of the future. Hug aims to make developing APIs as simple as possible, but no simpler.
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
Using Syllables as Acoustic Units for Spontaneous Speech Recognition
akreal/diphones — PocketSphinx diphone alignment
Diphone-based speech recognition using neural networks
Speech recognition method and system using triphones, diphones, and phonemes
Face recognition with OpenCV, Python, and deep learning
Keras: Few-Shot learning with Reptile, Image similarity estimation using a Siamese Network with a triplet loss, Self-supervised contrastive learning with SimSiam, Automatic Speech Recognition with Transformer, Code examples
tuplex/tuplex — Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.
Semi-Supervised Speech Recognition via Graph-based Temporal Classification
How Kurt Cobain’s Favorite Novel Made Its Way Onto Nirvana’s Final Album
aws/graph-notebook — Library extending Jupyter notebooks to integrate with Apache TinkerPop, openCypher, and RDF SPARQL.
o3de/o3de — Open 3D Engine (O3DE) is an Apache 2.0-licensed multi-platform 3D engine that enables developers and content creators to build AAA games, cinema-quality 3D worlds, and high-fidelity simulations without any fees or commercial obligations.
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio, SpeechColab/GigaSpeech — Large, modern dataset for speech recognition
LLM Training: RLHF and Its Alternatives
Sean-Chainnt na gCruach, Co. Dhún na nGall
Can Fully Connected Layers be Replaced by Convolutional Layers?
tencent-ailab/pika — a lightweight speech processing toolkit based on PyTorch and (Py)Kaldi
athena-team/athena — an open-source implementation of sequence-to-sequence based speech processing engine
Unitnet Speech Demos: Unit Selection TTS strikes back
chapter09_part01_image-segmentation.ipynb
Google DeepMind’s new AI tool helped create more than 700 new materials
UI researchers working to make speech-recognition technology more accessible
feast-dev/feast — Feature Store for Machine Learning
eugeneyan/applied-ml — Papers & tech blogs by companies sharing their work on data science & machine learning in production.
ucam-smt/ucam-smt — Cambridge SMT System
Welcome to the Zero to Mastery TensorFlow for Deep Learning Book
Neural Networks and Deep Learning
a2-4am/a2rchery — A multi-purpose tool for manipulating .a2r disk images
microsoft/flow2dts — Flow declarations to TypeScript declarations transpiler
shivammehta25/Matcha-TTS — [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
microsoft/terminal — The new Windows Terminal and the original Windows console host, all in the same place!
uwol/proleap-vb6-parser — ProLeap ANTLR4-based parser for Visual Basic 6.0
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
Multistream TDNN and new Vosk model
tunib-ai/parallelformers — Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
CPrAN — The plugin manager for Praat
‘Our sound man had Kurt Cobain against the wall’: iconic Leeds gig pub ‘reopens’
Deep Learning over the Internet: Training Language Models Collaboratively
liuliu/ccv — C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library
Official Secrets Act reform could target journalists exposing state failings in Troubles’ killings
SUPERB: Speech processing Universal PERformance Benchmark
CS224S, Assignment 3: Deep Learning for End-to-End Speech Recognition
Scary Phonetics? Learning Cardinal Vowels, Part 1
How to combine multiple criterions to a loss function? - PyTorch Forums
yl4579/StarGANv2-VC — StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Active learning in speech recognition
facebookresearch/fairo — A modular embodied agent architecture and platform for building embodied agents
apache/tvm — Open deep learning compiler stack for CPU, GPU and specialized accelerators
hora-search/hora — efficient approximate nearest neighbor search algorithm collections library written in Rust
An Introduction to Weighted Automata in Machine Learning
facebookresearch/SlowFast — PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
sarulab-speech/jtubespeech — JTubeSpeech: Corpus of Japanese speech collected from YouTube
anishathalye/neural-hash-collider — Preimage attack against NeuralHash 💣
AsuharietYgvar/AppleNeuralHash2ONNX — Convert Apple NeuralHash model for CSAM Detection to ONNX.
KhaosT/nhcalc — Compute NeuralHash for the given image
The Formula For An Episode Of Murder, She Wrote
artyom-beilis/dlprimitives — Deep Learning Primitives and Mini-Framework for OpenCL
labmlai/annotated_deep_learning_paper_implementations
Enhancing audio quality for expressive Neural Text-to-Speech
One TTS Alignment to Rule Them All
respeecher/librispeech-cutter — Scripts for generating librispeech cuts from the original mp3 archive without 16kHz restrictions
stanford-crfm/mistral — Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗 Transformers.
Language model training examples
A flexible, open-source platform for democratised access to digital resources
Tencent/TNN — deep learning inference framework for mobile, desktop and server.
rawpython/remi — Python REMote Interface library. Platform independent. In about 100 Kbytes, perfect for your diet.
microsoft/Focal-Transformer — [NeurIPS 2021 Spotlight] Official code for “Focal Self-attention for Local-Global Interactions in Vision Transformers”
microsoft/Swin-Transformer — This is an official implementation for “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”.
ahrm/sioyek — Sioyek is a PDF viewer with a focus on textbooks and research papers
How lies about Irish ‘barbarism’ in 1641 paved way for Cromwell’s atrocities
microsoft/UniSpeech — UniSpeech - Large Scale Self-Supervised Learning for Speech
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
NVIDIA/radtts — Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low Dimensional (F0 and Energy) Speech Attributes.
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++, paper, dataset, api, model
Appen/UHV-OTS-Speech — A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
EgorLakomkin/KTSpeechCrawler — Automatically constructing corpus for automatic speech recognition from YouTube videos
gong-io/gecko — Gecko - A Tool for Effective Annotation of Human Conversations
A Recipe For Arbitrary Text Style Transfer with Large Language Models
DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
All Top Python Libraries for Data Science Explained
FedJAX: Federated Learning Simulation with JAX
striveiccv2021/STRIVE-ICCV2021 — STRIVE: Scene Text Replacement In Videos
doxas/twigl — twigl.app is an online editor for one-tweet shaders, with a GIF generator, sound shaders, and live-coding broadcast.
giannisdaras/multilingual_robustness — [NeurIPS 2022] Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve
DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
s3prl/LibriMix — An open source dataset for source separation
awslabs/speech-representations — Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)
FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis
RLIF: Interactive Imitation Learning as Reinforcement Learning, paper, code
snorkel-team/snorkel — A system for quickly generating training data with weak supervision
Visualizing and Understanding Convolutional Networks
kedro-org/kedro — Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
How to Train Bert For Q&A in Any Language
Gentle Dive into Math Behind Convolutional Neural Networks
t-ubukata/cudnnxx — cuDNN C++ wrapper.
Smart developers use smart pointers (1/7) – Smart pointers basics
CTC Variations Through New WFST Topologies
Machine Learning Formulas Explained! 👨‍🏫
This is the formula for the Binary Cross Entropy Loss. This loss function is commonly used for binary classification problems.
It may look super confusing, but I promise you that it is actually quite simple!
Let's go step by step 👇 pic.twitter.com/LcQofbUJnl
— Vlad Haltakov (@haltakov) October 13, 2021
BaguaSys/bagua — Bagua Speeds up PyTorch
visenger/awesome-mlops — A curated list of references for MLOps
Towards Robust Waveform-Based Acoustic Models
A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer
Simple and Effective Zero-shot Cross-lingual Phoneme Recognition
NormFormer: Improved Transformer Pretraining with Extra Normalization
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition
Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition
Fine-tuning for Audio Classification with 🤗 Transformers
Control Strategies for Physically Simulated Characters Performing Two-player Competitive Sports
Hierarchical Skills for Efficient Exploration, code, paper
facebookresearch/ppuda — Code for Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021)
hankcs/HanLP — Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
How to write a DSL (in Python with Lark)
rpgleparser/rpgleparser — ANTLR parser for RPGLE
smeup/jariko — a JAva virtual machine Rpg Interpreter written in KOtlin
sallbach/arpgtool — IBM i RPG developer tools (AS/400 iSeries)
worksofliam/5250ttt — Tic-Tac-Toe for 5250 (2 player)
lppedd/RPG — IBM RPG projects
martinezga/ibm-rpg-programs — IBM RPG programs exercises.
How to write a transpiler, code
Strumenta/FormatsDSL — A DSL to describe formats and generate loaders
CLIPScore: A Reference-free Evaluation Metric for Image Captioning, code
This Word Does Not Exist, turtlesoupy/this-word-does-not-exist
Aim 3.0.0 — The foundations for open-source & open-metadata ML platform
Large Language Models: A New Moore’s Law?
babelfish-for-postgresql/babelfish_extensions — Babelfish for PostgreSQL provides the capability for PostgreSQL to work with applications written for Microsoft SQL Server.
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
uclnlp/torch-imle — Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions
Hierarchical Transformers Are More Efficient Language Models
Language Modelling via Learning to Rank
Teaching robots to perceive, understand, and interact through touch, tacto, PyTouch
Speculative execution for LLMs is an excellent inference-time optimization.
It hinges on the following unintuitive observation: forwarding an LLM on a single input token takes about as much time as forwarding an LLM on K input tokens in a batch (for larger K than you might… https://t.co/FiwTwqsfho
— Andrej Karpathy (@karpathy) August 31, 2023
jzhang38/TinyLlama — The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
TinyLlama/TinyLlama-1.1B-Chat-v0.4
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings
w2v-SELD: A Sound Event Localization and Detection Framework for Self-Supervised Spatial Audio Pre-Training, code
Recent Advances in End-to-End Automatic Speech Recognition
bhky/opennsfw2 — Keras implementation of the Yahoo Open-NSFW model
The Rise of Self-Supervised Learning
TF_JAX_Tutorials - Part 9 (Autodiff in JAX)
Conformer-based Hybrid ASR System for Switchboard Dataset
Hacktoberfest 21’ - Unlocking 40 open-source audio datasets for ML
facebookresearch/demucs — Code for the paper Hybrid Spectrogram and Waveform Source Separation
Scaling ASR Improves Zero and Few Shot Learning
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
CZWin32768/XLM-Align — Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment
LiT: Zero-Shot Transfer with Locked-image text Tuning
Wormhole (1994 Session part 2)
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Joint Unsupervised and Supervised Training for Multilingual ASR
Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions
British and American English Pronunciation Differences
Studying the History of English
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
Introducing IDEFICS: An Open Reproduction of State-of-the-Art Visual Language Model
thu-spmi/CAT — A CRF-based ASR Toolkit
Mask-Predict: Parallel Decoding of Conditional Masked Language Models
Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing
Non-Autoregressive Semantic Parsing for Compositional Task-Oriented Dialog, code
Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition, asappresearch/sew
Towards Learning Universal Audio Representations
Transformer-S2A: Robust and Efficient Speech-to-Animation
Multimodal and Multilingual Embeddings for Large-Scale Speech Mining
Hash Layers For Large Sparse Models
FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task, code
g2p_encode.py, fairseq_simul_st_agent.py
facebook/wmt21-dense-24-wide-x-en — WMT 21 X-En is a 4.7B multilingual encoder-decoder (seq-to-seq) model trained for one-to-many multilingual translation. It was introduced in this paper and first released in this repository.
facebook/wav2vec2-large-robust-ft-swbd-300h
facebook/s2t-small-mustc-en-nl-st
facebook/wav2vec2-lv-60-espeak-cv-ft
Simultaneous Speech Translation (SimulST) on MuST-C
MuST-C: a Multilingual Speech Translation Corpus
Simplified Grammar of the Hungarian Language
A Green Approach for an Irish App (Refactor, reuse and keeping it real)
HuBERT: How to Apply BERT to Speech, Visually Explained
Automatic Speech Recognition for Supporting Endangered Language Documentation
“Attention is all you need” implementation from scratch in PyTorch. A Twitter thread
googleforgames/open-match-docs
LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
mlcommons/peoples-speech — The People’s Speech Dataset
“Demokratifabriken” även som ljudbok? (“Demokratifabriken” also as an audiobook?)
srush/GPU-Puzzles — Solve puzzles. Learn CUDA.
Conformer: Convolution-augmented Transformer for Speech Recognition
i'll never get over how the cochlea is an analog fourier transform organ pic.twitter.com/VSlu0oqQXH
— murat 🍥 (@mayfer) January 3, 2024
Room impulse response reconstruction with physics-informed deep learning
GPT using Numpy! 🔥
Here is Generative Pretrained Transformer (GPT) implemented from scratch using Numpy in just 60 lines of code: pic.twitter.com/80dTaDkePe
— Clarifai (@clarifai) January 2, 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
godotengine/godot — Godot Engine – Multi-platform 2D and 3D game engine
The 3 Deep Learning Frameworks For End-to-End Speech Recognition That Power Your Devices
Perceiver IO: a scalable, fully-attentional model that works on any modality
Introduction to Facebook AI Similarity Search (Faiss), Facebook AI and the Index Factory
RichiH/vcsh — vcsh - Version Control System for $HOME - multiple Git repositories in $HOME
Few-shot Learning with Multilingual Language Models
Multi-turn RNN-T for streaming recognition of multi-party speech
1ytic/warp-rna — Recurrent Neural Aligner
theblackcat102/edgedict — Working online speech recognition based on RNN Transducer. (Trained model release available in release)
1ytic/warp-rnnt — CUDA-Warp RNN-Transducer
awni/automata_ml — An Introduction to Weighted Automata in Machine Learning
awni/speech — A PyTorch Implementation of End-to-End Models for Speech-to-Text
sooftware/kospeech — Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
sooftware/RNN-Transducer — PyTorch implementation of RNN-Transducer (RNN-T).
EdinburghNLP/nematus — Open-Source Neural Machine Translation in TensorFlow
bbc/rd-apmm-python-lib-mediatimestamp — A simple timestamp implementation used by various other libraries
bbc/lrud — Left, Right, Up, Down. A spatial navigation library for devices with input via directional controls.
bbc/grid — BBC’s implementation of The Guardian’s image management system
bbc/digital-paper-edit-client — Work in progress - BBC News Labs digital paper edit project - React Client
bbc/webMUSHRA — a MUSHRA compliant web audio API based experiment software
bbc/programmes-pages-service — A library for accessing ProgrammesDB
bbc/codext — VS Code’s editor shipped as a browser extension.
bbc/clever-thumbnailer — Audio thumbnail generator
bbc/digital-paper-edit-storybook — Work in progress - BBC News Labs digital paper edit project - React storybook
Rhymes of a Rolling Stone/The Cow-Juice Cure
Rhymes of a Red-Cross Man/Missis Moriarty’s Boy
Neural Data Augmentation via Example Extrapolation
Train GPT-2 in your own language
Fit More and Train Faster With ZeRO via DeepSpeed and FairScale
Sample teaching materials, teg.ie
POSH: A Data-Aware Shell for Faster Distributed Text Processing
Retrieval Augmented Generation with Huggingface Transformers and Ray
This Non-Profit is Building the World’s Largest Lexical Translation Database
Text Classification Using DeepPavlov Library With PyTorch And Transformers
Spoken Corpus Linguistics in Romance: thoughts, design and results
kakaobrain/pororo — PORORO: Platform Of neuRal mOdels for natuRal language prOcessing
modernmt/modernmt — Neural Adaptive Machine Translation that adapts to context and learns from corrections.
quic/sense — Enhance your application with the ability to see and interact with humans using any RGB camera.
wenet-e2e/wenet — Production First and Production Ready End-to-End Speech Recognition Toolkit
ray-project/ray — Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Natooz/MidiTok — MIDI / symbolic music tokenizers for Deep Learning models 🎶
Books in Irish - Royal Irish Academy
Bibliography of Irish philology and of printed Irish literature, scan 2
IG01-15, IG01-16, IG01-19, IG01-25, IG02-10049, IG02-60, IG02-11771, IG02-66
Irish Language Forum - Study Group: Séadna
Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
The Irish of Iorras Aithneach, County Galway
Multilingual BERT has an accent: Evaluating English influences on fluency in multilingual models
nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation
Siamese networks with Keras, TensorFlow, and Deep Learning
Comparing images for similarity using siamese networks, Keras, and TensorFlow
Dense Passage Retrieval for Open-Domain Question Answering
messiaen/full-lattice-search — Full Text Search Over Probabilistic Lattices with Elasticsearch!
steveash/jopenfst — Partial Java port of the C++ OpenFST library
OlliSaarikivi/Automata — Automata and transducer library for .NET
Recreating Historical Streetscapes Using Deep Learning and Crowdsourcing
Ordnance Survey Index to the Map of the Town of Thurles
Old Photos of Thurles Co Tipperary Ireland
Praat on the Web: An Upgrade of Praat for Semi-Automatic Speech Annotation
monikaUPF/PraatontheWeb — Web implementation of Praat. Source code, running demo scripts on web, samples and documentation
ys10/Grapheme-PhonemeAlignment — This project aims to implement an algorithm to do a grapheme-phoneme alignment task.
kfirgoldberg/FUN — Official implementation of the FUN models
jailuthra/asr — Kaldi ASR wrapper scripts
My ELAN workflow for segmenting and transcription
lennes/spect — SpeCT - Speech Corpus Toolkit for Praat. Documentation
praaline/Praaline — Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora
CoEDL/elpis — 🙊 software for creating speech recognition models.
Deep Speech : Train Native Languages with Transfer Learning Part #0b01
cohere-ai/natural-instructions — Expanding natural instructions
Bat banter is surprisingly nuanced
Yes you should understand backprop
The Unreasonable Effectiveness of Recurrent Neural Networks
Chrome Extension Programming: Illustrating a Basic Survival Skill with a Twitter Case Study
A Recipe for Training Neural Networks
Language Through a Prism: A Spectral Approach for Multiscale Language Representations
How to Use Image Embeddings for Object Localization
Learning deep features to recognise speech emotion using merged deep CNN
How to Break GPU Memory Boundaries Even with Large Batch Sizes
zh217/torch-dct — DCT (discrete cosine transform) functions for PyTorch
inejc/paragraph-vectors — A PyTorch implementation of Paragraph Vectors (doc2vec).
pperle/gaze-tracking — state-of-the-art gaze tracking model
tjysdsg/capt-public — Public version of my Computer-Aided Pronunciation Training (CAPT) system (server)
JawadAr/Pronunciation-verification-using-anomaly-detection-Thesis — This repository contains all the code used in a thesis at Information Technology University (ITU). The topic of the thesis is pronunciation verification using anomaly detection.
googlefonts/gftools — Misc tools for working with the Google Fonts library
fonttools/fonttools — A library to manipulate font files from Python.
openjournals/joss — The Journal of Open Source Software
LaurentMazare/tch-rs — Rust bindings for the C++ API of PyTorch.
vesis84/kaldi-io-for-python — Python functions for reading Kaldi data formats. Useful for rapid prototyping with Python.
Peer Review: Implementing a “publish, then review” model of publishing
The Entropy of Words—Learnability and Expressivity across More than 1000 Languages, preprint
Why scientists are turning to Rust
Phone set selection for HMM-based dialect speech synthesis
The On-Device Machine Learning Behind Recorder
Navigating Recorder Transcripts Easily, with Smart Scrolling
How to objectively measure phonetic distance?
CoEDL/elpis — 🙊 software for creating speech recognition models.
Neural circuit policies enabling auditable autonomy
The Challenges of using Transformers in ASR
Segment Anything Meets Point Tracking
SysCV/sam-pt — SAM-PT: Extending SAM to zero-shot video segmentation with point-based tracking.
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
bbc/subtitles-generator — A node module to generate subtitles by segmenting a list of time-coded text - BBC News Labs
ebu/libbw64 — Broadcast Wave 64 (ITU-R BS.2088) library
bbc/aes31-adl-composer — Work in progress - A node module to convert a json sequence into an AES31 ADL (audio decision list) compatible with SADiE audio editing software. For BBC News Labs digital paper edit project
bbc/audiowaveform — C++ program to generate waveform data and render waveform images from audio files
enochkan/torch-metrics — Metrics for model evaluation in pytorch
formiel/speech-translation — Multilingual speech translation
synesthesiam/sv_kaldi-montreal — Swedish voice2json profile based on Kaldi
mycrazycracy/speaker-embedding-with-phonetic-information — The code for the Interspeech paper “Speaker Embedding Extraction with Phonetic Information”
openXBOW/openXBOW — openXBOW - the Passau Open-Source Crossmodal Bag-of-Words Toolkit
fastai/numerical-linear-algebra — Free online textbook of Jupyter notebooks for fast.ai Computational Linear Algebra course
The Journey of Viscount Ramon de Perellós to Saint Patrick’s Purgatory
Mion-ċaint : an easy Irish phrase book
Researches in the South of Ireland
A little bit of Culture… Poetry from soc.culture.irish
An Caoineadh Airt Uí Laoghaire
Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview
Train LLMs using QLoRA on Amazon SageMaker
apple/swift-algorithms — Commonly used sequence and collection algorithms for Swift
apple/sourcekit-lsp — Language Server Protocol implementation for Swift and C-based languages
apple/darwin-libplatform — Legacy mirror of Darwin Platform Library. Replaced by https://github.com/apple-oss-distributions/libplatform
apple/foundationdb — FoundationDB - the open source, distributed, transactional key-value store
marian-nmt/marian-examples — Examples, tutorials and use cases for Marian, including our WMT-2017/18 baselines.
pswietojanski/ojsp_adaptation_review_2020 — Auxiliary data and scripts for our OJSP review on speaker adaptation for speech recognition
Zeitschrift für celtische Philologie
Review: Oral Literature from Dunquin, County Kerry
A Gentle Introduction to the Huggingface Pipeline
Séamus Ó Duilearga’s Co. Antrim notebooks
BUSZI-2 guided conversations - downloadable transcripts
The Swedish subproject of ScanDiaSyn
Skolt Saami Documentation Corpus
Hungarian Broadcast News Database
Hungarian Kindergarten Language Corpus
Hungarian Medical Speech Database
Tunable Q-factor Wavelet Transform
PyWavelets/pywt — PyWavelets - Wavelet Transforms in Python
jollyjonson/tqwt_tools — Tunable-Q Wavelet Transform and Resonance-based Signal Decomposition Toolkit