Interesting links, 27/12/2023

siddk/voltron-robotics — Voltron: Language-Driven Representation Learning for Robotics

Mamba: Linear-Time Sequence Modeling with Selective State Spaces, state-spaces/mamba

johnma2006/mamba-minimal — Simple, minimal implementation of the Mamba SSM in one file of PyTorch.

nnnoiseless: porting audio code from C to Rust, code

ZDisket/TensorVox — Desktop application for neural speech synthesis written in C++

avaneev/r8brain-free-src — High-quality pro audio resampler / sample rate converter C++ library. Very fast, for both audio resampling and time-series interpolation.

castorini/howl — Wake word detection modeling toolkit for Firefox Voice, supporting open datasets like Speech Commands and Common Voice.

stanford-oval/genie-toolkit — The Genie open source kit for voice assistant (formerly known as Almond)

stanford-oval/thingtalk — The Programming Language of Virtual Assistants

salesforce/morpheus — Code for ACL’20 paper “It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations”

Instructions for estimating the location of beats in a soundfile

LSTMs Explained: A Complete, Technically Accurate, Conceptual Guide with Keras

Looking under the tinfoil hat: Clarifying the personological and psychopathological correlates of conspiracy beliefs

Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche

OpenNMT-py BERT Tutorial

In his spare time, an engineer found flaws in the classic book “A Million Random Digits”

QData/TextAttack — TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP

Hear Slayer guitarist Jeff Hanneman’s ferocious unreleased demos for Reign In Blood

gtn-org/gtn — Automatic differentiation with weighted finite-state transducers.

Deformable DETR: Deformable Transformers for End-to-End Object Detection

Latent linguistic embedding for cross-lingual text-to-speech and voice conversion

Domain Adversarial Neural Networks for Dysarthric Speech Recognition

facebookincubator/CG-SQL — CG/SQL is a compiler that converts a SQL Stored Procedure like language into C for SQLite. SQLite has no stored procedures of its own. CG/CQL can also generate other useful artifacts for testing and schema maintenance.

Knowledge Transfer in Self Supervised Learning

Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition

google/monster-mash — Sketch-Based Modeling and Animation Tool

DEEP LEARNING

Transformer-based Encoder-Decoder Models

RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs

Overview: State-of-the-Art Machine Learning Algorithms per Discipline & per Task

kmkurn/pytorch-crf

Composition-based on-the-fly rescoring for salient n-gram biasing

Spectrogram Inversion for Audio Source Separation via Consistency, Mixing, and Magnitude Constraints

Táin Bó Cúalnge

getkeops/keops — KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows

The Annotated S4, code

A Comprehensive Overview of Gaussian Splatting

stanfordnlp/dspy — Stanford DSPy: The framework for programming with foundation models

stanford-futuredata/ColBERT — ColBERT: state-of-the-art neural search (SIGIR’20, TACL’21, NeurIPS’21, NAACL’22, CIKM’22)

allenai/Holodeck — Language Guided Generation of 3D Embodied AI Environments.

dolphin-2.5-mixtral-8x7b

LLM360/Amber

Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation

More Hungarian, Kafka crow

MalcolmSlaney/python_auditory_toolbox

Protecting Voice-Controlled Devices against LASER Injection Attacks

Accelerating over 130,000 Hugging Face models with ONNX Runtime

The N Implementation Details of RLHF with PPO

8 Hungarian Novels You Should Read Before You Die

apple/ml-ferret

Deploy Embedding Models with Hugging Face Inference Endpoints

Personal Copilot: Train Your Own Coding Assistant

Calculus Made Easy

Language Model Beats Diffusion – Tokenizer is Key to Visual Generation

Open X-Embodiment: Robotic Learning Datasets and RT-X Models, code

marella/ctransformers — Python bindings for the Transformer models implemented in C/C++ using GGML library.

chronhib-MU/Chronhib-Website — This is the ChronHib website repository.

Plachtaa/VALL-E-X — An open source implementation of Microsoft’s VALL-E X zero-shot TTS model.

suno-ai/bark — Text-Prompted Generative Audio Model

The Project Gutenberg Open Audiobook Collection, code

Ressources for End-to-End French Text-to-Speech Blizzard challenge

Using speech synthesis to explain automatic speaker recognition: a new application of synthetic speech

Speaker-independent Speech Inversion for Estimation of Nasalance, code

A System for Generating Voice Source Signals that Implements the Transformed LF-model Parameter Control

Implementing Contextual Biasing in GPU Decoder for Online ASR, idiap/contextual-biasing-on-gpus

Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition

BAT: Boundary aware transducer for memory-efficient and low-latency ASR

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

idiap/bob — Bob is a free signal-processing and machine learning toolbox originally developed by the Biometrics group at Idiap Research Institute, in Switzerland.

A Neural TTS System with Parallel Prosody Transfer from Unseen Speakers

Vowel reduction by Greek-speaking children: The effect of stress and word length

Mapping Phonemes to Acoustic Symbols and Codes Using Synchrony in Speech Modulation Vectors Estimated by the Travellingwave Filter Bank

NeMo Forced Aligner and its application to word alignment for subtitle generation

A stimulus-organism-response model of willingness to buy from advertising speech using voice quality

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition

Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think

Regarding Topology and Variant Frame Rates for Differentiable WFST-based End-to-End ASR

Cross-lingual Prosody Transfer for Expressive Machine Dubbing

An Analysis of Goodness of Pronunciation for Child Speech

Data augmentation for children ASR and child-adult speaker classification using voice conversion methods

Prefix Search Decoding for RNN Transducers

ddlBoJack/MT4SSL — Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets

amogh3892/Audio-classification-using-Bag-of-Frames-approach — Classification of different categories of audio clips, especially non speech sounds using Bag-of-Frames approach.

JournalismAI-2021-Quotes/quote-extraction — Quote extraction for modular journalism (JournalismAI collab 2021)

Nearest Neighbor Machine Translation

Atticus Open Contract Dataset

Clarifying exceptions and visualizing tensor operations in deep learning code

Translation Artifacts in Cross-lingual Transfer Learning

AI Explorables

Improving Target-side Lexical Transfer in Multilingual Neural Machine Translation

Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge

meyda/meyda — Audio feature extraction for JavaScript.

adrianbg/kaldi.js — This is a version of Kaldi tweaked to build to WebAssembly.

LSTMs Compose (and Learn) Bottom-Up

Modern Practical Natural Language Processing

Understanding Transformers, the Programming Way

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search, code

Does my multimodal model learn cross-modal interactions? It’s harder to tell than you might think!

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

antonisa/unimorph_inflect — A python library for easily querying morphological inflection models trained on Unimorph

A Neural Network Playground

ConvnetJS demo

vivjay30/Cone-of-Silence — Speech Separation by Localization

ssnl/dataset-distillation

kermitt2/grobid — A machine learning software for extracting information from scholarly documents

microsoft/nni — An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

microsoft/LabanotationSuite — Microsoft Applied Robotics Research Library: LabanotationSuite - open source software tools to give service robots the ability to perform human-like gestures

lucidrains/mixture-of-experts

TezRomacH/layer-to-layer-pytorch

Cross-lingual Retrieval for Iterative Self-Supervised Training

Pre-training via Paraphrasing

Augmenting Transformers with KNN-Based Composite Memory for Dialogue

REALM: Retrieval-Augmented Language Model Pre-Training, code

Rethinking Attention with Performers

CS231n: Convolutional Neural Networks for Visual Recognition

Over 200 of the Best Machine Learning, NLP, and Python Tutorials — 2018 Edition

KinWaiCheuk/nnAudio — Audio processing by using pytorch 1D convolution network

Bootstrapping Relation Extractors using Syntactic Search by Examples

OpenMonkeyStudio

The Fairy Tales of the Brothers Grimm

Winnie-the-Pooh

An Introduction to Hungarian Literature in 8 books

NLP-progress Dialogue

mermaid-js/mermaid — Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown

The JavaScript library for bespoke data visualization

opal/opal — Opal is a Ruby to JavaScript source-to-source compiler.

linebender/druid — A data-first Rust-native UI design toolkit.

alphacep/vosk-api — Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

ossrs/srs — SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.

Dobiasd/frugally-deep — Header-only library for using Keras (TensorFlow) models in C++.

HazyResearch/bootleg — Self-Supervision for Named Entity Disambiguation at the Tail

google-research-datasets/RxR — Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi and Telugu, and 126k navigation following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the visual per…

Localized Narratives, code

automerge/automerge — A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.

cyrildiagne/ar-cutpaste

Embeddings from the Ground Up

Parrots learn to make video calls to chat with other parrots, then develop friendships, Northeastern University researchers say

alexa/visitron — VISITRON: A multi-modal Transformer-based model for Cooperative Vision-and-Dialog Navigation (CVDN)

Deep Transformers with Latent Depth

Project Euphonia

Recreating Historical Streetscapes Using Deep Learning and Crowdsourcing

Advanced libtorch

alexa/ramen — A software for transferring pre-trained English models to foreign languages

alexa/Topical-Chat — A dataset containing human-human knowledge-grounded open-domain conversations.

Causal Reasoning in Probability Trees

cdk8s-team/cdk8s — Define Kubernetes native apps and abstractions using object-oriented programming

aware-ai/byt5-german-grammar

ali-vilab/videocomposer — Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability

microsoft/MS-SNSD — The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.

MarvinLvn/BabySLM — Behavioral probing of language acquisition models at the lexical and syntactic level

A Complete Logistic Regression Algorithm From Scratch in Python: Step by Step

convert_lm_to_fst.py

microsoft/hummingbird — Hummingbird compiles trained ML models into tensor computation for faster inference.

Noisy speech database for training speech enhancement algorithms and TTS models

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition

Converting Jupyter Notebooks into blog posts with Gatsby

YannickJadoul/Parselmouth

kkroening/ffmpeg-python

Interactive spreadsheets in Jupyter

Supervised Pretraining Can Learn In-Context Reinforcement Learning

robodhruv/visualnav-transformer — Official code and checkpoint release for “ViNT: A Foundation Model for Visual Navigation”.

mfaruqui/retrofitting — Retrofitting Word Vectors to Semantic Lexicons

jart/sectorlisp — Bootstrapping LISP in a Boot Sector

blink1073/oct2py — Run M Files from Python - GNU Octave to Python bridge

R language for programmers

scoder/lupa — Lua in Python

Lua for Python Programmers

deepinsight/insightface — State-of-the-art 2D and 3D Face Analysis Project

The importance of fillers for text representations of speech transcripts

Learning Robust and Multilingual Speech Representations

End-to-End Speech Recognition and Disfluency Removal

The role of context in neural pitch accent detection in English

Reconstructing the brain of fruit flies

Sharing Project Amber with the mental health community

mfaruqui/morph-trans — Code for morphological transformations

higgood/incremental-word2vec — Modify word2vec such that it’s possible to “condition” on existing embeddings for some words, and induce embeddings for new words.

The Power of Scale for Parameter-Efficient Prompt Tuning

shashikg/WhisperS2T — An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

‘Less Than One’-Shot Learning: Learning N Classes From M<N Samples

Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization in TV

Speaker Embedding Extraction with Phonetic Information

On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks

TensorSpeech/TensorflowTTS — 😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supports English, French, Korean, Chinese and German, and is easy to adapt to other languages)

Language Model is All You Need: Natural Language Understanding as Question Answering

Building RNNs is Fun with PyTorch and Google Colab

20 free Irish language audiobooks for children

emijrp/internet-archive

drtoast/flickr-backup

Generalized End-to-End Loss for Speaker Verification

Layout-Parser/layout-parser — A Unified Toolkit for Deep Learning Based Document Image Analysis

OCR for Endangered Language Texts, code

Google Cardboard open sourced as active development on Google VR SDK stops

googlevr/cardboard

KomputeProject/kompute — General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usecases. Backed by the Linux Foundation.

m3hrdadfi/wiki-summary — A Bert2Bert model which is able to summarize articles!

CAMeL-Lab/camel_tools — A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

Traditional Versus ASR-Based Pronunciation Instruction, An Empirical Study

ruffle-rs/ruffle — A Flash Player emulator written in Rust

Learning Sparse Prototypes for Text Generation

A Speech-To-Text Practitioner’s Criticisms of Industry and Academia

ANNOYingly Simple Sentence Clustering

Guitarix

How JavaScript Libraries Are Training Neural Networks on Web Browsers

andrenatal/phonetisaurus-emscripten

U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection, code

A table detection, cell recognition and text extraction algorithm to convert tables in images to excel files

Finding Syntax with Structural Probes

adefossez/julius — Fast PyTorch based DSP for audio and 1D signals

M2KD: Multi-model and Multi-level Knowledge Distillation for Incremental Learning

evcxr/evcxr — An evaluation context for Rust.

Rust and WebScraping

Attention is Not Only a Weight: Analyzing Transformers with Vector Norms

Pronunciation Variation Modeling for Dutch Automatic Speech Recognition

Booting from a vinyl record

Score-Based Generative Modeling through Stochastic Differential Equations

MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators

Hero-Tales of Ireland by Jeremiah Curtin

MycroftAI/lingua-franca — Mycroft’s multilingual text parsing and formatting library

Acoustic event recognition using cochleagram image and convolutional neural networks

CBMM/cochleagram — Cochlear sound spectrum

Speaker-independent vowel recognition: spectrograms versus cochleagrams

A joint training framework for robust automatic speech recognition

Auditory features based on Gammatone filters for robust speech recognition

NN-512 — NN-512 is a compiler that generates C99 code for neural net inference

vakila/de-stress — Prototype German Computer-Assisted Pronunciation Training tool for lexical stress errors

guanpengchn/awesome-pronunciation

Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence

10 Ways to Optimize Text for Machine Translation

Text2Image: A new way to NLP?

The Nand Game

Learning from Language Explanations

Talisman: a JavaScript archive of fuzzy matching, information retrieval and record linkage building blocks

Computing Receptive Fields of Convolutional Neural Networks

Sequence Modeling With CTC

xinjli/allosaurus — Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

Feature Learning in Infinite-Width Neural Networks

Uncertainty Estimation in Autoregressive Structured Prediction

RNNs can generate bounded hierarchical languages with optimal memory

persephone-tools/persephone — A tool for automatic phoneme transcription

dmort27/allovera — A phoneme-allophone database for many languages

Deploying Part-of-Speech Patterns to Enhance Statistical Phrase-Based Machine Translation Resources

A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition

The Scientist and Engineer’s Guide to Digital Signal Processing

When regular is not easy: Cracking the code of Irish orthography

giakou4/pyfeats — Open source software for image feature extraction.

Affordances from Human Videos as a Versatile Representation for Robotics

ReadAlongs/Studio — Audiobook alignment for Indigenous languages

ReadAlongs/Web-Component — Suite of web packages for creating interactive ReadAlongs

roedoejet/convertextract — Extract and find/replace text based on arbitrary correspondences while preserving original file formatting. This library is a fork from the Textract library by Dean Malmgren.

Gⁱ-to-Pⁱ Studio

markovka17/dla — Deep learning for audio processing

What is a signal

facebookresearch/CPC_audio — An implementation of the Contrast Predictive Coding (CPC) method to train audio features in an unsupervised fashion.

ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading

iamjanvijay/rnnt_decoder_cuda — An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.

awni/transducer — A Fast Sequence Transducer Implementation with PyTorch Bindings

iceychris/LibreASR — An On-Premises, Streaming Speech Recognition System

How to convert a pre-trained model for Kaldi to Vosk

MediaPipe Holistic — Simultaneous Face, Hand and Pose Prediction, on Device

English Dialects From the Eighth Century to the Present Day by Walter W. Skeat

Ireland, Historic and Picturesque by Charles Johnston

What’s the Matter with Ireland? by Ruth Russell

A Visit From Saint Nicholas by Clement Clarke Moore

The Most Ancient Lives of Saint Patrick by James O’Leary

Anglo-Saxon Literature by John Earle

The Reminiscences of an Irish Land Agent by Samuel Murray Hussey

Digital ink recognition

Building Custom Deep Learning Based Optical Character Recognition (OCR) models

emedvedev/attention-ocr — A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Retentive Network: A Successor to Transformer for Large Language Models

courao/ocr.pytorch — A pure pytorch implemented ocr project including text detection and recognition

Xilinx/pytorch-ocr — Quantized LSTMs for OCR

DTolm/VkFFT — Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library

DTolm/VkResample — Vulkan real-time FFT upscaling

Historical Copyright Records and Transparency

Speech-Lab-IITM/English_ASR_Challenge — English ASR Challenge organized by Speech Lab, IIT Madras

Unsupervised Cross-lingual Representation Learning for Speech Recognition

Example for Clustered Transformers

Speech Recognition with Python

TAPAS base model fine-tuned on WikiTable Questions

What is Similarity Between Sentences?

catalyst-team/dl-course — Deep Learning with Catalyst

facebookresearch/ClassyVision — An end-to-end PyTorch framework for image and video classification

Mel Frequency Cepstral Coefficient (MFCC) tutorial

Claare Ny Gael

Description d’un parler irlandais de Kerry/Texte

Audio samples of Ulster-Scots speakers

Ulster-Scots Education Resources

Lowland Scots

An focal don ainmhí seo → 🐶 i nGaeilge

Character Recognition and Segmentation For Custom Data Using Detectron2

da03/Attention-OCR — Visual Attention based OCR

Comhar

How to train Tesseract 4

Recent Advances in Google Translate

Narrative framing of consumer sentiment in online restaurant reviews

VFsync

TinyEMU

Model Zoo

Training optical character recognition technology Tesseract on a new character font on MacOS

Fine-tuning Tesseract OCR for German Invoices

Training Tesseract on your custom dataset using Qt Box Editor

Simple OCR with Tesseract

Add four additional special unicode characters to tesseract

zdenop/qt-box-editor — QT4 editor of tesseract-ocr box files

IfcOpenShell — The open source IFC toolkit and geometry engine

IFCjs/web-ifc-viewer — Graphics engine and toolkit for client applications.

SketchUp-STL

Self-training and pre-training, understanding the wav2vec series

clovaai/deep-text-recognition-benchmark — Text recognition (optical character recognition) with deep learning methods.

apple/ml-equivariant-neural-rendering — This repo contains code to reproduce all experiments in Equivariant Neural Rendering by E. Dupont, M. A. Bautista, A. Colburn, A. Sankar, C. Guestrin, J. Susskind, Q. Shan, ICML 2020.

PrefectHQ/prefect — Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines

kaituoxu/Conv-TasNet — A PyTorch implementation of Conv-TasNet described in “TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation” with Permutation Invariant Training (PIT).

Månadens profil: Jim O´Regan - Språkbanken

The historical short vowel phonology of Gaelic

Lexicon of Old Irish

The Structure of the Consonant System of the Gaelic of Torr, Co. Donegal

Collins gem Irish dictionary : English-Irish, Irish-English

Visual Speech Enhancement Without A Real Visual Stream, code

joonson/syncnet_python — Out of time: automated lip sync in the wild

High-Fidelity Audio Generation and Representation Learning With Guided Adversarial Autoencoder

The Grammar of English Grammars

karthiTox/deepnet.js — Auto-differentiation library for javascript

Familiar feud in Poland after game show calls regional language a dialect

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning

Google’s REALM — A Knowledge-base Augmented Language Model

apple/ml-mkqa — We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. Please refer to our paper f…

Spoken Wikipedia - Swedish

From Historical Sources to Datasets: A Preview of DataScribe, code

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

Reading and Writing RDF in Apache Jena

kba/jsonld-rapper — Create RDF from JSON-LD with rapper

JSON-LD Syntax 1.0

Cad iad na focail Ghaeilge is mó a mbíonn deacracht ag daoine atá líofa sa teanga iad a litriú?
— Eoin P. Ó Murchú 🇵🇸 (@murchadhmor) January 6, 2021

Study: Folklore structure reveals how conspiracy theories emerge, fall apart

Word-level text generation with Keras in <50 lines of code

TruthfulQA: Measuring How Models Mimic Human Falsehoods

Continuous Active Learning Using Pretrained Transformers

Cainteoirí Dúchais a éisteacht

stanfordnlp/string2string — String-to-String Algorithms for Natural Language Processing

REVIEW OF 1984 By Isaac Asimov

MLCommons People’s Speech Dataset

Wasmer, code

COBE: Contextualized Object Embeddings from Narrated Instructional Video

Russian Text Normalization for STT and TTS

k-Nearest Neighbor Language Models

PyTorch internals

MiniTorch

thu-spmi/CAT — A CRF-based ASR Toolkit

Linformer: Self-Attention with Linear Complexity

Joint Speech Recognition and Speaker Diarization via Sequence Transduction

galv/galvASR

awslabs/sockeye — Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch

How to publish a txt corpora with NIF as Linked Data

RDF Mapping Language

ImageDescriptionRdfExamples

openai/CLIP — CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Recognizing Pose Similarity in Images and Videos

opensheetmusicdisplay/opensheetmusicdisplay — OpenSheetMusicDisplay renders sheet music in MusicXML format in your web browser based on VexFlow. OSMD is brought to you by PhonicScore.com.

katspaugh/wavesurfer.js

Spectrograms and speech processing

Why You Should Do NLP Beyond English

Leabhair do Pháistí

DingXiaoH/RepVGG — RepVGG: Making VGG-style ConvNets Great Again

adamjankaczmarek/poleval2020

kwrobel-nlp/kftt — Polish morphosyntactic tagger.

First Steps in Irish

Foghraidheacht Ghaedhilge an Tuaiscirt

Guide to Irish Pronunciation

An Ghaeilge

AOIDHMÍN MAC GRÉAGÓIR

izuzak/noam — JavaScript library for working with automata and grammars for regular and context-free languages

google/refr — A framework for building reranking models.

usc-sail/barista — Barista is an open-source framework for concurrent speech processing.

Pronouns and Definite vs Indefinite Conjugation

labmlai/annotated_deep_learning_paper_implementations

Evaluate k-nearest neighbor language model

Weight Standardization

Nucleus Sampling

Denoising Diffusion Probabilistic Models (DDPM)

A Course in Machine Learning

CS224N: Natural Language Processing with Deep Learning

OpenNLPLab/cosFormer — [ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention

MarianMT Know, Train & Infer

web-arena-x/webarena — Code repo for “WebArena: A Realistic Web Environment for Building Autonomous Agents”

POSSESSIVE AFFIXES

nltk.tag.brill module

LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition, code

Guidance: a cheat code for diffusion models

Diffusion language models

Perspectives on diffusion

Anki decks Hungarian

How to adapt a multilingual T5 model for a single language

salesforce/LAVIS — LAVIS - A One-stop Library for Language-Vision Intelligence

kscanne/gbb — Sonraí traenála/tástála NLP

Transformer Taxonomy

ML Olympiad - Multilingual Spell Correction

Fine-tuning the multilingual T5 model from Huggingface with Keras

Irish Language Sayings

Zjh-819/LLMDataHub — A quick guide (especially) for trending instruction finetuning datasets

Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer, code

T5 Fine Tuning Tutorial

Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model

Hannibal046/Awesome-LLM — Awesome-LLM: a curated list of Large Language Model

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

cvg/LightGlue — LightGlue: Local Feature Matching at Light Speed (ICCV 2023)

kenjihiranabe/The-Art-of-Linear-Algebra — Graphic notes on Gilbert Strang’s “Linear Algebra for Everyone”

Forced Alignment with Wav2Vec2

click

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

UD Swedish Talbanken

google-research/tensor2robot — Distributed machine learning infrastructure for large-scale robotics research

google-research/pix2seq — Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)

google-research/language-table — Suite of human-collected datasets and a multi-task continuous control benchmark for open vocabulary visuolinguomotor learning.

google-research/jax3d

Doktor Bubó

Unit 3. Transformer architectures for audio

HomeRobot: Open Vocabulary Mobile Manipulation, code

SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition

google-deepmind/dm_robotics — Libraries, tools and tasks created and used at DeepMind Robotics.

facebookresearch/LaViLa — Code release for “Learning Video Representations from Large Language Models”

facebookresearch/paco — This repo contains documentation and code needed to use PACO dataset: data loaders and training and evaluation scripts for objects, parts, and attributes prediction models, query evaluation scripts, and visualization notebooks.

Introduction to Haliax

lyuchenyang/Macaw-LLM — Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

YuanGongND/cav-mae — Code and Pretrained Models for ICLR 2023 Paper “Contrastive Audio-Visual Masked Autoencoder”.

Tracking Everything Everywhere All at Once, code

facebookresearch/audiocraft — Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

What city every country is most ashamed of in Europe

mkuchnik/relm — ReLM is a Regular Expression engine for Language Models

A Fast Algorithm for Computing Prefix Probabilities

Implementation of the Branchformer

Transducer Beam Search

HyperMixer: An MLP-based Low Cost Alternative to Transformers

CTC Beam Search

FlexFormer

HyperConformer

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Brainformers: Trading Simplicity for Efficiency

ggerganov/whisper.cpp

unilight/seq2seq-vc — A sequence-to-sequence voice conversion toolkit.

shivangi-aneja/COSMOS — [AAAI 2023] COSMOS: Catching Out-of-Context Misinformation using Self Supervised Learning

DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement

Announcing AI2 OLMo, an Open Language Model Made by Scientists, for Scientists

Byte Pair Encoding is Suboptimal for Language Model Pretraining

The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue

MuJoCo, code — Multi-Joint dynamics with Contact. A general purpose physics simulator.

google-research/robopianist — [CoRL ‘23] Dexterous piano playing with deep reinforcement learning.

Perlence/PyGuitarPro — Read, write and manipulate GP3, GP4 and GP5 files.

Towards Healthy AI: Large Language Models Need Therapists Too

https://openuni.ai/, code

psst-challenge/psstbaseline — Baseline models for the Post-Stroke Speech Transcription (PSST) challenge

OIG Dataset

togethercomputer/OpenChatKit

viktor-enzell/wav2vec2-large-voxrex-swedish-4gram

Flamingo: a Visual Language Model for Few-Shot Learning

UL2 20B: An Open Source Unified Language Learner, code

Kaldi ASR: Extending the ASpIRE model

openai/CLIP — CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Multimodal Chain-of-Thought Reasoning in Language Models, code

lllyasviel/ControlNet — Let us control diffusion models!

HotpotQA, huggingface

GPT in 60 Lines of NumPy

FelixOpolka/Single-Player-MCTS — Python implementation of single-player Monte-Carlo Tree Search.

google-deepmind/mctx — Monte Carlo tree search in JAX

Speech Synthesis, Recognition, and More With SpeechT5

Teaching OPT to Paraphrase through Soft Prompt Tuning

Use transfer learning for ASR in ESPnet2

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

v-iashin/SpecVQGAN — Source code for “Taming Visually Guided Sound Generation” (Oral at the BMVC 2021)

Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks

Compiling

OverFlow: Putting flows on top of neural transducers for better TTS

Introducing anywidget

dair-ai/Mathematics-for-ML — A collection of resources to learn mathematics for machine learning

dair-ai/ML-Notebooks — Machine Learning Notebooks

SentenceBERT — Semantically meaningful sentence embeddings the right way

krrish94/nerf-pytorch — A PyTorch re-implementation of Neural Radiance Fields

nv-tlabs/nglod — Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes (CVPR 2021 Oral)

NVIDIAGameWorks/PhysX — NVIDIA PhysX SDK

NVIDIA-Omniverse/IsaacGymEnvs — Isaac Gym Reinforcement Learning Environments

NVIDIAGameWorks/kaolin — A PyTorch Library for Accelerating 3D Deep Learning Research

Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer, code

Denys88/rl_games — RL implementations

sail-sg/envpool — C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.

Evidence of a predictive coding hierarchy in the human brain listening to speech

Hungarian pronouns

Hungarian Core 100 Word List

A Distributed Systems Reading List

facebookresearch/LASER

RDF Mapping Language

ImageDescriptionRdfExamples

Lexicon Model for Ontologies: Community Report, 10 May 2016

Leabhair do Pháistí

‘Tá an aisling seo agam’

Dive into Deep Learning

Working with sequences

Wapiti - A simple and fast discriminative sequence labelling toolkit, code

Towards Augmenting Lexical Resources for Slang and African American English

jupyter-xeus/xeus-cling

Notebook to run Ruby on Google Colaboratory

Finding the Words to Say: Hidden State Visualizations for Language Models

MixConv: Mixed Depthwise Convolutional Kernels

Extracting Features from an Intermediate Layer of a Pretrained ResNet Model in PyTorch

Basics of Self-Attention

PatchBERT: Just-in-Time, Out-of-Vocabulary Patching

Fine-tuning Mozilla DeepSpeech for the Indian Accent

Indian Accent Speech Recognition

wilpert/RusPhonetizer

trainc — TrainC builds compact context dependency transducers for WFST-based speech recognition from acoustic training data.

VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

A curated list of speech and natural language processing resources

benob/openlat — Toolkit for manipulating word lattices built on top of openfst

usc-sail/barista — Barista is an open-source framework for concurrent speech processing.

amir-zeldes/gum — Repository for the Georgetown University Multilayer Corpus (GUM)

nassosoassos/sail_align — SailAlign is an open-source software toolkit for robust long speech-text alignment implementing an adaptive, iterative speech recognition and text alignment scheme that allows for the processing of very long (and possibly noisy) audio and is robust to transcription errors.

Darby O’Gill and the Good People

The Sleeping beauty of the wood

LEAF: A Learnable Frontend for Audio Classification

Spectrogram & Oscillator, code

lucidrains/axial-positional-embedding — Axial Positional Embedding for Pytorch

CPJKU/madmom — Python audio and music signal processing library

matthew-brett/transforms3d — 3 dimensional spatial transformations

Moof-A-Day: Early Macintosh Software

arogozhnikov/einops — Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Introducing Lamini, the LLM Platform for Rapidly Customizing Models

The Illustrated Stable Diffusion

Leaḃar ar áireaṁ

Auraicept na n-éces

Naoi ngábhadh an Ghiolla Dhuibh.

Naoi Ghabh an Ghiolla Dubh

lucidrains/PaLM-rlhf-pytorch — Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

How Virtual Reality Can Help Those With Autism

Giskard is coming to your notebook: Python meets Java via gRPC tunnel

Illustrating Reinforcement Learning from Human Feedback

Point-E: A System for Generating 3D Point Clouds from Complex Prompts

microsoft/BlingFire — A lightning fast Finite State machine and REgular expression manipulation library.

rom1504/cc2dataset — Easily convert common crawl to a dataset of caption and document. Image/text Audio/text Video/text,

facebookresearch/barlowtwins — PyTorch implementation of Barlow Twins.

facebookresearch/vicreg — VICReg official code base

sileod/tasknet — Easy multi-task learning with HuggingFace Datasets and Trainer

Irish English Resource Centre

Reward is not Necessary: How to Create a Modular & Compositional Self-Preserving Agent for Life-Long Learning

The Standardization of Irish

Reviewed Work: Linguistic Atlas and Survey of Irish Dialects. Vol. IV, the Dialects of Ulster and the Isle of Man. Specimens of Scottish Gaelic Dialects. Phonetic Texts of East Ulster Irish by Heinrich Wagner, Colm Ó Baoill

Vol. 16, 1952, Contributions in Memory of Osborn Bergin

Vol. 13, 1942 Ériu

Why libtorch?

orktes/go-torch — LibTorch (PyTorch) bindings for Golang

The Gaelic dialect of Urris, Inishowen, Co. Donegal

karpathy/deep-vector-quantization — VQVAEs, GumbelSoftmaxes and friends

Animating Stereograms with Optical Flow Morphing

Transformers in Pytorch from scratch for NLP Beginners

PyTorch for TensorFlow Users - A Minimal Diff

Brain, Time, CTC blank states and streaming

Testing Facebook MMS and SeamlessM4T Word Error Rate

N-gram language model toolkits in 2020

jermp/tongrams — A C++ library providing fast language model queries in compressed space.

On latency of speech recognition

Wav2vec 2.0: Learning the structure of speech from raw audio

Generate distance matrix from features

Calamari-OCR/calamari — Line based ATR Engine based on OCRopy

kraken, mittagessen/kraken — OCR engine for all the languages

GT4HistOCR: Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin

not-implemented/hocr-proofreader — Web based JavaScript GUI library for proofreading/editing hOCR

GeReV/hocr-editor-ts — A visual hOCR file editor

What the BookCorpus?

Introduction to Simple Neural Networks

PaperPort

OmniPage

Python Concurrency: The Tricky Bits

hocr-tools, CUSAT/hocr-tools — Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

mbartoli/tAlign — Text alignment for OCR using FFTs

DDMAL/text_alignment — Aligns correct transcripts to text images using a “messy” OCR and Needleman-Wunsch sequence alignment

Early-Modern-OCR/RETAS — Part of eMOP: the Recursive Text Alignment Tool compares OCR text results to groundtruth by character and computes a score.

cisocrgroup/ocrd_cis — OCR-D python tools

ofirpress/shortformer — Code for the Shortformer model, from the ACL 2021 paper by Ofir Press, Noah A. Smith and Mike Lewis.

PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Irish Script on Screen

Gaeilge Laighean by Colm Ó Broin

XLSR.ipynb

TomHarte/dsk2woz — A command-line tool to convert Apple II DSK images to WOZ format.

bzotto/picturedsk — Imprint an “image” in the magnetic flux of an Apple 5.25” floppy disk

Learn About Transformers: A Recipe

ILF… audio recordings

Beaṫa Aoḋa Ruaiḋ Ui Doṁnaill

Fiche bliadhain ag fás

Training an Ocropus OCR model

Finding blocks of text in an image using Python, OpenCV and numpy

ocropus-archive/DUP-ocropy — Python-based tools for document analysis and OCR

Joint ASR and language identification using RNN-T: An efficient approach to dynamic language switching

NVIDIA/speechsquad — Conversational AI Benchmark.

Urlabhraidheacht agus graimear na gaedhilge, cuid I.

Illustrating the Reformer

digiah/oldOCR — Optical Character Recognition of old and noisy print sources

Neural Inverse Text Normalization

JFLAP — JFLAP is software for experimenting with formal languages topics including nondeterministic finite automata, nondeterministic pushdown automata, multi-tape Turing machines, several types of grammars, parsing, and L-systems.

tokenwiser

Chris Lattner: Revolutionizing the C++ World

Irish folklore archive inscribed into UNESCO register

GraphiteEditor/Graphite — 2D raster & vector editor that melds traditional layers & tools with a modern node-based, fully non-destructive procedural workflow.

apple/turicreate — Turi Create simplifies the development of custom machine learning models.

Comparing signals in the time domain

google-research/sofima — Scalable Optical Flow-based Image Montaging and Alignment

google-research/noise2music

google-research/lingvo-lab — Demos, samples, and experimental code for Lingvo.

google-research/last — A JAX library for building lattice-based speech transducer models

ZipIt! Merging Models from Different Tasks without Training

hyunwoongko/kochat — Opensource Korean chatbot framework

Asteroid getting started

Irish UPSID 342

Lyra: A New Very Low-Bitrate Codec for Speech Compression

neulab/nn4nlp-concepts — A repository of concepts related to neural networks for NLP

neubig/nn4nlp-code — Code Samples from Neural Networks for NLP

Modern Irish grammar

seungwonpark/melgan — MelGAN vocoder (compatible with NVIDIA/tacotron2)

‘Déanaim iarracht ‘rothar’ a rá in áit ‘badhsacal’ – tuairim chonspóideach do Chois Fhairrgeach…’

Interface Between Phonology and Phonetics

Unsupervised Question Answering

Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples

NLTK Sample usage for parse

allenai/allennlp — An open-source NLP research library, built on PyTorch.

allenai/allennlp-semparse — A framework for building semantic parsers (including neural module networks) with AllenNLP, built by the authors of AllenNLP

DaCy: New Fast and Efficient State-of-the-Art in Danish NLP!

triantac/punkbuddy

Nuclear accents in four Irish (Gaelic) dialects

Development of an automatic attitude recognition system: a multimodal analysis of video blogs

The phonetics and phonology of the intonation of Irish dialects

Controlling the voice quality dimension of prosody in synthetic speech using an acoustic glottal model

MASSIVE translations

google-research/nisaba — Finite-state script normalization and processing utilities

OSM: Townlands

Using alignments from Montreal Forced Aligner to train

alberto-poncelas/tesseract_postprocess

Tonal alignment in three varieties of Hiberno-English

Modelling intonation in three Irish dialects

Peak timing in two dialects of Connaught Irish

Dialect alignment signatures

A Linguistically Motivated Computational Framework for Irish Sign Language

Maidir le Croidhe Cainnte Chiarraighe

SingFónaic

Helsinki-NLP/Tatoeba-Challenge

Fisher Information Matrix

zhao-shuyang/childrenize — Signal processing method to convert adult speech into child-like

Learnable latent embeddings for joint behavioural and neural analysis

Teic na nGael

ABAIR-CabairE

in progress list for Project Gutenberg

Facebook & Google’s LazyTensor Enables Expressive Domain-Specific Compilers

The Dialects of Co. Clare, Part 1

facebookresearch/vissl — VISSL is FAIR’s library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.

google-research/simclr — SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners

Leveraging the Exact Likelihood of Deep Latent Variable Models

Transformers Explained Visually part 2

salesforce/WikiSQL — A large annotated semantic parsing corpus for developing natural language interfaces.

bentrevett/pytorch-seq2seq — Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.

Essential sources for Irish dialect study II: Doegen

The Irish Language in Rathlin Island

Ráidhteachas an Fheadha

salesforce/apollo — An experimental multi-tenant distributed system platform

salesforce/TransmogrifAI — TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

salesforce/decaNLP — The Natural Language Decathlon: A Multitask Challenge for NLP

salesforce/TabularSemanticParsing — Translating natural language questions to a structured query language

salesforce/ai-economist — Foundation is a flexible, modular, and composable framework to model socio-economic behaviors and dynamics with both agents and governments. This framework can be used in conjunction with reinforcement learning to learn optimal economic policies, as done by the AI Economist (https://www.einstein.ai/the-ai-economist).

Books in Seanchló / Cló Gaelach

Cnuasach Focal as Oirialla

aistear — A resource site for translators, editors, and everyone who writes in Irish.

Ba mhaith liom ‘a thuiscint’ cén fáth a bhfuil an ghramadach chomh deacair sin

D2Go brings Detectron2 to mobile, facebookresearch/d2go — D2Go is a toolkit for efficient deep learning

Ropucha: fadedpage, Wikiźródła

Spanish

Apes, psychos, alcos: How British cartoonists depict the Irish

Keating’s general history of Ireland

Irish Language, 1700-1999 — Selection of books and manuscripts written in Irish.

Irish prose and poetry

Dánta aṁráin, is caointe Ṡeaṫrúin Céitinn

Imtheachta Æniasa.

Cuchulain of Muirthemne sacred texts

Parliamentary Papers, Proceedings and Departmental Papers : UK: Ireland

Calendar of documents, relating to Ireland, preserved in Her Majesty’s Public Record Office, London, 1171-1307

Bealoideasbeo.com

syegulalp/Akilang — A compiler for a simple language, built with Python and LLVM

lark-parser/lark — Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

numba/llvmlite — A lightweight LLVM python binding for writing JIT compilers

libcpu/libcpu — “libcpu” is an open source library that emulates several CPU architectures

apple/ml-qrecc — Open-Domain Question Answering Goes Conversational via Question Rewriting

bbc/bbcrd-brirs — An impulse response dataset for dynamic data-based auralisation of advanced sound systems

sofacoustics/SOFAtoolbox — SOFA Toolbox (API for Matlab, Octave)

aligner 0.1.6 — Automatically corrects subtitle timings given a second correct subtitle, github

Cochleagram Representation of Sound

SpeechColab/GigaSpeech — Large, modern dataset for speech recognition

iver56/audiomentations — A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

SpeechBrain Tutorials

The Learning Rate Finder Technique: How Reliable Is It?

Faster than training from scratch — Fine-tuning the English GPT-2 in any language with Hugging Face and fastai v2

Fastai with 🤗 Transformers

Unsupervised pretraining transfers well across languages

Podchraoltaí Gaeilge

A full statement of the trial and acquittal of Aaron Burr, esq

The Irish landed gentry when Cromwell came to Ireland

Litriú na Gaeilge

Jimín Ṁáire Ṫaiḋg

An foclóir beag

Jócleabhar beag bídeach na Gaeilge

Ceannóga agus Coinlíní

Fuaimeanna na Gaeilge

teddykoker/torchsort

Countering the claims about Australia’s Aboriginal number systems

Gryf : pismo dla spraw kaszubskich

AIdeaLab/wav2vec2_docker — pretraining wav2vec docker for sagemaker.

Compressing Wav2vec 2.0

cpierse/wav2vec2-large-xlsr-53-irish

eval.py

ashubham/CPT — Compact prediction trees for fast sequence prediction using Machine Learning

Residual Energy-Based Models for End-to-End Speech Recognition

julien-c/DPRNNTasNet-ks16_WHAM_sepclean

Fine-tuning a model on a translation task

Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models

Fine-tune a pretrained model

OCR with Keras, TensorFlow, and Deep Learning

snapthat/TF-T5-text-to-text — This repository demonstrate training T5 transformers using tensorflow 2

PiotrDabkowski/Js2Py — JavaScript to Python Translator & JavaScript interpreter written in 100% pure Python

Wav2Vec2-T5 v2.ipynb

Core concepts in k2

Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition, code

Distilling Zero Shot Classification.ipynb

amzn/xfer — Transfer Learning library for Deep Neural Networks.

asteroid-team/asteroid — The PyTorch-based audio source separation toolkit for researchers

astanin/python-tabulate — Pretty-print tabular data in Python, a library and a command-line utility. Repository migrated from bitbucket.org/astanin/python-tabulate.

Trankit, nlp-uoregon/trankit — Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

coqui-ai/STT — STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

Double Decoder Consistency

kan-bayashi/ParallelWaveGAN

NVIDIA/mellotron — Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data

MycroftAI/mimic2 — Text to Speech engine based on the Tacotron architecture, initially implemented by Keith Ito.

MycroftAI/lingua-franca — Mycroft’s multilingual text parsing and formatting library

MycroftAI/skill-date-time — Mycroft AI official Date and Time Skill, providing the current time, date and day of week for cities around the world.

Jaco-Assistant/Scribosermo — Train fast Speech-to-Text networks in different languages

grammarly/ua-gec — UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

grammarly/gector — Official implementation of the papers “GECToR – Grammatical Error Correction: Tag, Not Rewrite” (BEA-20) and “Text Simplification by Tagging” (BEA-21)

Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers

LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring

Digital-Umuganda/Deepspeech-Kinyarwanda — The kinyarwanda model for deepspeech

End-to-End Speaker-Attributed ASR with Transformer

Differentiable Weighted Finite-State Transducers

facebookresearch/mmf — A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

kaegi/alass — “Automatic Language-Agnostic Subtitle Synchronization”

pums974/srtsync — Automatic synchronizer of subtitles based on voice activity in the video

oseiskar/autosubsync — Automatically synchronize subtitles with audio using machine learning

tympanix/subsync — Synchronize your subtitles using machine learning

CCExtractor/Subtitle-Resync — A tool to automatically generate in-sync subtitles of different versions of the same base media (such as with edits)

sc0ty/subsync — Subtitle Speech Synchronizer

GEM Benchmark Tasks

Getting Started With Embeddings

This past week I spent some time learning about SentenceTransformers (https://t.co/5ZAV7lJq7u), and I'm pretty blown away by what sentence embeddings can be used for.

If you're curious to see what researchers have been getting up to with it, here's a 🧵 with some highlights:
— Nima Boscarino (@NimaBoscarino) June 10, 2022

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN

A Modern Self-Referential Weight Matrix That Learns to Modify Itself

High-Quality, Robust and Responsible Direct Speech-to-Speech Translation

Introducing CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus

milvus-io/milvus — A cloud-native vector database, storage for next generation AI applications

Siri can’t speak Irish: Tackling the digital gaps for the Irish language

lucidrains/reformer-pytorch — Reformer, the efficient Transformer, in Pytorch

BBC reporter Phil McCann triggers ‘fill my can’ memes as he covers fuel shortage in UK

NodLabs/mlir-examples — a simple end to end example of taking a ML graph (TF2 / PyTorch) and running it on a device [cpu, gpu]

Machine Learning Simplified: A gentle introduction to supervised learning

XTREME-S benchmark examples

De vandrande djäknarne, De vandrande djäknarne / 3

Beyond Graph Neural Networks with PyNeuraLogic

AMI Corpus Overview

TorchStudio/torchstudio — IDE for PyTorch and its ecosystem

TorchStudio

OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework, code

clam004/intro_continual_learning — This is a tutorial to connect the fundamental mathematics to a practical implementation addressing the continual learning problem of artificial intelligence

gustavo-beck/wavebender-gan

Accent-VITS: accent transfer for end-to-end TTS

Why Are Rich People So Mean?

Structured Log Linear Models for Noise Robust Speech Recognition

Let the Script Find Out the ML Model that Outperforms Yours

Neural music instrument cloning from very few samples

microsoft/Swin-Transformer — This is an official implementation for “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”.

facebookresearch/xcit — Official code Cross-Covariance Image Transformer (XCiT)

Patches Are All You Need?, locuslab/convmixer — Implementation of ConvMixer for “Patches Are All You Need?”

Direct multimodal few-shot learning of speech and images

yoav-lavi/melody — Melody is a language that compiles to regular expressions and aims to be more readable and maintainable

Boosting Wav2Vec2 with n-grams in 🤗 Transformers

En Nyckfull kvinna del 1

Weakly Supervised Construction of ASR Systems with Massive Video Data

Data Augmentation library for text

Wav2vec could be more efficient, so we created our own pre-trained ASR Model for better Conversational AI.

SEW-D

Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition

JAX Vs TensorFlow Vs PyTorch: A Comparative Analysis

FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR

google-deepmind/dm-haiku — JAX-based neural network library

Getting started with JAX

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training

Joint Speech Recognition and Audio Captioning, code

Turning a Google Colab Notebook into a Web App

Training Acoustic Models

Forced Alignment

Common Mistakes in Hyper-Parameters Tuning

A Large-Scale Study on Regularization and Normalization in GANs

Psychophysical and behavioral peripheral and central auditory tests

shivammehta007/Neural-HMM

Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

One model for the learning of language

unfs3/unfs3 — UNFS3 is a user-space implementation of the NFSv3 server specification.

Sinkformers: Transformers with Doubly Stochastic Attention

descriptinc/lyrebird-wav2clip — Official implementation of the paper WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP

Understanding Q,K,V In Transformer

Billion-scale vector search with Vespa - part one

Scaling Vision with Sparse Mixture of Experts, google-research/vmoe

Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis

Logainmneacha Mhagh Loirg agus Uachtar Thíre, Contae Ros Comáin: Anailís ar ainmneacha bhailte fearainn na seandúichí sin

Neural edit-tree lemmatization for spaCy

KristiyanVachev/Leaf-Question-Generation — Easy to use and understand multiple-choice question generation algorithm using T5 Transformers.

On visualizing phonetic data from repeated measures experiments with multiple random effects

Sohcahtoa: Sine, Cosine, Tangent

Áiseanna bunscoile ar líne

Strange and forgotten consoles

Lookup-Table Recurrent Language Models for Long Tail Speech Recognition

Explicit Alignment Objectives for Multilingual Bidirectional Encoders

UniversalDependencies/UD_Irish-IDT

bbc/peaks.js — JavaScript UI component for interacting with audio waveforms

bbc/waveform-data.js — Audio Waveform Data Manipulation API – resample, offset and segment waveform data in JavaScript.

frictionlessdata/frictionless-py — Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data

smacke/ffsubsync — Automagically synchronize subtitles with video.

bbc/morty-docs — Generate a static website from markdown files

BorisChumichev/everpolate — Numerical interpolation and extrapolation lib

19 entities for 104 languages: A new era of NER with the DeepPavlov multilingual BERT

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

LEAF: A Learnable Frontend for Audio Classification

“Chain-linking” NLP tasks With Wav2Vec2 & Transformers

guanlongzhao/kaldi-gop — Computes the Goodness of Pronunciation (GOP). Based on Kaldi.

L2-ARCTIC

An Asynchronous WFST-Based Decoder For Automatic Speech Recognition

Speech and Language Processing

AdapterHub: A Framework for Adapting Transformers

Neighbours across the sea: A brief history of Anglo-Irish relations

make_lexicon_fst.py

Named Entity Recognition

NN-SVG — Publication-ready NN-architecture schematics

HarisIqbal88/PlotNeuralNet — Latex code for making neural networks diagrams

lutzroeder/netron — Visualizer for neural network, deep learning and machine learning models

pettarin/forced-alignment-tools — A collection of links and notes on forced alignment tools

chrisbaume/overtyper — Experiment in automatic insertion of timed transcript corrections using fuzzy phonetic matching

bbc/dialogger — Text-based media editing interface

chrisbaume/webaligner — A client-side forced aligner for speech

bbc/stt-align-node — node version of stt-align https://github.com/bbc/stt-align by Chris Baume - R&D.

bbc/react-transcript-editor — A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. - Work in progress

bbc/vc2hqencode — Optimised VC-2 HQ Profile Encoder Library

bbc/vc2_conformance — Software tools for checking the conformance of SMPTE ST 2042-1 (VC-2) professional video codec implementations.

bbc/vc2-reference — A reference encoder and decoder for SMPTE ST 2042-1 “VC-2 Video Compression”

bbc/dash.js — A reference client implementation for the playback of MPEG DASH via Javascript and compliant browsers.

bbc/storyplayer — BBC Research & Development’s Object Based Media Player

bbc/digital-paper-edit-api — Work in progress - BBC News Labs digital paper edit project - Express server API

mozilla/DSAlign — DeepSpeech based forced alignment tool

Trials and Tribulations: Using Keras on Colab and TPU

Hugging Face on PyTorch / XLA TPUs: Faster and cheaper training

Some Kaldi Notes

TensorSpeech/TensorFlowASR

Open Science in phonetics and phonology

hf_wav2vec2_deepspeed.ipynb

Part 2 - Extracting Audio Features

Yoda Speech Corpus

youtube8m

CNNDigitReco-speakerindependent

Spanish Automatic Speech Recognition pytorch

Mr Donald Trump Speeches

Joe Biden 2020 DNC Speech

Grid Search to find best tuning parameters

asahi417/tner — Language model fine-tuning on NER with an easy interface and cross-domain evaluation. “T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition, EACL 2021”

Train BERT from scratch

Classification on FSDD using Spectrograms

JaidedAI/EasyOCR — Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.

Polish Christmas Carols

An Saol ó Dheas

Nuacht1.com

FÍSEÁN: ‘Níl na comharthaí iontach maith’ – amhras léirithe ag stiúrthóir an Oireachtais faoi fhéile na bliana seo

make_kn_lm.py

A Speech-To-Text Practitioner’s Criticisms of Industry and Academia

eric-mitchell/direct-preference-optimization — Reference implementation for DPO (Direct Preference Optimization)

tiangolo/fastapi — FastAPI framework, high performance, easy to learn, fast to code, ready for production

explosion/wasabi — A lightweight console printing and formatting toolkit

lucidrains/DALLE-pytorch — Implementation / replication of DALL-E, OpenAI’s Text to Image Transformer, in Pytorch

facebookresearch/pytorchvideo — A deep learning library for video understanding research.

microg/GmsCore — Free implementation of Play Services

mortennobel/cpp-cheatsheet — Modern C++ Cheatsheet

micknoise/Maximilian — C++ Audio and Music DSP Library

taywee/args — A simple header-only C++ argument parser library. Supposed to be flexible and powerful, and attempts to be compatible with the functionality of the Python standard argparse library (though not necessarily the API).

antirez/linenoise — A small self-contained alternative to readline and libedit

p-ranav/tabulate — Table Maker for Modern C++

photonstorm/phaser — Phaser is a fun, free and fast 2D game framework for making HTML5 games for desktop and mobile web browsers, supporting Canvas and WebGL rendering.

Suyash458/WiktionaryParser — A Python Wiktionary Parser

llvmpy/llvmpy

ucb-bar/riscv-sodor — educational microarchitectures for risc-v isa

boriel/zxbasic — The Sinclair ZX Spectrum BASIC compiler!

erikrose/blessings — A thin, practical wrapper around terminal capabilities in Python

tartley/colorama — Simple cross-platform colored terminal text in Python

jzhang38/TinyLlama — The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

PrefectHQ/marvin — Build AI interfaces that spark joy

alibaba/animate-anything — Fine-Grained Open Domain Image Animation with Motion Guidance

Kanaries/pygwalker — PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

commaai/openpilot — openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for 250+ supported car makes and models.

dvmazur/mixtral-offloading — Run Mixtral-8x7B models in Colab or consumer desktops

Textualize/rich — Rich is a Python library for rich text and beautiful formatting in the terminal.

DLYuanGod/TinyGPT-V — TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

PaddlePaddle/PaddleGAN — PaddlePaddle GAN library, including lots of interesting applications like First-Order motion transfer, Wav2Lip, picture repair, image editing, photo2cartoon, image style transfer, GPEN, and so on.

koaning/drawdata — Draw datasets from within Jupyter.

jindrapetrik/jpexs-decompiler — JPEXS Free Flash Decompiler

SerenityOS/serenity

topjohnwu/Magisk — The Magic Mask for Android

gzc/CLRS — Solutions to Introduction to Algorithms

libcpr/cpr — C++ Requests: Curl for People, a spiritual port of Python Requests.

fffaraz/awesome-cpp — A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things. Inspired by awesome-… stuff.

martinmoene/ring-span-lite — ring-span lite - A C++yy-like ring_span type for C++98, C++11 and later in a single-file header-only library

davisking/dlib

autodiff/autodiff — automatic differentiation made easier for C++

linebender/druid — A data-first Rust-native UI design toolkit.

linebender/runebender — A font editor written in Rust.

pemistahl/grex — A command-line tool and Rust library with Python bindings for generating regular expressions from user-provided test cases

emilk/egui — egui: an easy-to-use immediate mode GUI in Rust that runs on both web and native

actix/actix-web — Actix Web is a powerful, pragmatic, and extremely fast web framework for Rust.

RDFLib/rdflib

chipsalliance/chisel — Chisel: A Modern Hardware Design Language

ucb-bar/dsptools — A Library of Chisel3 Tools for Digital Signal Processing

When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models, code

Streamlit vs. Dash vs. Shiny vs. Voila vs. Flask vs. Jupyter

First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT

Mallard BASIC: Introduction and Reference

Joyce Computer Club Public Domain - BASIC

Mallard BASIC to CPC BASIC

m-wiesner/nnet_pytorch — Kaldi style neural network training in pytorch for use in place of nnet3 in Kaldi.

PCWsBAS.WS4

32-bit Apps in a 64-bit Docker Container

marytts/gradle-marytts-voicebuilding-plugin

pyparsing/pyparsing — Python library for creating PEG parsers

IS2AI/Kazakh_TTS — An expanded version of the previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus. In KazakhTTS2, the overall size has increased from 93 hours to 271 hours, the number of speakers has risen from two to five (three females and two males), and the topic coverage has been diversified.

coady/lupyne — Pythonic search engine based on PyLucene.

apache/pdfbox

Nine Polish books you must read before you die

pytorch/audio

ml-tooling/opyrator — Turns your machine learning code into microservices with web API, interactive GUI, and more.

tomstitt/lupyter — A Lua Kernel for Jupyter built on ipykernel.

Automated Guitar Transcription with Deep Learning

GuitarsAI/ADSP_Tutorials — Advanced Signal Processing Notebooks and Tutorials

GuitarML/GuitarLSTM — Deep learning models for guitar amp/pedal emulation using LSTM with Keras.

GuitarML/SmartAmpPro — Guitar plugin using neural networks to capture real amps and pedals

voila-dashboards/voila — Voilà turns Jupyter notebooks into standalone web applications

jupyter-xeus/xeus-cling — Jupyter kernel for the C++ programming language

PyO3/pyo3 — Rust bindings for the Python interpreter

Interactive Rust in a REPL and Jupyter Notebook with EVCXR

AK391/spleeter — Deezer source separation library including pretrained models.

CoderLine/alphaTab — alphaTab is a cross platform music notation and guitar tablature rendering library.

dpilger26/NumCpp — C++ implementation of the Python Numpy library

faridrashidi/kaggle-solutions — Collection of Kaggle Solutions and Ideas

Introduction to Sound Event Detection

Radio Kaszëbë

Podòdzél jistników na deklinacje. I deklinacjô

Najô Ùczba

Open-Speech-EkStep/vakyansh-wav2vec2-experimentation

Machine Learning - Google for Developers

10 Jupyter Notebook Extensions Making My Lyfe Easier

org-arl/jupyter-ieee-paper — Jupyter notebook to generate fully formatted IEEE papers

jupyterlab/jupyterlab-latex — JupyterLab extension for live editing of LaTeX documents

Fission, fission/fission — Fast and Simple Serverless Functions for Kubernetes

Typography.js

jik876/hifi-gan — HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

andabi/deep-voice-conversion — Deep neural networks for voice conversion (voice style transfer) in Tensorflow

chriskiehl/Gooey — Turn (almost) any Python command line program into a full GUI application with one line

googlecreativelab/quickdraw-dataset — Documentation on how to access and use the Quick, Draw! Dataset.

ARBML/klaam — Arabic speech recognition, classification and text-to-speech.

Semi-supervised Learning and Frame Rate

Using nbconvert as a library

amperser/proselint — A linter for prose.

openstack/swift — OpenStack Storage (Swift). Mirror of code maintained at opendev.org.

Setup docker for Kaggle

Semi-Supervised Training of Deep Neural Networks for Speech Recognition

Zero-Resource Neural Machine Translation with Monolingual Pivot Data

google/python-fire — Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.

ceph/ceph — Ceph is a distributed object, block, and file storage platform

rook/rook — Storage Orchestration for Kubernetes

TigerBot: An Open Multilingual Multitask LLM

RLIF: Interactive Imitation Learning as Reinforcement Learning

Training tiny specialized language models

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

lucidrains/MEGABYTE-pytorch — Implementation of MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch

1SPU: 1-step Speech Processing Unit

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture, code

sscardapane/reprodl2021 — Host repository for the “Reproducible Deep Learning” PhD course

Aligning Ground Truth Text with OCR Degraded Text

Are you GPU poor?

openaudible/openaudible — Audiobook Manager for Audible Users

A complete guide to transfer learning from English to other Languages using Sentence Embeddings BERT Models

Ki6an/fastT5 — boost inference speed of T5 models by 5x & reduce the model size by 3x.

FieldDB/Praat-Scripts

FieldDB/AndroidLanguageLessons

Deep Implicit Attention: A Mean-Field Theory Perspective on Attention Mechanisms

Bootstrap your own latent: A new approach to self-supervised Learning

SBert quickstart

jmccrae/irish_saffron — Code related to adapting Saffron to Irish

A new open data set for multilingual speech research, OpenSLR

tugstugi/dl-colab-notebooks — Try out deep learning models online on Google Colab

FELIX: Flexible Text Editing Through Tagging and Insertion, code

openmainframeproject/cobol-programming-course — Training materials and labs for a “Getting Started” level course on COBOL

IBM/cobol-is-fun

Martinfx/Cobol

RiveScript

mohaEs/Train-Predict-Landmarks-by-dlib

ELITR/automin-2021

Involution: Inverting the Inherence of Convolution for Visual Recognition, code, involution_pytorch

open-mmlab/mmocr — OpenMMLab Text Detection, Recognition and Understanding Toolbox

Nuacht Mhall

jakevdp/PythonDataScienceHandbook — Python Data Science Handbook: full text in Jupyter Notebooks

An ultrasound study of Connemara Irish palatalization and velarization

An Ultrasound Investigation of Irish Palatalization

wmcnally/evopose2d — EvoPose2D is a two-stage human pose estimation model that was designed using neuroevolution. It achieves state-of-the-art accuracy on COCO.

cdpierse/transformers-interpret — Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.

A dictionary of the Manks language

Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences, code

ikekonglp/PAD — The PAD parser produces phrases-after-dependencies. Give it the output of a dependency parser and it will produce the optimal constrained phrase-structure parse.

Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization

Neural HMMs are all you need (for high-quality attention-free TTS)

Measuring Massive Multitask Language Understanding, code

qdrant/qdrant — Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Learning rule-based morpho-phonology, code

AakashKumarNain/annotated_research_papers

TextOCR

argilla-io/argilla — Argilla: the open-source feedback platform for LLMs

Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring

Feasta - Bealtaine 2013

CCExtractor/ccextractor — CCExtractor is a tool used to produce subtitles for TV recordings from almost anywhere in the world. We intend to keep up with all sources and formats.

JabRef/jabref — Graphical Java application for managing BibTeX and biblatex (.bib) databases

rizinorg/rizin — UNIX-like reverse engineering framework and command-line toolset.

Synfig, code — Synfig Studio is free and open-source 2D animation software, designed as a powerful industrial-strength solution for creating film-quality animation using vector and bitmap artwork

hamelsmu/Seq2Seq_Tutorial — Code For Medium Article “How To Create Data Products That Are Magical Using Sequence-to-Sequence Models”

hamelsmu/Docker_Tutorial — Code and helper scripts for article on Medium “How Docker Can Help You Become A More Effective Data Scientist”

How to write academic papers in Markdown

Writing academic papers in plain text with Markdown and Jupyter notebook

RasaHQ/paraphraser — Tool to generate paraphrases of sentences in many languages.

Diffusion Models Beat GANs on Image Synthesis

JoFrhwld/FAVE — A repository for maintaining the fave-align and fave-extract toolkits

Klatt

creating a vowel diagram

Vowels

vowel – Draw vowel charts for phonetic research

The vowel space

Cad a dhéanfaidh mé le mo fhleiscín-se? Comhairle ghramadaí…

From Notebook to Kubeflow Pipelines with MiniKF and Kale

pachyderm/pachyderm — Data-Centric Pipelines and Data Versioning

GeoPandas, code

KELM: Integrating Knowledge Graphs with Language Model Pre-training Corpora

sberdevices/golos

Neargye/magic_enum — Static reflection for enums (to string, from string, iteration) for modern C++, work with any enum type without any macro or boilerplate code

FNet: Mixing Tokens with Fourier Transforms, tensorflow, pytorch

Distributed Training of a Bengali ALBERT model

jgraph/drawio — draw.io is a JavaScript, client-side editor for general diagramming.

Barlow-Twins-TF

evolus/pencil — The Pencil Project’s unique mission is to build a free and opensource tool for making diagrams and GUI prototyping that everyone can use.

linkedin/greykite — A flexible, intuitive and fast forecasting library

staltz/matrixmultiplication.xyz — An interactive matrix multiplication calculator for educational purposes

A Journey Through Fastbook

trekhleb/homemade-machine-learning — Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained

launchbadge/sqlx — The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, SQLite, and MSSQL.

gfx-rs/wgpu — Cross-platform, safe, pure-rust graphics api.

seanmonstar/warp — A super-easy, composable, web server framework for warp speeds.

Nukesor/pueue — Manage your shell commands.

pytorch/captum — Model interpretability and understanding for PyTorch

Emotion Recognition in Greek Speech Using Wav2Vec 2.0

synesthesiam/mycroft-precise-trainer — Text to speech wake word training scripts for Mycroft Precise

rhasspy/rhasspy-asr — Shared Python classes for speech to text

synesthesiam/voice2json — Command-line tools for speech and intent recognition on Linux

LARP: Language-Agent Role Play for Open-World Games

Continvvm/continuum — A clean and simple data loading library for Continual Learning

CC-100: Monolingual Datasets from Web Crawl Data

deepset-ai/haystack — LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it’s best suited for building RAG, question answering, semantic search or conversational agent chatbots

janusgraph/janusgraph — JanusGraph: an open-source, distributed graph database

Streamlit Tutorial: A Beginner’s Guide to Building Machine Learning-Based Web Applications in Python

textext/textext — Re-editable LaTeX/typst graphics for Inkscape

Searching, fast and slow, through product catalogs

How to Easily Draw Neural Network Architecture Diagrams

mlflow/mlflow — Open source platform for the machine learning lifecycle

Why use Docker containers for machine learning development?

Nine Tools I Wish I Mastered before My PhD in Machine Learning

Entering raw mode

Common Rust Lifetime Misconceptions

rhasspy/gruut — A tokenizer, text cleaner, and phonemizer for many human languages.

rhasspy/ipa2kaldi — Tool for creating Kaldi nnet3 recipes using the International Phonetic Alphabet (IPA)

rhasspy/wiktionary2dict — Tool for extracting IPA pronunciations from Wiktionary XML dump

Babel, code

How to use SVGs in React

nodejs/nan — Native Abstractions for Node.js

Hubert: How Much Can a Bad Teacher Benefit ASR Pre-Training?

hubert simple_kmeans

mamedev

The Gumbel trick

Modifying Custom Matmul CUDA Kernels

DeMoriarty/TorchPQ — Approximate nearest neighbor search with product quantization on GPU in pytorch and cuda

Alexander-H-Liu/NPC — Non-Autoregressive Predictive Coding

s3prl/s3prl

facebookresearch/CPC_audio — An implementation of the Contrastive Predictive Coding (CPC) method to train audio features in an unsupervised fashion.

as-ideas/TransformerTTS — Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.

8-Core Training on Colab TPUs

DistilBertTPUTraining.ipynb

T5 on TPU

dfm/extending-jax — Extending JAX with custom C++ and CUDA code

mmi_mbr_graph.py

SE and ASR joint training #3226

google/REAPER

voicesauce/opensauce-python — Voice analysis software (Python port of VoiceSauce)

rish-16/aft-pytorch — Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.

yoshitomo-matsubara/torchdistill — A coding-free framework built on PyTorch for reproducible deep learning studies. 🏆20 knowledge distillation methods presented at CVPR, ICLR, ECCV, NeurIPS, ICCV, etc are implemented so far. 🎁 Trained models, training logs and configurations are available for ensuring the reproducibility and benchmark.

Declassified Cold War code-breaking manual has lessons for solving ‘impossible’ puzzles

epfml/sent2vec

epfml/Bi-sent2vec — Robust Cross-lingual Embeddings from Parallel Sentences

MERLOT: Multimodal Neural Script Knowledge Models

Inference with Wav2vec 2.0

AnthonyCalandra/modern-cpp-features — A cheatsheet of modern C++ language and library features.

MayaPosch/NymphCast — Audio and video casting system with support for custom applications.

How Fighter Jets Lock On (and How the Targets Know)

‘Operation Legacy’: Britain’s Destruction and Concealment of Colonial Records Worldwide

gopherdata/gophernotes — The Go kernel for Jupyter notebooks and nteract.

Python and Go : Part II - Extending Python With Go

LANDrop, code — Drop any files to any devices on your LAN.

jpalardy/vim-slime — A vim plugin to give you some slime. (Emacs)

jlevy/the-art-of-command-line — Master the command line, in one page

EbookFoundation/free-programming-books

Neural Machine Translation Using Sequence to Sequence Model

Generative Spoken Language Modeling from Raw Audio

Bunfrasaí Ghaeilge Reachlann

Standard Lexical Sets

Online Units

Quechua Collection of Patricia Dreidemie

mvcisback/lstar — Python implementation of lstar automata learning algorithm.

gbossert/pylstar — An implementation of the LSTAR Grammatical Inference Algorithm

Symbolic Automata

lorisdanto/symbolicautomata — Library for symbolic automata and symbolic visibly pushdown automata

awni/transducer — A Fast Sequence Transducer Implementation with PyTorch Bindings

Sequence Transduction with Recurrent Neural Networks

Sequence-to-sequence learning with Transducers

linear_crf.ipynb

tech-srl/RNN_to_PRS_CFG — Implementation of TACAS 2021 paper, “Extrapolating CFGs from RNNs”

ByT5: Towards a token-free future with pre-trained byte-to-byte models, code

facebookresearch/AugLy — A data augmentations library for audio, image, text, and video.

PrithivirajDamodaran/Styleformer — A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

Contrastive Semi-supervised Learning for ASR

Contrastive Learning of General-Purpose Audio Representations, code

FRILL: On-Device Speech Representations using TensorFlow-Lite

parlance/ctcdecode — PyTorch CTC Decoder bindings

kensho-technologies/pyctcdecode — A fast and lightweight python-based CTC beam search decoder for speech recognition.

pariajm/awesome-disfluency-detection — A curated list of awesome disfluency detection publications along with the released code and bibliographical information

How to use the pre-trained Librispeech model in Kaldi

yandex-research/DeDLOC — Official code for “Distributed Deep Learning in Open Collaborations” (NeurIPS 2021)

Distributed Deep Learning in Open Collaborations

cross-language-cpp/djinni-generator — Command-line tool that generates gluecode from a djinni-IDL file

Calling Go Functions from Other Languages

Introduction To Golang For Python developers

Go by Example

jwieting/paraphrastic-representations-at-scale

python-trio/trio — Trio – a friendly Python library for async concurrency and I/O

Making Web Crawlers Using Scrapy for Python

scrapy/scrapy

google-research/deeplab2 — DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a unified and state-of-the-art TensorFlow codebase for dense pixel labeling tasks.

hugapi/hug — Embrace the APIs of the future. Hug aims to make developing APIs as simple as possible, but no simpler.

Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Statistics of diphones and triphones presence on the word boundaries in the Polish language. Applications to ASR

Using Syllables as Acoustic Units for Spontaneous Speech Recognition

akreal/diphones — PocketSphinx diphone alignment

Diphone-based speech recognition using neural networks

Speech recognition method and system using triphones, diphones, and phonemes

Face recognition with OpenCV, Python, and deep learning

Alignments in Kaldi

Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding

Keras: Few-Shot learning with Reptile, Image similarity estimation using a Siamese Network with a triplet loss, Self-supervised contrastive learning with SimSiam, Automatic Speech Recognition with Transformer, Code examples

tuplex/tuplex — Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.

Semi-Supervised Speech Recognition via Graph-based Temporal Classification

How Kurt Cobain’s Favorite Novel Made Its Way Onto Nirvana’s Final Album

DIOM3DES

aws/graph-notebook — Library extending Jupyter notebooks to integrate with Apache TinkerPop, openCypher, and RDF SPARQL.

o3de/o3de — Open 3D Engine (O3DE) is an Apache 2.0-licensed multi-platform 3D engine that enables developers and content creators to build AAA games, cinema-quality 3D worlds, and high-fidelity simulations without any fees or commercial obligations.

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio, SpeechColab/GigaSpeech — Large, modern dataset for speech recognition

LLM Training: RLHF and Its Alternatives

Sean-Chainnt na gCruach, Co. Dhún na nGall

Can Fully Connected Layers be Replaced by Convolutional Layers?

tencent-ailab/pika — a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi

athena-team/athena — an open-source implementation of sequence-to-sequence based speech processing engine

Causal Language modeling

Unitnet Speech Demos: Unit Selection TTS strikes back

Automated Audio Captioning

chapter09_part01_image-segmentation.ipynb

Google DeepMind’s new AI tool helped create more than 700 new materials

UI researchers working to make speech-recognition technology more accessible

feast-dev/feast — Feature Store for Machine Learning

eugeneyan/applied-ml — Papers & tech blogs by companies sharing their work on data science & machine learning in production.

ucam-smt/ucam-smt — Cambridge SMT System

Welcome to the Zero to Mastery TensorFlow for Deep Learning Book

Neural Networks and Deep Learning

a2-4am/a2rchery — A multi-purpose tool for manipulating .a2r disk images

Neural Waveshaping Synthesis

microsoft/flow2dts — Flow declarations to TypeScript declarations transpiler

shivammehta25/Matcha-TTS — [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Using C++ and WSL in VS Code

microsoft/terminal — The new Windows Terminal and the original Windows console host, all in the same place!

microsoft/STL

uwol/proleap-vb6-parser — ProLeap ANTLR4-based parser for Visual Basic 6.0

Barlow Twins: Self-Supervised Learning via Redundancy Reduction

Multistream TDNN and new Vosk model

Duanaire na Miḋe

tunib-ai/parallelformers — Parallelformers: An Efficient Model Parallelization Toolkit for Deployment

CPrAN — The plugin manager for Praat

‘Our sound man had Kurt Cobain against the wall’: iconic Leeds gig pub ‘reopens’

UniSpeech at scale: An Empirical Study of Pre-training Method on Large-Scale Speech Recognition Dataset

viatsko/awesome-vscode

Deep Learning over the Internet: Training Language Models Collaboratively

liuliu/ccv — C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library

Official Secrets Act reform could target journalists exposing state failings in Troubles’ killings

SUPERB: Speech processing Universal PERformance Benchmark

CS224S, Assignment 3: Deep Learning for End-to-End Speech Recognition

Scary Phonetics? Learning Cardinal Vowels, Part 1

PhonEd: Phonetics Education

How to combine multiple criterions to a loss function? - PyTorch Forums

An Adapter Based Pre-Training for Efficient and Scalable Self-Supervised Speech Representation Learning

Biblia Tysiąclecia

yl4579/StarGANv2-VC — StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

Active learning in speech recognition

facebookresearch/fairo — A modular embodied agent architecture and platform for building embodied agents

apache/tvm — Open deep learning compiler stack for cpu, gpu and specialized accelerators

hora-search/hora — efficient approximate nearest neighbor search algorithm collections library written in Rust

An Introduction to Weighted Automata in Machine Learning

facebookresearch/SlowFast — PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

sarulab-speech/jtubespeech — JTubeSpeech: Corpus of Japanese speech collected from YouTube

anishathalye/neural-hash-collider — Preimage attack against NeuralHash 💣

AsuharietYgvar/AppleNeuralHash2ONNX — Convert Apple NeuralHash model for CSAM Detection to ONNX.

KhaosT/nhcalc — Compute NeuralHash for the given image

The Formula For An Episode Of Murder, She Wrote

artyom-beilis/dlprimitives — Deep Learning Primitives and Mini-Framework for OpenCL

labmlai/annotated_deep_learning_paper_implementations

Enhancing audio quality for expressive Neural Text-to-Speech

One TTS Alignment to Rule Them All

turtle — Turtle graphics

respeecher/librispeech-cutter — Scripts for generating librispeech cuts from the original mp3 archive without 16kHz restrictions

stanford-crfm/mistral — Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗 Transformers.

Language model training examples

A flexible, open-source platform for democratised access to digital resources

Tencent/TNN — deep learning inference framework for mobile, desktop and server.

Zero Redundancy Optimizer

rawpython/remi — Python REMote Interface library. Platform independent. In about 100 Kbytes, perfect for your diet.

microsoft/Focal-Transformer — [NeurIPS 2021 Spotlight] Official code for “Focal Self-attention for Local-Global Interactions in Vision Transformers”

microsoft/Swin-Transformer — This is an official implementation for “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”.

Cartlann na gCanúintí

ahrm/sioyek — Sioyek is a PDF viewer with a focus on textbooks and research papers

How lies about Irish ‘barbarism’ in 1641 paved way for Cromwell’s atrocities

microsoft/UniSpeech — UniSpeech - Large Scale Self-Supervised Learning for Speech

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

NVIDIA/radtts — Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low Dimensional (F0 and Energy) Speech Attributes.

AI Choreographer. Music Conditioned 3D Dance Generation with AIST++, paper, dataset, api, model

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

Appen/UHV-OTS-Speech — A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

EgorLakomkin/KTSpeechCrawler — Automatically constructing corpus for automatic speech recognition from YouTube videos

gong-io/gecko — Gecko - A Tool for Effective Annotation of Human Conversations

A Recipe For Arbitrary Text Style Transfer with Large Language Models

DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features

All Top Python Libraries for Data Science Explained

I trained Noam Chomsky TTS

FedJAX: Federated Learning Simulation with JAX

striveiccv2021/STRIVE-ICCV2021 — STRIVE: Scene Text Replacement In Videos

doxas/twigl — twigl.app is an online editor for One tweet shader, with gif generator and sound shader, and broadcast live coding.

giannisdaras/multilingual_robustness — [NeurIPS 2022] Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve

The Annotated Transformer

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

s3prl/LibriMix — An open source dataset for source separation

awslabs/speech-representations — Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)

BERTphone: Phonetically-aware Encoder Representations for Utterance-level Speaker and Language Recognition

FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis

The Turing Way

RLIF: Interactive Imitation Learning as Reinforcement Learning, paper, code

snorkel-team/snorkel — A system for quickly generating training data with weak supervision

Visualizing and Understanding Convolutional Networks

kedro-org/kedro — Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

How to Train Bert For Q&A in Any Language

Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding

Gentle Dive into Math Behind Convolutional Neural Networks

t-ubukata/cudnnxx — cuDNN C++ wrapper.

To RAII or Not to RAII?

Smart developers use smart pointers (1/7) – Smart pointers basics

CTC Variations Through New WFST Topologies

Machine Learning Formulas Explained! 👨‍🏫

This is the formula for the Binary Cross Entropy Loss. This loss function is commonly used for binary classification problems.

It may look super confusing, but I promise you that it is actually quite simple!

Let's go step by step 👇 pic.twitter.com/LcQofbUJnl
— Vlad Haltakov (@haltakov) October 13, 2021
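
For reference, binary cross-entropy over N examples is usually written as `L = -(1/N) * sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]`, where `y_i` is the 0/1 label and `p_i` the predicted probability of class 1. The image from the tweet is not reproduced here, so treating this as the formula it shows is an assumption. Below is a minimal Python sketch of that standard form; the function name `binary_cross_entropy` and the `eps` clamp are illustrative choices, not taken from the thread.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities.

    `eps` is a hypothetical safeguard: it keeps probabilities away from
    exactly 0 and 1 so that log() never receives 0.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp into (0, 1)
        # Only one of the two terms is non-zero for a hard 0/1 label.
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# A confident, correct prediction costs less than a confident, wrong one.
good = binary_cross_entropy([1, 0], [0.9, 0.1])
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
```

A prediction of 0.5 for a positive example costs `log(2)`, which is a handy sanity check when wiring up a loss by hand.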

BaguaSys/bagua — Bagua Speeds up PyTorch

visenger/awesome-mlops — A curated list of references for MLOps

Towards Robust Waveform-Based Acoustic Models

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

Simple and Effective Zero-shot Cross-lingual Phoneme Recognition

NormFormer: Improved Transformer Pretraining with Extra Normalization

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition

Fine-tuning for Audio Classification with 🤗 Transformers

Control Strategies for Physically Simulated Characters Performing Two-player Competitive Sports

47: Neural Body

Phrase Retrieval and Beyond

Hierarchical Skills for Efficient Exploration, code, paper

facebookresearch/ppuda — Code for Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021)

hankcs/HanLP — Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification

BinderHub, code

jupyterhub/repo2docker

How to write a DSL (in Python with Lark)

rpgleparser/rpgleparser — ANTLR parser for RPGLE

smeup/jariko — a JAva virtual machine Rpg Interpreter written in KOtlin

sallbach/arpgtool — IBM i RPG developer tools (AS/400 iSeries)

worksofliam/5250ttt — Tic-Tac-Toe for 5250 (2 player)

lppedd/RPG — IBM RPG projects

martinezga/ibm-rpg-programs — IBM RPG programs exercises.

How to write a transpiler, code

Strumenta/FormatsDSL — A DSL to describe formats and generate loaders

CLIPScore: A Reference-free Evaluation Metric for Image Captioning, code

This Word Does Not Exist, turtlesoupy/this-word-does-not-exist

Aim 3.0.0 — The foundations for open-source & open-metadata ML platform

Large Language Models: A New Moore’s Law?

babelfish-for-postgresql/babelfish_extensions — Babelfish for PostgreSQL provides the capability for PostgreSQL to work with applications written for Microsoft SQL Server.

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute

uclnlp/torch-imle — Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

Stochastic Attention Head Removal: A simple and effective method for improving Transformer Based ASR Models

Hierarchical Transformers Are More Efficient Language Models

Language Modelling via Learning to Rank

Teaching robots to perceive, understand, and interact through touch, tacto, PyTouch

Speculative execution for LLMs is an excellent inference-time optimization.

It hinges on the following unintuitive observation: forwarding an LLM on a single input token takes about as much time as forwarding an LLM on K input tokens in a batch (for larger K than you might… https://t.co/FiwTwqsfho
— Andrej Karpathy (@karpathy) August 31, 2023
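
The observation in the tweet can be sketched as a toy decoding loop: a cheap draft model proposes K tokens, and the large model checks all K positions, which in a real system would be a single batched forward pass. The tweet text is truncated and does not spell out an acceptance rule, so the greedy "accept while the draft matches" rule below is an assumption, and the names `speculative_decode`, `greedy_decode`, `target_next` and `draft_next` are hypothetical. Models are stand-in functions that map a token list to the next token.

```python
def greedy_decode(next_token, prompt, max_new_tokens):
    """Baseline: call the model once per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, k, max_new_tokens):
    """Greedy speculative decoding with toy next-token functions.

    Assumption: a drafted token is accepted only if it equals the target
    model's own greedy choice, so the output matches plain greedy decoding.
    """
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The cheap draft model proposes k tokens, one at a time.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))

        # 2. The target model scores every drafted position. A real system
        #    would do this in one batched pass; this loop only emulates it.
        accepted = []
        for i in range(k):
            expected = target_next(tokens + draft[:i])
            if draft[i] == expected:
                accepted.append(draft[i])
            else:
                accepted.append(expected)  # keep the target's correction
                break
        else:
            # Every draft token matched, so the same pass yields one more.
            accepted.append(target_next(tokens + draft))

        tokens.extend(accepted)
    return tokens[:limit]


# Toy models: the target counts upward mod 10; the draft errs after a 4.
target = lambda t: (t[-1] + 1) % 10
draft = lambda t: 0 if t[-1] == 4 else (t[-1] + 1) % 10
out = speculative_decode(target, draft, [0], k=4, max_new_tokens=12)
```

Each round makes one target-model check and emits between 1 and K+1 tokens, so the speedup depends on how often the draft agrees with the target.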

TinyLlama/TinyLlama-1.1B-Chat-v0.4

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

w2v-SELD: A Sound Event Localization and Detection Framework for Self-Supervised Spatial Audio Pre-Training, code

Recent Advances in End-to-End Automatic Speech Recognition

bhky/opennsfw2 — Keras implementation of the Yahoo Open-NSFW model

The Rise of Self-Supervised Learning

TF_JAX_Tutorials - Part 9 (Autodiff in JAX)

Irisleabhar na Gaedhilge

Conformer-based Hybrid ASR System for Switchboard Dataset

Hacktoberfest 21’ - Unlocking 40 open-source audio datasets for ML

Full scan of a booklet on Leinster Irish (32 pages): “Dialect in East and Mid-Leinster”, Donn Piatt, 1933.

facebookresearch/demucs — Code for the paper Hybrid Spectrogram and Waveform Source Separation

Scaling ASR Improves Zero and Few Shot Learning

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

InfoXLM

CZWin32768/XLM-Align — Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

LiT: Zero-Shot Transfer with Locked-image text Tuning

Wormhole (1994 Session part 2)

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

Joint Unsupervised and Supervised Training for Multilingual ASR

Cousin chart

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

British and American English Pronunciation Differences

Studying the History of English

The sounds of English

LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Directly Fine-Tuning Diffusion Models on Differentiable Rewards

Introducing IDEFICS: An Open Reproduction of State-of-the-Art Visual Language Model

thu-spmi/CAT — A CRF-based ASR Toolkit

CLARIN: Spoken Corpora

Historic Language Models

Mask-Predict: Parallel Decoding of Conditional Masked Language Models

Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing

Non-Autoregressive Semantic Parsing for Compositional Task-Oriented Dialog, code

Sources for Connemara Irish

Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition, asappresearch/sew

Towards Learning Universal Audio Representations

Transformer-S2A: Robust and Efficient Speech-to-Animation

Multimodal and Multilingual Embeddings for Large-Scale Speech Mining

Hash Layers For Large Sparse Models

FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task, code

g2p_encode.py, fairseq_simul_st_agent.py

facebook/wmt21-dense-24-wide-x-en — WMT 21 X-En is a 4.7B multilingual encoder-decoder (seq-to-seq) model trained for one-to-many multilingual translation. It was introduced in this paper and first released in this repository.

facebook/wav2vec2-large-robust-ft-swbd-300h

facebook/s2t-small-mustc-en-nl-st

facebook/wav2vec2-lv-60-espeak-cv-ft

Simultaneous Speech Translation (SimulST) on MuST-C

MuST-C: a Multilingual Speech Translation Corpus

Simplified Grammar of the Hungarian Language

A Green Approach for an Irish App (Refactor, reuse and keeping it real)

HuBERT: How to Apply BERT to Speech, Visually Explained

Automatic Speech Recognition for Supporting Endangered Language Documentation

“Attention is all you need” implementation from scratch in PyTorch. A Twitter thread

googleforgames/open-match-docs

LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention

CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

mlcommons/peoples-speech — The People’s Speech Dataset

“Demokratifabriken” also as an audiobook?

srush/GPU-Puzzles — Solve puzzles. Learn CUDA.

Conformer: Convolution-augmented Transformer for Speech Recognition

i'll never get over how the cochlea is an analog fourier transform organ pic.twitter.com/VSlu0oqQXH
— murat 🍥 (@mayfer) January 3, 2024

Room impulse response reconstruction with physics-informed deep learning

GPT using Numpy! 🔥

Here is Generative Pretrained Transformer(GPT) implemented from scratch using Numpy in just 60 lines of code: pic.twitter.com/80dTaDkePe
— Clarifai (@clarifai) January 2, 2024

DocLLM: A layout-aware generative language model for multimodal document understanding

godotengine/godot — Godot Engine – Multi-platform 2D and 3D game engine

The 3 Deep Learning Frameworks For End-to-End Speech Recognition That Power Your Devices

Perceiver IO: a scalable, fully-attentional model that works on any modality

Introduction to Facebook AI Similarity Search (Faiss), Facebook AI and the Index Factory

RichiH/vcsh — vcsh - Version Control System for $HOME - multiple Git repositories in $HOME

Few-shot Learning with Multilingual Language Models

Multi-turn RNN-T for streaming recognition of multi-party speech

1ytic/warp-rna — Recurrent Neural Aligner

theblackcat102/edgedict — Working online speech recognition based on RNN Transducer. (Trained model release available in release)

1ytic/warp-rnnt — CUDA-Warp RNN-Transducer

awni/automata_ml — An Introduction to Weighted Automata in Machine Learning

awni/speech — A PyTorch Implementation of End-to-End Models for Speech-to-Text

sooftware/kospeech — Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.

sooftware/RNN-Transducer — PyTorch implementation of RNN-Transducer(RNN-T).

EdinburghNLP/nematus — Open-Source Neural Machine Translation in Tensorflow

bbc/rd-apmm-python-lib-mediatimestamp — A simple timestamp implementation used by various other libraries

bbc/lrud — Left, Right, Up, Down. A spatial navigation library for devices with input via directional controls.

bbc/grid — BBC’s implementation of The Guardian’s image management system

bbc/digital-paper-edit-client — Work in progress - BBC News Labs digital paper edit project - React Client

bbc/webMUSHRA — a MUSHRA compliant web audio API based experiment software

bbc/programmes-pages-service — A library for accessing ProgrammesDB

bbc/codext — VS Code’s editor shipped as a browser extension.

bbc/clever-thumbnailer — Audio thumbnail generator

bbc/digital-paper-edit-storybook — Work in progress - BBC News Labs digital paper edit project - React storybook

Once a Week (magazine)/Series 1/Volume 11/From Canada to Liverpool, with “skedaddlers” from the Northern army

Rhymes of a Rolling Stone/The Cow-Juice Cure

Rhymes of a Red-Cross Man/Missis Moriarty’s Boy

The Shebeeners

Neural Data Augmentation via Example Extrapolation

Train GPT-2 in your own language

Archivist Quill Guide

Fit More and Train Faster With ZeRO via DeepSpeed and FairScale

Sample teaching materials, teg.ie

POSH: A Data-Aware Shell for Faster Distributed Text Processing

Físeáin - Mar a Déarfá!

Retrieval Augmented Generation with Huggingface Transformers and Ray

Retrieval Augmented Generation: Streamlining the creation of intelligent natural language processing models

PanLex

This Non-Profit is Building the World’s Largest Lexical Translation Database

Text Classification Using DeepPavlov Library With PyTorch And Transformers

Spoken Corpus Linguistics in Romance: thoughts, design and results

Named Tensor Notation

namedtensor/notation

kakaobrain/pororo — PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

modernmt/modernmt — Neural Adaptive Machine Translation that adapts to context and learns from corrections.

quic/sense — Enhance your application with the ability to see and interact with humans using any RGB camera.

wenet-e2e/wenet — Production First and Production Ready End-to-End Speech Recognition Toolkit

mlpen/Nystromformer

ray-project/ray — Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Natooz/MidiTok — MIDI / symbolic music tokenizers for Deep Learning models 🎶

Books in Irish - Royal Irish Academy

Bibliography of Irish philology and of printed Irish literature, scan 2

IG01-15, IG01-16, IG01-19, IG01-25, IG02-10049, IG02-60, IG02-11771, IG02-66

Irish Language Forum - Study Group: Séadna

Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation

Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

The Irish of Iorras Aithneach, County Galway

Multilingual BERT has an accent: Evaluating English influences on fluency in multilingual models

My Man Jeeves

nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation

Siamese networks with Keras, TensorFlow, and Deep Learning

Comparing images for similarity using siamese networks, Keras, and TensorFlow

Dense Passage Retrieval for Open-Domain Question Answering

messiaen/full-lattice-search — Full Text Search Over Probabilistic Lattices with Elasticsearch!

steveash/jopenfst — Partial Java port of the C++ OpenFST library

OlliSaarikivi/Automata — Automata and transducer library for .NET

Recreating Historical Streetscapes Using Deep Learning and Crowdsourcing

Ordnance Survey Index to the Map of the Town of Thurles

Thurles maps

Thurles map, 2

Thurles Town Square 1957

Old Photos of Thurles Co Tipperary Ireland

Praat on the Web: An Upgrade of Praat for Semi-Automatic Speech Annotation

monikaUPF/PraatontheWeb — Web implementation of Praat. Source code, running demo scripts on web, samples and documentation

ys10/Grapheme-PhonemeAlignment — This project aims to implement an algorithm to do a grapheme-phoneme alignment task.

kfirgoldberg/FUN — Official implementation of the FUN models

jailuthra/asr — Kaldi ASR wrapper scripts

CLDF for dummies

My ELAN workflow for segmenting and transcription

Sylli - The SSP Syllabifier

Sequitur G2P

lennes/spect — SpeCT - Speech Corpus Toolkit for Praat. Documentation

praaline/Praaline — Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora

CoEDL/elpis — 🙊 software for creating speech recognition models.

This is Hogwild!

Deep Speech : Train Native Languages with Transfer Learning Part #0b01

Prosogram

cohere-ai/natural-instructions — Expanding natural instructions

Bat banter is surprisingly nuanced

Yes you should understand backprop

The Unreasonable Effectiveness of Recurrent Neural Networks

Chrome Extension Programming: Illustrating a Basic Survival Skill with a Twitter Case Study

Feature Learning Escapades

A Survival Guide to a PhD

A Recipe for Training Neural Networks

Language Through a Prism: A Spectral Approach for Multiscale Language Representations

Sentence Embedding

How to Use Image Embeddings for Object Localization

Learning deep features to recognise speech emotion using merged deep CNN

How to Break GPU Memory Boundaries Even with Large Batch Sizes

zh217/torch-dct — DCT (discrete cosine transform) functions for pytorch

inejc/paragraph-vectors — A PyTorch implementation of Paragraph Vectors (doc2vec).

pperle/gaze-tracking — state-of-the-art gaze tracking model

tjysdsg/capt-public — Public version of my Computer-Aided Pronunciation Training (CAPT) system (server)

JawadAr/Pronunciation-verification-using-anomaly-detection-Thesis — This repository contains all the codes used in a thesis at Information Technology University (ITU). The topic of the thesis is pronunciation verification using anomaly detection.

googlefonts/gftools — Misc tools for working with the Google Fonts library

fonttools/fonttools — A library to manipulate font files from Python.

openjournals/joss — The Journal of Open Source Software

LaurentMazare/tch-rs — Rust bindings for the C++ api of PyTorch.

vesis84/kaldi-io-for-python — Python functions for reading kaldi data formats. Useful for rapid prototyping with python.

Peer Review: Implementing a “publish, then review” model of publishing

The Entropy of Words—Learnability and Expressivity across More than 1000 Languages, preprint

Exploiting Weak Ties in Incomplete Network Datasets Using Simplified Graph Convolutional Neural Networks

Why scientists are turning to Rust

Cookin’ with Rust

The Rust Programming Language

Phone set selection for HMM-based dialect speech synthesis

The On-Device Machine Learning Behind Recorder

Navigating Recorder Transcripts Easily, with Smart Scrolling

How to objectively measure phonetic distance?

Neural circuit policies enabling auditable autonomy

The Challenges of using Transformers in ASR

Segment Anything Meets Point Tracking

SysCV/sam-pt — SAM-PT: Extending SAM to zero-shot video segmentation with point-based tracking.

SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds

bbc/subtitles-generator — A node module to generate subtitles by segmenting a list of time-coded text - BBC News Labs

ebu/libbw64 — Broadcast Wave 64 (ITU-R BS.2088) library

bbc/aes31-adl-composer — Work in progress - A node module to convert a json sequence into an AES31 ADL (audio decision list) compatible with SADiE audio editing software. For BBC News Labs digital paper edit project

bbc/audiowaveform — C++ program to generate waveform data and render waveform images from audio files

enochkan/torch-metrics — Metrics for model evaluation in pytorch

formiel/speech-translation — Multilingual speech translation

synesthesiam/sv_kaldi-montreal — Swedish voice2json profile based on Kaldi

mycrazycracy/speaker-embedding-with-phonetic-information — The code for the Interspeech paper “Speaker Embedding Extraction with Phonetic Information”

openXBOW/openXBOW — openXBOW - the Passau Open-Source Crossmodal Bag-of-Words Toolkit

fastai/numerical-linear-algebra — Free online textbook of Jupyter notebooks for fast.ai Computational Linear Algebra course

Irish

The Journey of Viscount Ramon de Perellós to Saint Patrick’s Purgatory

Mion-ċaint : an easy Irish phrase book

The Book of Kells

Researches in the South of Ireland

A little bit of Culture… Poetry from soc.culture.irish

Caoineadh Airt Uí Laoghaire

An Caoineadh Airt Uí Laoghaire

Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

Train LLMs using QLoRA on Amazon SageMaker

apple/swift-algorithms — Commonly used sequence and collection algorithms for Swift

apple/sourcekit-lsp — Language Server Protocol implementation for Swift and C-based languages

apple/darwin-libplatform — Legacy mirror of Darwin Platform Library. Replaced by https://github.com/apple-oss-distributions/libplatform

apple/foundationdb — FoundationDB - the open source, distributed, transactional key-value store

marian-nmt/marian-examples — Examples, tutorials and use cases for Marian, including our WMT-2017/18 baselines.

pswietojanski/ojsp_adaptation_review_2020 — Auxiliary data and scripts for our OJSP review on speaker adaptation for speech recognition

Zeitschrift für celtische Philologie

Review: Oral Literature from Dunquin, County Kerry

A Gentle Introduction to the Huggingface Pipeline

Training with Marian

MarianNMT Examples

Séamus Ó Duilearga’s Co. Antrim notebooks

Hungarian Speecon database

BUSZI-2 guided conversations - downloadable transcripts

Nordic Dialect Corpus

The Swedish subproject of ScanDiaSyn

Skolt Saami Documentation Corpus

Gothenburg Dialogue Corpus

Hungarian BABEL

Hungarian Broadcast News Database

Hungarian National Corpus

Hungarian Kindergarten Language Corpus

Hungarian MRBA

Hungarian Medical Speech Database

Tunable Q-factor Wavelet Transform

PyWavelets/pywt — PyWavelets - Wavelet Transforms in Python

jollyjonson/tqwt_tools — Tunable-Q Wavelet Transform and Resonance-based Signal Decomposition Toolkit