Posts

Waxholm lexicon processing
I can't remember what this was for; I'm sure I'll be reminded
Dec 15, 2023
Waxholm kludge
Reading old data
Oct 17, 2023
Swedish pronunciation comparison
Between English Wiktionary and phoneme recognition output
Oct 10, 2023
Waxholm phoneme fairseq
Fairseq data preparation for Waxholm phonetic transcriptions
Aug 10, 2023
Create Hugging Face dataset from Hungarian TTS data
Mostly, it's the push_to_hub part that I'll forget
Jan 21, 2023
Liepa to fairseq
Convert the liepa2 corpus to fairseq
Jan 13, 2023
Extract audio from .pcm from NST Swedish Speech Synthesis
Raw big-endian PCM at 44.1 kHz, with an 8- or 16-byte header (not sure which)
Jan 3, 2023
Playing with SimpleLanguageModel from pynlpl
I wanted to know if it could be used to write an ARPA LM. It cannot
Dec 6, 2022
Convert WebVTT to Elan
For Whisper's output
Nov 11, 2022
Simple replacement for Kaldi's align-text
Because life's too short to install Kaldi again
Nov 10, 2022
Phonetic transcription with Hugging Face
wav2vec2 espeak phonetic model
Oct 18, 2022
Using IrishNLP's chunker with NLTK
Convert chunks to a tree
Oct 3, 2022
Process Swedish Librivox text
Normalisation, adding boilerplate
Jun 30, 2022
Adapt `cmu_us_awb_arctic` to fairseq
Writing the tsv/ltr files; from Kaggle
May 7, 2022
LJSpeech for ASR
Resampled wav, more normalised text. From Kaggle
May 4, 2022