MFA has phonological rules, but the implementation is useless. This approximates phonological rules for our speakers
Aug 27, 2024
Maybe a fine-tuned wav2vec model will work better with WhisperX
Aug 22, 2024
Get max/min segment durations from TextGrids
Aug 19, 2024
For a colleague
Aug 10, 2024
Common Crawl contains a lot of Google Translate output. See if you can guess the source material
Aug 2, 2024
Monkey-patched WhisperX with changed segmentation
Jul 26, 2024
Because it was quicker than looking at the API examples
Jul 25, 2024
tl;dr: OWSM-CTC is good enough for alignment for Irish
Jun 27, 2024
Creating synthetic data for training
Jun 19, 2024
For a student project
Mar 3, 2024
For a student project
Feb 29, 2024
Generating sentences from Riksdag: in progress
Feb 26, 2024
Also basic pieces for scraping Sveriges Radio pages
Feb 17, 2024
For a student project
Feb 16, 2024
Runs ASR + phonetic recognition on two versions of Dubliners from LibriVox: one (v2) with correct pronunciations, the other read by Americans
Feb 15, 2024