Get max/min segment durations from TextGrids
Aug 19, 2024
For a colleague
Aug 10, 2024
Common Crawl contains a lot of Google Translate output. See if you can guess the source material
Aug 2, 2024
Monkey-patched WhisperX with modified segmentation
Jul 26, 2024
Because it was quicker than looking at the API examples
Jul 25, 2024
tl;dr: OWSM-CTC is good enough for alignment for Irish
Jun 27, 2024
Creating synthetic data for training
Jun 19, 2024
For a student project
Mar 3, 2024
For a student project
Feb 29, 2024
Generating sentences from Riksdag: in progress
Feb 26, 2024
Also basic pieces for scraping Sveriges Radio pages
Feb 17, 2024
For a student project
Feb 16, 2024
Runs ASR + phonetic recognition on two versions of Dubliners from LibriVox: one (v2) read with correct pronunciations, the other read by Americans
Feb 15, 2024
I can't remember what this was for; I'm sure I'll be reminded
Dec 15, 2023
Reading old data
Oct 17, 2023