Just the scraper, processing later
Apr 29, 2021
Mostly playing with the examples
Apr 24, 2021
Possibly incomplete; Kaggle version here
Apr 24, 2021
What little Kashubian text there is on the internet seems to be in PDF. smh
Apr 23, 2021
Checking that I haven't left anything out
Apr 23, 2021
Apr 22, 2021
tl;dr - there's a missing symlink
Apr 20, 2021
Apr 19, 2021
Case folding in Irish is odd; ICU can be used from most languages
Apr 18, 2021
Making/testing the dataset
Apr 14, 2021
M2M100 used CC-Aligned
Apr 14, 2021
So, do massively multilingual MT models trained on massively crawled datasets lead to great output? No
Apr 13, 2021
TTS test corpus for Irish from IDLAK
Apr 6, 2021
This took a while
Apr 6, 2021
How does it fare with closely related languages? Part 1: Processing
Mar 28, 2021