Matcha-TTS uses piper_phonemize as its phonemiser, so for best results the pre-phonemised text that you feed it should look like piper_phonemize output. One method is to use a phoneset mapping between the two, to create a new dictionary with which to train an MFA model. This is an alternative approach: take the headwords from an MFA dictionary and generate the pronunciations using piper_phonemize. That guarantees the input matches not only the phoneset but also other conventions (for one, providing accent marks in the expected way), without the mess that can happen in phoneset mapping, e.g., mappings that are not always 1:1.
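
To make the "not always 1:1" problem concrete, here is a minimal sketch of what the mapping approach looks like. The two-entry table is hypothetical: it is inferred only from comparing sample lines in the two dictionaries (MFA `aj` and `ow` against piper's `a ɪ` and `o ʊ`), and is nowhere near a complete or verified phoneset mapping.

```python
# Hypothetical, incomplete mapping from MFA phones to piper-style symbols.
# One MFA phone can expand to several symbols, so the mapping is not 1:1.
MFA_TO_PIPER = {
    "aj": ["a", "ɪ"],
    "ow": ["o", "ʊ"],
}


def map_pronunciation(mfa_pron):
    """Map a space-separated MFA pronunciation, passing unknown phones through."""
    mapped = []
    for phone in mfa_pron.split():
        mapped.extend(MFA_TO_PIPER.get(phone, [phone]))
    return " ".join(mapped)


print(map_pronunciation("z aj m"))
```

Even where the symbols line up, the MFA pronunciations used in this notebook carry no stress marks, so a mapping on its own has nothing to generate them from.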

As a bonus, you can be assured that some of the pronunciations it generates will be incorrect: for testing an interface where the user provides their own pronunciations, this is a good thing!

Watch in amazement at how easy it is to install piper_phonemize on Linux, compared to how incredibly difficult it is on Mac...

%pip install piper_phonemize
Collecting piper_phonemize
  Downloading piper_phonemize-1.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (25.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 25.0/25.0 MB 41.5 MB/s eta 0:00:00
Installing collected packages: piper_phonemize
Successfully installed piper_phonemize-1.1.0

This function copies what Matcha does.

import piper_phonemize

def matcha_style_phonemizer(text):
    return piper_phonemize.phonemize_espeak(text=text, voice="en-US")

As input, I'm using the MFA 3.0 US dictionary, available from the mfa-models releases on GitHub:

%%capture
!wget https://github.com/MontrealCorpusTools/mfa-models/releases/download/dictionary-english_us_mfa-v3.0.0/english_us_mfa.dict

Let's take a quick look at the format:

!tail english_us_mfa.dict
zygon	z aj ɡ ɑ n
zygophyte	z ɪ ɡ ow f aj t
zygophyte	z aj ɡ ow f aj t
zygote	0.99	0.14	1.0	1.0	z aj ɡ ow t
zyme	z aj m
zymophyte	z aj m ow f aj t
zythum	z aj θ ə m
zyzzyva	0.99	0.14	1.0	1.0	z ɪ z ɪ v ə
zzyzx	z aj z ɪ k s
zzzs	z i z

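So, we can see two kinds of line: a probabilistic one, with extra numeric fields between the word and its pronunciation, and a more basic one with just the word and the pronunciation. In either case, we are only interested in the first field of the tab-delimited file.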
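
If you do want the other fields, both kinds of line can be handled by one small parser. This is only a sketch based on the sample lines above: `parse_mfa_line` is a hypothetical helper name, and it simply assumes that the first field is the word, the last field is the pronunciation, and anything in between is numeric.

```python
def parse_mfa_line(line):
    """Split one dictionary line into (word, numeric fields, phones)."""
    fields = line.rstrip("\n").split("\t")
    word = fields[0]
    # Basic lines have nothing between word and pronunciation,
    # so this list is empty for them.
    numbers = [float(value) for value in fields[1:-1]]
    phones = fields[-1].split()
    return word, numbers, phones


print(parse_mfa_line("zyme\tz aj m\n"))
print(parse_mfa_line("zygote\t0.99\t0.14\t1.0\t1.0\tz aj ɡ ow t\n"))
```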

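with open("english_us_mfa.dict", encoding="utf-8") as mfadict:
    words = set()
    for line in mfadict:
        # The headword is always the first tab-delimited field.
        word = line.rstrip("\n").split("\t")[0]
        if word:  # skip blank lines
            words.add(word)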

I'm not doing all of this for you: MFA expects a tab-delimited dictionary, with space-separated phonemes. With the simple join used here, stress and duration marks end up as separate "phones". You get to check if this is correct input to Matcha.
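
If separate marks turn out not to be what you want, one possible alternative is to glue them onto a neighbouring phone before joining. The sketch below is an assumption about what might be preferable, not something Matcha or MFA is known to require: `merge_marks` is a hypothetical helper that attaches stress marks (ˈ, ˌ) to the following phone and the length mark (ː) to the preceding one.

```python
STRESS_MARKS = {"ˈ", "ˌ"}
LENGTH_MARKS = {"ː"}


def merge_marks(phones):
    """Attach stress marks to the next phone and length marks to the previous one."""
    merged = []
    pending = ""  # stress marks waiting for the phone they belong to
    for phone in phones:
        if phone in STRESS_MARKS:
            pending += phone
        elif phone in LENGTH_MARKS and merged and not pending:
            merged[-1] += phone
        else:
            merged.append(pending + phone)
            pending = ""
    if pending:  # a trailing stress mark with nothing to attach to
        merged.append(pending)
    return merged


print(" ".join(merge_marks(["w", "ˈ", "ɜ", "ː", "d", "s", "t", "ɚ"])))
```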

with open("english_with_piper.dict", "w", encoding="utf-8") as piperdict:
    for word in words:
        # phonemize_espeak returns one list of phonemes per sentence
        piper = matcha_style_phonemizer(word)
        for piper_item in piper:
            phon = " ".join(piper_item)
            piperdict.write(f"{word}\t{phon}\n")
!tail english_with_piper.dict
annihilating	ɐ n ˈ a ɪ ə l ˌ e ɪ ɾ ɪ ŋ
horatia	h o ː ɹ ˈ e ɪ ʃ ə
aliena	ˌ e ɪ l i ˈ ɛ n ə
disbanding	d ɪ s b ˈ æ n d ɪ ŋ
beginneth	b ɪ ɡ ˈ ɪ n ə θ
wordster	w ˈ ɜ ː d s t ɚ
sullying	s ˈ ʌ l i ɪ ŋ
offices	ˈ ɑ ː f ɪ s ᵻ z
toads	t ˈ o ʊ d z
projective	p ɹ ə d ʒ ˈ ɛ k t ɪ v
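
As a starting point for that checking, here is a rough sanity check of the generated file's shape. It only tests the basic format described above (one tab, then space-separated phones) and says nothing about whether the pronunciations themselves are right; `find_bad_lines` is a hypothetical helper, not part of MFA or Matcha.

```python
def find_bad_lines(lines):
    """Return (line number, problem) pairs for lines not shaped like 'word<TAB>phones'."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            problems.append((number, "expected exactly one tab"))
            continue
        word, pron = fields
        if not word or not pron.strip():
            problems.append((number, "empty word or pronunciation"))
        elif "  " in pron or pron != pron.strip():
            problems.append((number, "irregular spacing between phones"))
    return problems


sample = ["toads\tt ˈ o ʊ d z\n", "broken line\n", "empty\t\n"]
print(find_bad_lines(sample))
```

To run it over the real output, pass the open file object instead of `sample`, since iterating over a file yields its lines.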