# Install NeMo with all optional dependencies (quote the extras so the shell does not glob the brackets)
!pip install "nemo_toolkit[all]"
import nemo.collections.asr as nemo_asr

# Load the pretrained Parakeet CTC 1.1B model (a BPE-tokenized CTC model)
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name="nvidia/parakeet-ctc-1.1b")
# Download a public-domain LibriVox recording from the Internet Archive
!wget https://www.archive.org/download/crusoe_1_syllable_librivox/crusoe1syllable_01_godolphin_64kb.mp3

# Convert the MP3 to the format the model expects: 16-bit PCM WAV, mono, 16 kHz
!ffmpeg -i crusoe1syllable_01_godolphin_64kb.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output.wav
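Before transcribing, it can help to verify the conversion actually produced the expected format. This is a minimal sanity check using only the standard library; the helper names `wav_format` and `is_asr_ready` are my own, not part of NeMo.

```python
import wave

# Hypothetical helper (not a NeMo API): read the header of a WAV file
# and return its (channels, bytes per sample, sample rate) triple.
def wav_format(path):
    with wave.open(path, "rb") as w:
        return w.getnchannels(), w.getsampwidth(), w.getframerate()

def is_asr_ready(path):
    # The model expects mono, 16-bit PCM (2 bytes/sample), 16 kHz audio.
    return wav_format(path) == (1, 2, 16000)
```

For example, `is_asr_ready("output.wav")` should return `True` after the ffmpeg step above.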
# Transcribe the converted file; transcribe() takes a list of audio paths
asr_model.transcribe(['output.wav'])
[NeMo W 2026-01-26 18:57:22 nemo_logging:405] The following configuration keys are ignored by Lhotse dataloader: use_start_end_token
[NeMo W 2026-01-26 18:57:22 nemo_logging:405] You are using a non-tarred dataset and requested tokenization during data sampling (pretokenize=True). This will cause the tokenization to happen in the main (GPU) process,possibly impacting the training speed if your tokenizer is very large.If the impact is noticable, set pretokenize=False in dataloader config.(note: that will disable token-per-second filtering and 2D bucketing features)
Transcribing: 0it [00:00, ?it/s]
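The return type of `transcribe()` has varied across NeMo versions: some return plain strings, while newer releases return hypothesis objects that carry the text in a `.text` attribute. A small normalizing helper (my own sketch, not a NeMo API) makes the result safe to print either way:

```python
# Hypothetical helper (not a NeMo API): flatten transcribe() results to
# plain strings, whether they come back as str or as objects with .text.
def to_text(results):
    return [r if isinstance(r, str) else getattr(r, "text", str(r)) for r in results]
```

Usage would then look like `print(to_text(asr_model.transcribe(['output.wav']))[0])`.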