NeMo Forced Aligner

Demo for NeMo Forced Aligner (NFA). Upload an audio file and, optionally, the text spoken in it to generate a video in which each part of the text is highlighted as it is spoken.

You can also download CTM and ASS files to add subtitles to your videos.

Input

[Optional] For fun - adjust the colors of the text in the output video

text already spoken
text being spoken
text to be spoken

Output

You can use this space to convert CTM files to SRT format.
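For a sense of what such a conversion involves, here is a minimal Python sketch. It is not the linked tool's implementation. It assumes each CTM line has five whitespace-separated fields (utterance ID, channel, start time in seconds, duration in seconds, token) and writes one SRT cue per CTM line. The function names `format_srt_time` and `ctm_to_srt` are hypothetical, chosen for illustration.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def ctm_to_srt(ctm_text: str) -> str:
    """Convert CTM text to SRT text, one cue per CTM line.

    Assumed CTM layout per line: <utt_id> <channel> <start> <duration> <token>
    """
    cues = []
    for line in ctm_text.splitlines():
        fields = line.split()
        # Skip blank lines, ";;" comment lines, and lines with too few fields.
        if len(fields) < 5 or line.lstrip().startswith(";;"):
            continue
        start = float(fields[2])
        end = start + float(fields[3])
        token = fields[4]
        index = len(cues) + 1  # SRT cue numbers start at 1
        cues.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{token}\n"
        )
    # Each cue ends with a newline; joining with "\n" leaves a blank line between cues.
    return "\n".join(cues)


if __name__ == "__main__":
    sample = "utt1 1 0.50 0.30 hello\nutt1 1 0.80 0.45 world\n"
    print(ctm_to_srt(sample))
```

A word-level CTM gives one subtitle per word with this sketch. Grouping several tokens into one cue would need an extra step, such as merging lines whose gaps fall below a threshold.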

Tutorial: "How to use NFA?" 🚀 | Blog post: "How does forced alignment work?" 📚 | NFA GitHub page 👩‍💻

Examples
File upload
[Optional] The reference text. Use '|' separators to specify which text will appear together. Leave this field blank to use an ASR model's transcription as the reference text instead.
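To illustrate the '|' convention, the Python sketch below splits a reference text into the groups that would appear together. It only shows the input format. It is not the demo's own parsing code, and the function name `split_segments` is hypothetical.

```python
def split_segments(reference_text: str, separator: str = "|") -> list[str]:
    """Split reference text into segments, dropping empty pieces and extra spaces."""
    return [part.strip() for part in reference_text.split(separator) if part.strip()]


if __name__ == "__main__":
    text = "hello world | how are you | fine thanks"
    for number, segment in enumerate(split_segments(text), start=1):
        print(number, segment)
```

A text with no '|' at all yields a single segment containing the whole text.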