I’ve been working on an automated approach to subtitle alignment, with the goal of creating speech data for training deep-learning text-to-speech (TTS) models.
However, the resulting data isn’t clean enough to create good quality TTS because it suffers from the following defects:
To resolve these, I created a web-based UI with data preview and editing capabilities, similar to finetuneas. However, unlike that work, my program:
The program takes WAV audio and a JSON file as input. The JSON must be in the following format, with times given in milliseconds:
[
  {
    "Start": 184170,
    "Stop": 184284,
    "Text": "YES, HE CAN!"
  }
]
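If you are generating this JSON from your own alignment pipeline, it can help to sanity-check it before loading it into the UI. Here is a minimal sketch in Python of how that might look. The `Segment` class and `parse_segments` function are hypothetical names of mine, not part of the program, and the rule that `Start` must be non-negative and smaller than `Stop` is my assumption about what a valid segment is.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One aligned utterance; times are in milliseconds, as in the JSON."""
    start_ms: int
    stop_ms: int
    text: str

    @property
    def duration_s(self) -> float:
        # Convert the millisecond span to seconds.
        return (self.stop_ms - self.start_ms) / 1000.0


def parse_segments(raw: str) -> List[Segment]:
    """Parse the JSON array and reject obviously broken segments.

    Assumption: a valid segment has 0 <= Start < Stop.
    """
    entries = json.loads(raw)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of segments")
    segments = []
    for i, entry in enumerate(entries):
        start, stop, text = entry["Start"], entry["Stop"], entry["Text"]
        if not (0 <= start < stop):
            raise ValueError(
                f"segment {i}: need 0 <= Start < Stop, got {start}, {stop}"
            )
        segments.append(Segment(start, stop, text))
    return segments


example = '[{"Start": 184170, "Stop": 184284, "Text": "YES, HE CAN!"}]'
segments = parse_segments(example)
```

Calling `parse_segments` on the contents of your own file gives a list you can filter further, for example by dropping segments whose `duration_s` is implausibly short for their text.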
The program is available here. It is all client-side, so there’s no need to install it yourself.
The GitHub repository is here.