Saturday, May 16, 2020

Examples of AWS CLI calls to the Amazon Transcribe service (speech to text) for Latin audio, using Italian as the base language and extending it with a custom Latin vocabulary:

I create the vocabulary:
A aws transcribe create-vocabulary --vocabulary-name latinSupplement --language-code it-IT --phrases "misericordiis" "aiternis" "aeternis"
B aws transcribe get-vocabulary --vocabulary-name latinSupplement
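
Vocabulary creation is itself asynchronous: get-vocabulary reports a VocabularyState of PENDING until the vocabulary becomes READY (or FAILED), and it should be READY before a job uses it. Below is a minimal sketch of a wait loop around step B. The function name wait_for_vocabulary and its interval and attempt defaults are my own choices for illustration, not part of the AWS CLI.

```shell
# wait_for_vocabulary NAME [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Hypothetical helper: polls get-vocabulary until the vocabulary is READY.
wait_for_vocabulary() {
  local name="$1" interval="${2:-10}" max_attempts="${3:-30}"
  local attempt=1 state
  while [ "$attempt" -le "$max_attempts" ]; do
    # --query/--output are global AWS CLI options; this prints only the state.
    state=$(aws transcribe get-vocabulary \
      --vocabulary-name "$name" \
      --query 'VocabularyState' \
      --output text) || return 1
    case "$state" in
      READY)  echo "READY"; return 0 ;;
      FAILED) echo "vocabulary $name failed" >&2; return 1 ;;
    esac
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "timed out waiting for vocabulary $name" >&2
  return 1
}

# Example usage (not run here):
#   wait_for_vocabulary latinSupplement
```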

II create the transcription job:
A (only when re-running, since job names must be unique) aws transcribe delete-transcription-job --transcription-job-name latinNounPhrase
B create the JSON input file (test-start-command.json):
{
"TranscriptionJobName": "latinNounPhrase",
"LanguageCode": "it-IT",
"MediaFormat": "wav",
"Media": {
"MediaFileUri": "https://s3.us-west-2.amazonaws.com/www1.cloviscorp.com/collegium/grammar/resources/latin/sounds/Ali_GfNpCb_misericordia_aeternus.wav"
}
}
C aws transcribe start-transcription-job --cli-input-json file://test-start-command.json --settings VocabularyName=latinSupplement

D check the status of the async job (even a simple job could take more than 30 seconds as of 16 May 2020)
1 aws transcribe list-transcription-jobs --status COMPLETED
2 aws transcribe list-transcription-jobs --status IN_PROGRESS

E aws transcribe get-transcription-job --transcription-job-name latinNounPhrase

F download the output JSON containing the transcript by GETting the TranscriptFileUri returned in step E (e.g. in a browser)
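
Steps D through F can also be scripted rather than checked by hand. The sketch below polls get-transcription-job until the job is COMPLETED or FAILED, then prints the transcript URL. The function names wait_for_job and transcript_uri, the polling defaults, and the output file name in the usage comment are assumptions made for illustration. The usage comment also assumes curl and jq are installed.

```shell
# wait_for_job NAME [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Hypothetical helper: polls until the job is COMPLETED (0) or FAILED (1).
wait_for_job() {
  local name="$1" interval="${2:-10}" max_attempts="${3:-60}"
  local attempt=1 status
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(aws transcribe get-transcription-job \
      --transcription-job-name "$name" \
      --query 'TranscriptionJob.TranscriptionJobStatus' \
      --output text) || return 1
    case "$status" in
      COMPLETED) return 0 ;;
      FAILED)    echo "job $name failed" >&2; return 1 ;;
    esac
    # Any other status (e.g. QUEUED, IN_PROGRESS): wait and try again.
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "timed out waiting for job $name" >&2
  return 1
}

# transcript_uri NAME
# Hypothetical helper: prints the URL of the output JSON for a finished job.
transcript_uri() {
  aws transcribe get-transcription-job \
    --transcription-job-name "$1" \
    --query 'TranscriptionJob.Transcript.TranscriptFileUri' \
    --output text
}

# Example usage (not run here):
#   wait_for_job latinNounPhrase &&
#     curl -s "$(transcript_uri latinNounPhrase)" -o latinNounPhrase.json
#   jq -r '.results.transcripts[0].transcript' latinNounPhrase.json
```

The URL should be quoted when passed to curl, because for service-managed output it is a long signed link containing characters the shell would otherwise interpret.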

More reading: https://docs.aws.amazon.com/transcribe/latest/dg/getting-started-cli.html