Examples of AWS CLI calls to Amazon Transcribe (speech to text) for Latin, using Italian (it-IT) as the base language and extending it with a custom vocabulary of Latin words:
I Create the custom vocabulary:
A aws transcribe create-vocabulary --vocabulary-name latinSupplement --language-code it-IT --phrases "misericordiis" "aiternis" "aeternis"
B aws transcribe get-vocabulary --vocabulary-name latinSupplement
(repeat until VocabularyState changes from PENDING to READY before using the vocabulary)
II Create the transcription job:
A aws transcribe delete-transcription-job --transcription-job-name latinNounPhrase
(only needed when rerunning: a new job cannot reuse the name of an existing one)
B Create the JSON input file test-start-command.json:
{
  "TranscriptionJobName": "latinNounPhrase",
  "LanguageCode": "it-IT",
  "MediaFormat": "wav",
  "Media": {
    "MediaFileUri": "https://s3.us-west-2.amazonaws.com/www1.cloviscorp.com/collegium/grammar/resources/latin/sounds/Ali_GfNpCb_misericordia_aeternus.wav"
  }
}
C aws transcribe start-transcription-job --cli-input-json file://test-start-command.json --settings VocabularyName=latinSupplement
D Check the status of the asynchronous job (as of 16 May 2020, even a simple job can take more than 30 seconds):
1 aws transcribe list-transcription-jobs --status COMPLETED
2 aws transcribe list-transcription-jobs --status IN_PROGRESS
E aws transcribe get-transcription-job --transcription-job-name latinNounPhrase
F Download the output JSON containing the transcript by opening the TranscriptFileUri returned by get-transcription-job in a browser (a plain HTTP GET)
More reading: https://docs.aws.amazon.com/transcribe/latest/dg/getting-started-cli.html
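Steps I B and II D to F above are done by hand, rerunning the status commands until the state changes. The following is a minimal bash sketch of the same polling and download, assuming a configured AWS CLI and curl. The function names are made up for illustration; the only AWS calls are get-vocabulary and get-transcription-job with the global --query and --output text options, and the state values in the comments are the documented VocabularyState and TranscriptionJobStatus values.

```shell
#!/usr/bin/env bash
# Sketch: automate the polling in steps I B and II D/E and the download in II F.
# Assumes a configured AWS CLI and curl; all function names are hypothetical.

# Print the vocabulary state: PENDING, READY or FAILED.
vocabulary_state() {
  aws transcribe get-vocabulary --vocabulary-name "$1" \
    --query VocabularyState --output text
}

# Print the job status: QUEUED, IN_PROGRESS, FAILED or COMPLETED.
job_status() {
  aws transcribe get-transcription-job --transcription-job-name "$1" \
    --query TranscriptionJob.TranscriptionJobStatus --output text
}

# Run "$check $name" every $delay seconds until it prints one of the
# space-separated $final_states, then print that state. Returns 1 if the
# check command itself fails (e.g. no such vocabulary or job).
wait_for_state() {
  local check=$1 name=$2 delay=$3 final_states=$4 state
  while :; do
    state=$("$check" "$name") || return 1
    case " $final_states " in
      *" $state "*) echo "$state"; return 0 ;;
    esac
    sleep "$delay"
  done
}

# Fetch the transcript JSON from the job's TranscriptFileUri into file $2.
download_transcript() {
  local uri
  uri=$(aws transcribe get-transcription-job --transcription-job-name "$1" \
    --query TranscriptionJob.Transcript.TranscriptFileUri --output text) || return 1
  curl -sSf -o "$2" "$uri"
}

# Example run with the names used above:
#   wait_for_state vocabulary_state latinSupplement 5 "READY FAILED"
#   aws transcribe start-transcription-job --cli-input-json file://test-start-command.json \
#     --settings VocabularyName=latinSupplement
#   wait_for_state job_status latinNounPhrase 10 "COMPLETED FAILED"
#   download_transcript latinNounPhrase transcript.json
```

A fixed delay keeps the sketch simple; since even short jobs took more than 30 seconds here, polling every 10 seconds or so is usually enough.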
Saturday, May 16, 2020
Tuesday, January 7, 2020
A complete (reasonably reliable) hyphenator for transliterated Greek in about ten lines of sed regex:
#!/usr/bin/env bash
# Hyphenates a list of Greek words, transliterated one word per line.
INPUT=$1
cat "$INPUT" \
  | sed -E 's#([AEOaeo]\^i[/\\]*|[AEIOUaeou][iu][/\\]*|[AEIOUaeiou][\^]*[/\\]*)#-\1-#g' \
  | sed 's#-+-#-#g' \
  | sed -E 's#([BDGPTKLMNRSbdgptklmnrs])([BDGPTKLMNRSbdgptklmnrs])#-\1-\2-#g' \
  | sed -E 's#([PTKtpk])[-]+([Ss])#-\1\2-#g' \
  | sed -E 's#(-[PTKtpk][Hh])#-\1-#g' \
  | sed -E 's#[-]+#-#g' \
  | sed -E 's#-([SRNsrn][-]*)*$#\1#' \
  | sed -E 's#([^aeiouAEIOU/\\^])-([aeiouAEIOU])#\1\2#g' \
  | sed -E 's#-([BDGPTKbdgptk][Hh]*)-([rl])#-\1\2#g' \
  | sed -E 's#-([BDGPTKLMNRSbdgptklmnrs][Hh]*)-#\1-#g' \
  | sed 's#^-##g'

The transliteration is done with My Transliterator tool.
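Hyphenation should only add hyphens, so a quick way to spot a substitution that damaged a word is to delete the hyphens from the output and compare the result with the input list. A minimal sketch, assuming the hyphenator is saved as hyphenate.sh and the word list as words.txt (both hypothetical names); check_hyphenation is likewise a hypothetical helper, not part of the original script:

```shell
#!/usr/bin/env bash
# Round-trip check for a hyphenated word list (sketch; names are hypothetical).
# Deleting every "-" from an output line should reproduce the input word.
#   ./hyphenate.sh words.txt > hyphenated.txt
#   check_hyphenation words.txt hyphenated.txt && echo "all words intact"
check_hyphenation() {
  local words=$1 hyphenated=$2
  # diff reads the de-hyphenated output from stdin ("-"), prints any line
  # that differs from the input list, and exits non-zero if there is one.
  sed 's/-//g' "$hyphenated" | diff - "$words"
}
```

Any line diff reports points to a word where a substitution changed more than the hyphenation.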