Tuesday, January 7, 2020

A complete (reasonably reliable) hyphenator for Greek in just eleven lines of sed regex:
#!/usr/bin/env bash
# Hyphenates a list of transliterated Greek words, one word per line.
# Usage: pass the path of the word list as the first argument.
INPUT="$1"
cat "$INPUT" \
| sed -E 's#([AEOaeo]\^i[/\\]*|[AEIOUaeou][iu][/\\]*|[AEIOUaeiou][\^]*[/\\]*)#-\1-#g' \
| sed -E 's#-+-#-#g' \
| sed -E 's#([BDGPTKLMNRSbdgptklmnrs])([BDGPTKLMNRSbdgptklmnrs])#-\1-\2-#g' \
| sed -E 's#([PTKtpk])[-]+([Ss])#-\1\2-#g' \
| sed -E 's#(-[PTKtpk][Hh])#-\1-#g' \
| sed -E 's#[-]+#-#g' \
| sed -E 's#-([SRNsrn][-]*)*$#\1#' \
| sed -E 's#([^aeiouAEIOU/\\^])-([aeiouAEIOU])#\1\2#g' \
| sed -E 's#-([BDGPTKbdgptk][Hh]*)-([rl])#-\1\2#g' \
| sed -E 's#-([BDGPTKLMNRSbdgptklmnrs][Hh]*)-#\1-#g' \
| sed 's#^-##g'
The transliteration is done with My Transliterator tool.
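To make the idea behind the pipeline easier to follow, here is a small sketch that runs three of the script's substitutions on a single word and prints the intermediate results. The sample word and the reading of the transliteration are my assumptions, not something stated above: I take `/` to mark an acute accent, so `lo/gos` stands for λόγος. The variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Assumption: "/" marks an acute accent, so "lo/gos" stands for λόγος.
word='lo/gos'

# Step 1: wrap every vowel group (vowel or diphthong plus accent marks) in hyphens.
marked=$(printf '%s\n' "$word" \
  | sed -E 's#([AEOaeo]\^i[/\\]*|[AEIOUaeou][iu][/\\]*|[AEIOUaeiou][\^]*[/\\]*)#-\1-#g')
echo "$marked"    # prints l-o/-g-o-s

# Step 2: a word-final s, r or n stays attached to the last syllable.
trimmed=$(printf '%s\n' "$marked" | sed -E 's#-([SRNsrn][-]*)*$#\1#')
echo "$trimmed"   # prints l-o/-g-os

# Step 3: a consonant directly before a vowel joins that vowel's syllable.
joined=$(printf '%s\n' "$trimmed" | sed -E 's#([^aeiouAEIOU/\\^])-([aeiouAEIOU])#\1\2#g')
echo "$joined"    # prints lo/-gos
```

The general approach is to over-split first, putting a hyphen around every vowel group, and then remove the hyphens that do not belong. The remaining substitutions in the full script handle consonant clusters in the same way.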