Tuesday, January 7, 2020
A complete (reasonably reliable) hyphenator for Greek in just ten lines of sed regex:
#!/usr/bin/env bash
# Hyphenates a list of Greek words, given as one transliterated word per line.
# The word-list file is passed as the first argument.
# Stages: wrap each vowel or diphthong (with its ^ and / \ marks) in hyphens,
# split consonant pairs, rejoin p/t/k + s and p/t/k + h, collapse hyphen runs,
# then pull each consonant back onto the neighbouring syllable.
INPUT=$1
cat "$INPUT" \
| sed -E 's#([AEOaeo]\^i[/\\]*|[AEIOUaeou][iu][/\\]*|[AEIOUaeiou][\^]*[/\\]*)#-\1-#g' \
| sed -E 's#-+-#-#g' \
| sed -E 's#([BDGPTKLMNRSbdgptklmnrs])([BDGPTKLMNRSbdgptklmnrs])#-\1-\2-#g' \
| sed -E 's#([PTKtpk])[-]+([Ss])#-\1\2-#g' \
| sed -E 's#(-[PTKtpk][Hh])#-\1-#g' \
| sed -E 's#[-]+#-#g' \
| sed -E 's#-([SRNsrn][-]*)*$#\1#' \
| sed -E 's#([^aeiouAEIOU/\\^])-([aeiouAEIOU])#\1\2#g' \
| sed -E 's#-([BDGPTKbdgptk][Hh]*)-([rl])#-\1\2#g' \
| sed -E 's#-([BDGPTKLMNRSbdgptklmnrs][Hh]*)-#\1-#g' \
| sed 's#^-##'
The transliteration is done with My Transliterator tool.
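The same substitutions can also run in a single sed process instead of a chain of pipes, since each one works line by line. The following is only a sketch under a few assumptions: `hyphenate` is a hypothetical function name, the second expression is assumed to be meant as an extended regex that collapses runs of hyphens, and the three sample words are my own guesses at plausible transliterated input, not taken from the Transliterator tool.

```shell
#!/usr/bin/env bash
# Sketch: the hyphenation rules applied by one sed invocation.
# hyphenate (hypothetical name) reads one transliterated word per line on stdin.
hyphenate() {
  sed -E \
    -e 's#([AEOaeo]\^i[/\\]*|[AEIOUaeou][iu][/\\]*|[AEIOUaeiou][\^]*[/\\]*)#-\1-#g' \
    -e 's#-+-#-#g' \
    -e 's#([BDGPTKLMNRSbdgptklmnrs])([BDGPTKLMNRSbdgptklmnrs])#-\1-\2-#g' \
    -e 's#([PTKtpk])[-]+([Ss])#-\1\2-#g' \
    -e 's#(-[PTKtpk][Hh])#-\1-#g' \
    -e 's#[-]+#-#g' \
    -e 's#-([SRNsrn][-]*)*$#\1#' \
    -e 's#([^aeiouAEIOU/\\^])-([aeiouAEIOU])#\1\2#g' \
    -e 's#-([BDGPTKbdgptk][Hh]*)-([rl])#-\1\2#g' \
    -e 's#-([BDGPTKLMNRSbdgptklmnrs][Hh]*)-#\1-#g' \
    -e 's#^-##'
}

# Sample input (assumed transliterations); prints:
#   lo-gos
#   pa-te^r
#   hip-pos
printf '%s\n' 'logos' 'pate^r' 'hippos' | hyphenate
```

Tracing `hippos` by hand shows how the stages cooperate: the vowel stage gives `h-i-pp-o-s`, the consonant-pair stage splits `pp`, the hyphen runs collapse to `h-i-p-p-o-s`, the final `s` and the consonants before vowels are reattached (`hi-p-pos`), and the isolated `p` joins the preceding syllable, leaving `hip-pos`.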