To strip diacritics out of Arabic text, here is a little piece of Python code that I like a lot.
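import unicodedata

def strip_accents(s):
    # Keep every character that is not a nonspacing mark (Unicode category 'Mn').
    # filter() returns an iterator in Python 3, so join the result back into a string.
    return ''.join(filter(lambda c: unicodedata.category(c) != 'Mn', s))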
Look how elegant and small it is. Just one line with a lambda expression that checks whether each character is a Unicode nonspacing mark (category Mn), which is the category the Arabic diacritics belong to.
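As a quick check of the idea above, here is a minimal, self-contained sketch. The function name strip_accents and the sample word are my own choices for illustration, and it assumes Python 3.

```python
import unicodedata

def strip_accents(s):
    # Drop every character whose Unicode category is 'Mn' (nonspacing mark).
    return ''.join(filter(lambda c: unicodedata.category(c) != 'Mn', s))

# A fully vowelled word goes in, the bare letters come out.
print(strip_accents("مُحَمَّد"))  # محمد
```

Text without any diacritics passes through unchanged, so it should be safe to call on mixed input.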
The same job can be done just as easily with a regular expression in JavaScript. Look at this JavaScript code:
function stripAccents(text) {
    // Remove every run of characters in the range U+064B to U+065F.
    return text.replace(/[\u064B-\u065F]+/g, '');
}
It is one line that finds one or more vowel marks in the range U+064B to U+065F and replaces them with nothing.
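For comparison, the same range-based idea can be sketched in Python with the re module. The names ARABIC_DIACRITICS and strip_accents_regex are hypothetical, chosen only for this example.

```python
import re

# Same range as the JavaScript version: U+064B to U+065F.
ARABIC_DIACRITICS = re.compile('[\u064B-\u065F]+')

def strip_accents_regex(text):
    # Replace every run of characters in the range with the empty string.
    return ARABIC_DIACRITICS.sub('', text)

print(strip_accents_regex("مُحَمَّد"))  # محمد
```

One difference worth keeping in mind: the range only covers U+064B to U+065F, so a mark outside it, such as the superscript alef U+0670, is left in place, while the category-based check removes it because it is also a nonspacing mark.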
Thanks to lambdas and regular expressions; both are really smart and helpful.
From ahm507.blogspot.com