{"id":35,"date":"2015-05-23T17:32:00","date_gmt":"2015-05-23T17:32:00","guid":{"rendered":"https:\/\/ahm.basfinans.com\/index.php\/2015\/05\/23\/discarding-arabic-diacritics-from-text\/"},"modified":"2015-05-23T17:32:00","modified_gmt":"2015-05-23T17:32:00","slug":"discarding-arabic-diacritics-from-text","status":"publish","type":"post","link":"https:\/\/ahm.basfinans.com\/index.php\/2015\/05\/23\/discarding-arabic-diacritics-from-text\/","title":{"rendered":"Discarding Arabic Diacritics From Text"},"content":{"rendered":"<p><\/p>\n<div>\nHere is a small piece of Python code that I like a lot for stripping diacritics from Arabic text.<\/div>\n<div>\n<\/div>\n<div>\n<span style=\"color: navy; font-weight: bold;\">&nbsp; &nbsp; import <\/span>unicodedata<br \/>\n<span style=\"color: navy; font-weight: bold;\">&nbsp; &nbsp; def <\/span>strip_diacritics(s):<br \/>\n<span style=\"color: navy; font-weight: bold;\">&nbsp; &nbsp; &nbsp; &nbsp; return <\/span><span style=\"color: green; font-weight: bold;\">''<\/span>.join(<span style=\"color: navy;\">filter<\/span>(<span style=\"color: navy; font-weight: bold;\">lambda <\/span>c: unicodedata.category(c) != <span style=\"color: green; font-weight: bold;\">'Mn'<\/span>, s))<\/p>\n<p>Look how elegant and small it is. The work is done in one line: a lambda expression checks whether each character is a Unicode nonspacing mark (category Mn), which includes the Arabic diacritics, and filter keeps everything else.<\/p>\n<p>The same job can easily be done with a regular expression in JavaScript:<\/p>\n<p>function stripAccents(text) {<br \/>\n&nbsp; &nbsp; return&nbsp;text.replace(new RegExp('[\\u064B-\\u065F]+', 'g'), '');<br \/>\n}<\/div>\n<div>\n<\/div>\n<div>\nIt is one line that finds every run of one or more diacritics in the range U+064B to U+065F and replaces it with nothing.<\/div>\n<div>\n<\/div>\n<div>\nThanks go to lambdas and regular expressions: both are really smart and helpful.<\/div>\n<div>\n<\/div>\n<div>\n<\/div>\n<div>\n<\/div>\n<div>From ahm507.blogspot.com<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Here is a small piece of Python code that I like a lot for stripping diacritics from Arabic text. 
&nbsp; &nbsp; import unicodedata &nbsp; &nbsp; def strip_diacritics(s): return ''.join(filter(lambda c: unicodedata.category(c) != 'Mn', s)) Look how elegant and small it is. Just one line with a lambda expression and a check of whether each character is a Unicode diacritic [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/ahm.basfinans.com\/index.php\/wp-json\/wp\/v2\/posts\/35"}],"collection":[{"href":"https:\/\/ahm.basfinans.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ahm.basfinans.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ahm.basfinans.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ahm.basfinans.com\/index.php\/wp-json\/wp\/v2\/comments?post=35"}],"version-history":[{"count":0,"href":"https:\/\/ahm.basfinans.com\/index.php\/wp-json\/wp\/v2\/posts\/35\/revisions"}],"wp:attachment":[{"href":"https:\/\/ahm.basfinans.com\/index.php\/wp-json\/wp\/v2\/media?parent=35"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ahm.basfinans.com\/index.php\/wp-json\/wp\/v2\/categories?post=35"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ahm.basfinans.com\/index.php\/wp-json\/wp\/v2\/tags?post=35"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}