Arabic Morphological Analysis




Historically, at my former employers Sakhr and Harf, I believed you had to support Arabic morphological analysis in order to have a good Arabic full-text search engine. Recently I was developing an Arabic application in Java, and we used the Apache Lucene search component. As you know, Java is already internationalized, so Lucene supports Arabic without much effort. We studied whether to improve the search by adding morphological analysis.

What does morphological analysis in search mean from the user's perspective? It means you search for كتب (write) and get results containing many variations, such as كتابة، سيكتب، فكتب (writing, he will write, so he wrote, …) and so on. What a stupid idea. Look at Google: if you search for the keyword “reading”, it finds only that word. Why should we slow down our search algorithm and just add confusion for the user? Google even searches for misspelled words as typed and merely offers to search using the correct words. What does that mean? It means we should respect the user's input and use it as is; the search engine's effort should concentrate on putting the hits most relevant to the user's query at the top.
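
To make the difference concrete, here is a minimal Python sketch comparing an exact-match search with one that strips a few affixes before matching. It is my own toy illustration, not Lucene code: the affix lists, the `light_stem` helper and the sample sentences are all assumptions made up for the example, and a real Arabic stemmer would use much larger tables.

```python
# Toy affix lists (assumed for illustration only).
PREFIXES = ("سي", "ال", "ف", "و")
SUFFIXES = ("ون", "ات", "ة")
MIN_STEM = 3  # never strip below three letters


def light_stem(word):
    """Strip at most one known prefix and one known suffix."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[:-len(s)]
            break
    return word


def exact_search(query, docs):
    """Return documents containing the query word exactly as typed."""
    return [d for d in docs if query in d.split()]


def stemmed_search(query, docs):
    """Return documents containing any word with the same light stem."""
    q = light_stem(query)
    return [d for d in docs if any(light_stem(w) == q for w in d.split())]


docs = [
    "كتب الولد الدرس",    # the boy wrote the lesson
    "سيكتب الولد الدرس",  # the boy will write the lesson
    "فكتب الولد الدرس",   # so the boy wrote the lesson
    "كتابة الدرس",        # writing the lesson
]

print(exact_search("كتب", docs))
print(stemmed_search("كتب", docs))
```

With this toy data the exact search returns only the first sentence, while the stemmed search also returns the سيكتب and فكتب sentences. Note that كتابة is still missed, because reaching كتب from it needs pattern-level analysis rather than plain affix stripping, which is exactly the extra work and extra recall being argued about here.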

My conclusion is not that morphological analysis is useless. I believe it is important in machine translation and many other natural language processing applications, but not in search engines. The key features of a search engine are accuracy, ranking and, of course, speed.

If you want to understand Arabic morphological analysis, I suggest looking at Buckwalter's work. You can also download his GPL version to learn how things work. His code is written in Perl, but you can easily follow his nice readme file. Thanks, Buckwalter [1] [2].
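
As I understand it, the general idea in that analyzer is to split each word into prefix, stem and suffix, look each part up in a lexicon, and keep only the splits whose categories are compatible with each other. Below is a rough Python sketch of that idea. It is not Buckwalter's code: the lexicon entries, the category names and the compatibility tables are all invented for illustration and are not taken from his data files.

```python
# Toy lexicons: surface string -> category (all names are hypothetical).
PREFIXES = {"": "P0", "ف": "P_CONJ", "سي": "P_FUT"}
SUFFIXES = {"": "S0", "ت": "S_PERF", "ون": "S_PL"}
# One unvowelled stem can carry several categories.
STEMS = {"كتب": ["V_PERF", "V_IMPERF"]}

# Toy compatibility tables: which category pairs may co-occur.
PREFIX_STEM = {("P0", "V_PERF"), ("P_CONJ", "V_PERF"), ("P_FUT", "V_IMPERF")}
STEM_SUFFIX = {("V_PERF", "S0"), ("V_PERF", "S_PERF"),
               ("V_IMPERF", "S0"), ("V_IMPERF", "S_PL")}
PREFIX_SUFFIX = {("P0", "S0"), ("P0", "S_PERF"),
                 ("P_CONJ", "S0"), ("P_CONJ", "S_PERF"),
                 ("P_FUT", "S0"), ("P_FUT", "S_PL")}


def analyze(word):
    """Return every (prefix, stem, suffix, stem_category) split that is
    found in the lexicons and passes all three compatibility checks."""
    results = []
    for i in range(len(word) + 1):
        for j in range(i, len(word) + 1):
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            if (prefix not in PREFIXES or stem not in STEMS
                    or suffix not in SUFFIXES):
                continue
            p_cat, s_cat = PREFIXES[prefix], SUFFIXES[suffix]
            for stem_cat in STEMS[stem]:
                if ((p_cat, stem_cat) in PREFIX_STEM
                        and (stem_cat, s_cat) in STEM_SUFFIX
                        and (p_cat, s_cat) in PREFIX_SUFFIX):
                    results.append((prefix, stem, suffix, stem_cat))
    return results


for w in ["كتب", "سيكتب", "فكتب", "كتبت", "سيكتبت"]:
    print(w, analyze(w))
```

In this toy setup سيكتب is accepted only as future prefix plus imperfect stem, while the ill-formed سيكتبت gets no analysis at all, because no stem category is compatible with both its prefix and its suffix. The real work in a full analyzer lies in the size and quality of the lexicons and tables, not in this loop.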

You can also develop a shallow morphological analyzer using the techniques suggested by Kareem M. Darwish. Look at his web site [3]; he has a nice research paper titled “Building a Shallow Arabic Morphological Analyzer in One Day” [4]. Free software tools are also available.

Research on Arabic morphological analyzers has increased nowadays; you can find an amazing number of research papers and dissertations online at Google Scholar [5].

Note: Buckwalter's code was later ported to Java under a GNU license [6]; look at:
http://www.nongnu.org/aramorph/english/index.html

References:

1. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49
2. http://www.qamous.org/
3. http://www.glue.umd.edu/%7Ekareem/research/Publish
4. http://www.cs.um.edu.mt/~mros/WSL/papers/darwish.pdf
5. http://scholar.google.com/
6. http://www.nongnu.org/aramorph/english/index.html

* This post was originally written on Jan 16, 2006.

From ahm507.blogspot.com