TrebleCLEF: Evaluation, Best Practices and Collaboration for Multilingual Information Access

SETOOZ Search Engine

A research group at IIIT Hyderabad is developing SETOOZ (www.setooz.com), a multilingual search engine.
It currently supports monolingual search in 13 morphologically complex languages: Bulgarian (Български), Greek (Ελληνικά), Estonian (Eesti), Latvian (Latviešu), Lithuanian (Lietuvių), Hungarian (Magyar), Norwegian (Norsk), Polish (Polski), Slovak (Slovenčina), Finnish (Suomi), Swedish (Svenska), Turkish (Türkçe) and Ukrainian (українська). A brief description follows.

The developers would be very happy to receive feedback from the CLEF community on its performance. Please send any comments or criticisms to Vasudeva Varma (vv@iiit.ac.in).
---------------------------------------------------
Here is a brief description of Setooz:

Setooz, pronounced "say-th-uuz", means "bridges": setu is Sanskrit for "bridge", and Setooz is an English-style plural of setu. Setooz is an intelligent web search engine for the world's non-English languages, particularly those that are morphologically difficult to process.

The main features of this new search engine are:

  • Performs a focused crawl of the web instead of a full web crawl. This lets Setooz seek out and prioritize non-English web pages, and crawl more of them, more frequently, with limited infrastructure.
  • Identifies the language of web pages in about 70 languages with good accuracy.
  • Handles morphological variants of words to improve retrieval recall. Recall matters even more for these languages because far fewer web pages exist in them than in English.
  • Handles spelling variants and misspelled words.
  • Segments compound words, which are very common in Germanic languages.
  • Analyzes web pages and assigns them query-independent weights based on the language(s) they are written in.
  • Uses a fast, scalable retrieval algorithm capable of processing large volumes of web pages.
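The announcement does not say how Setooz segments compound words. As an illustration only, here is a minimal sketch of one common approach, dictionary-based splitting: a compound is split into known lexicon entries, preferring the segmentation with the fewest parts. The function name segment_compound, the min_part threshold and the toy Norwegian lexicon are hypothetical, and the sketch ignores linking elements such as the Swedish "s" in "fotbollsmatch".

```python
from functools import lru_cache


def segment_compound(word, lexicon, min_part=3):
    """Split `word` into entries of `lexicon`, preferring the fewest parts.

    Hypothetical illustration, not Setooz's implementation.
    Returns [word] unchanged if no full segmentation exists.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start):
        # Best segmentation of word[start:] as a tuple of parts, or None.
        if start == len(word):
            return ()
        candidates = []
        for end in range(start + min_part, len(word) + 1):
            part = word[start:end]
            if part in lexicon:
                rest = best(end)
                if rest is not None:
                    candidates.append((part,) + rest)
        # Fewer parts usually means a more plausible split.
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result else [word]


# Toy lexicon (assumed); "fotballkamp" is Norwegian for "football match".
print(segment_compound("fotballkamp", {"fot", "ball", "fotball", "kamp"}))
```

With this lexicon, both "fot + ball + kamp" and "fotball + kamp" are valid splits; the fewest-parts rule picks the second. Indexing the parts alongside the whole compound lets a query for "kamp" match pages containing "fotballkamp".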

Please note that there is a power-user mode in which users can give feedback on the quality of the results (or, for that matter, on anything else). After entering a query, append the string "&puser=4d92kdu3ks9203ks882kls9" to the URL. This is needed only for the first search.

Example: http://tr.setooz.com/search?lang=tr&query=wikipedia&puser=4d92kdu3ks9203ks882kls9
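Appending the parameter by hand is easy to get wrong when the query contains non-ASCII characters. The sketch below adds the puser value from the announcement to an existing search URL using only Python's standard urllib.parse module. The helper name power_user_url is hypothetical; the key and URL format are taken from the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Power-user key as given in the announcement.
POWER_USER_KEY = "4d92kdu3ks9203ks882kls9"


def power_user_url(search_url):
    """Return `search_url` with puser=POWER_USER_KEY added (hypothetical helper)."""
    parts = urlsplit(search_url)
    # Keep existing parameters in order, dropping any old puser value.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "puser"]
    params.append(("puser", POWER_USER_KEY))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(power_user_url("http://tr.setooz.com/search?lang=tr&query=wikipedia"))
```

Because urlencode percent-encodes parameter values as UTF-8, a Turkish query such as "ağaç" is transmitted safely. Calling the helper on a URL that already has puser replaces the old value instead of duplicating it.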

Clicking the feedback link opens a text box where you can type your comments. We also plan to release APIs for this search engine so that the research community can use it for non-commercial purposes. We expect the APIs to be ready within a month.

Please send me your comments, and feel free to pass this on to anyone who might be interested in searching in these languages.

Vasudeva Varma, Ph.D.
Associate Professor
B2-107, Vindhyas
International Institute of Information Technology
Hyderabad - 500032, India
Home: www.iiit.ac.in/~vasu