TECHNOLOGY FOR A FREE WORLD

Talk Details

Indic Tools for Community Knowledge Bases

Name Srinivasa Raghavan K
Organisation Sarai Consultant
Website http://mail.sarai.net:8080/indic
Scope Technical
Topic Localisation and Indic Computing
Type talk
Abstract Building a community knowledge base is a key requirement in NGOs, government offices, schools and, of course, our own offices. Most of these organizations or knowledge centers can now rely on being online, so user participation is not restricted to a particular physical location. These knowledge centers are typically built using web-based content management systems such as PANTOTO, Plone and PhpNuke. The browser is the key application through which community members participate, by publishing stories or filling in forms or reports. Ideally this is done in the native language or a regional language.

We consider the case of communities using Indic languages and the tools needed to help users create, manage and search content in a local language. We elaborate on the peculiarities of Indic languages and discuss one of the tools, a "morphological analyser", in detail. We will also discuss the utility of this tool and of other tools of interest for creating, storing and searching local language content.

An Input Method Editor (IME) is one such tool: it allows a user to key in content in any of the 10 Indian language scripts (including Devanagari, Bengali, Gurumukhi, Gujarati, Kannada, Malayalam, Oriya, Tamil and Telugu) in a Web browser using a keyboard layout. Once the content is stored, users should also be able to search it in the local language.
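As a rough illustration of the keyboard-layout idea, the sketch below maps Latin keys to Devanagari code points. The class name `KeymapSketch`, the method `transliterate` and the four-key layout are hypothetical and chosen only for this example; they do not describe the layout or the code of the IME discussed in the talk, which runs in a Web browser.

```java
import java.util.Map;

// Illustrative sketch only: a tiny key-to-code-point table, not the IME from the talk.
class KeymapSketch {
    // Hypothetical layout (assumption): each Latin key produces one Devanagari code point.
    private static final Map<Character, String> LAYOUT = Map.of(
        'k', "\u0915", // DEVANAGARI LETTER KA
        'm', "\u092E", // DEVANAGARI LETTER MA
        'l', "\u0932", // DEVANAGARI LETTER LA
        'a', "\u093E"  // DEVANAGARI VOWEL SIGN AA
    );

    // Replace every mapped key with its Devanagari code point; leave other keys unchanged.
    static String transliterate(String keys) {
        StringBuilder out = new StringBuilder();
        for (char c : keys.toCharArray()) {
            out.append(LAYOUT.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }
}
```

With this toy layout, typing `kaam` yields the three code points KA, VOWEL SIGN AA, MA. A real layout would cover the full script and handle conjuncts and the virama, which this sketch ignores.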

Search also involves fetching documents for all the possible inflexions of the given word. To achieve this, we store the 'root' of each word in the search index. When a user searches for a word, the 'root' of that word is looked up in the search index, so that documents containing any of its inflexions are fetched. To get the root of a given input word, we need to analyse the word by applying linguistic rules.
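The root-based lookup described above can be sketched as a small inverted index keyed by roots. This is a minimal sketch under stated assumptions: the class `RootIndex` is hypothetical, the root table passed to it stands in for a real morphological analyser, and the example words are romanized Hindi forms used purely for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: an inverted index that stores roots instead of surface forms.
class RootIndex {
    // Assumption: a lookup table from inflected form to root, standing in for a real analyser.
    private final Map<String, String> rootOf;
    // Root -> ids of the documents that contain any inflexion of that root.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    RootIndex(Map<String, String> rootOf) {
        this.rootOf = rootOf;
    }

    // Words missing from the table are treated as their own root.
    String root(String word) {
        return rootOf.getOrDefault(word, word);
    }

    // Index a document by the root of each whitespace-separated token.
    void add(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(root(token), k -> new TreeSet<>()).add(docId);
        }
    }

    // Reduce the query word to its root, then look that root up.
    Set<Integer> search(String word) {
        return postings.getOrDefault(root(word), Collections.emptySet());
    }
}
```

For example, if the table maps `ladke` and `ladkon` to the root `ladka`, then a document containing `ladke` and another containing `ladkon` are both returned for a query on any of the three forms.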

The Morphological Analyser library, developed in Java, is one such tool: it analyses a given word and returns its semantics (root word, gender, tense, etc.). Its applications include search engines, spell checkers, machine translation systems, etc. In other words, the Morphological Analyser can be used in any application that requires word analysis for Indian languages. In the current context, it can be integrated with any Java-based search engine such as Lucene. We will discuss its usage, how it works with search engines like Lucene, and directions for making these tools more useful.

The above is available as open source code at: http://tinyurl.com/a7ps8
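To make the idea of word analysis concrete, here is a minimal sketch of what an analyser interface and a toy rule-plus-lexicon implementation might look like. Everything in it is an assumption made for illustration: `MorphSketch`, `Analysis`, `MorphologicalAnalyser`, `SuffixRuleAnalyser` and `rootsForIndexing` are hypothetical names, the suffix rules cover only a few romanized Hindi noun forms, and none of it reflects the API of the library linked above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only; not the API of the Morphological Analyser described in the talk.
class MorphSketch {
    /** Hypothetical result type: the root plus a couple of grammatical features. */
    static final class Analysis {
        final String root;
        final String gender;
        final String form;

        Analysis(String root, String gender, String form) {
            this.root = root;
            this.gender = gender;
            this.form = form;
        }
    }

    /** Hypothetical interface: returns an analysis, or empty if the word is not recognised. */
    interface MorphologicalAnalyser {
        Optional<Analysis> analyse(String word);
    }

    /** Toy analyser: rewrite a suffix, then accept the candidate only if the lexicon knows it. */
    static final class SuffixRuleAnalyser implements MorphologicalAnalyser {
        // Each rule is {suffix, replacement, form label}; rules are tried in order.
        private static final String[][] RULES = {
            {"on", "a", "oblique plural"},
            {"e", "a", "direct plural or oblique singular"},
            {"", "", "direct singular"},
        };
        private final Map<String, String> lexicon; // root -> gender

        SuffixRuleAnalyser(Map<String, String> lexicon) {
            this.lexicon = lexicon;
        }

        @Override
        public Optional<Analysis> analyse(String word) {
            for (String[] rule : RULES) {
                if (!word.endsWith(rule[0])) continue;
                String stem = word.substring(0, word.length() - rule[0].length());
                String candidate = stem + rule[1];
                String gender = lexicon.get(candidate);
                if (gender != null) {
                    return Optional.of(new Analysis(candidate, gender, rule[2]));
                }
            }
            return Optional.empty();
        }
    }

    /** Map each token to its root, keeping tokens the analyser does not recognise. */
    static List<String> rootsForIndexing(MorphologicalAnalyser analyser, List<String> tokens) {
        List<String> roots = new ArrayList<>();
        for (String token : tokens) {
            roots.add(analyser.analyse(token).map(a -> a.root).orElse(token));
        }
        return roots;
    }
}
```

In a search engine such as Lucene, a step like `rootsForIndexing` would typically sit in the analysis chain (for instance as a token filter), applied identically at indexing time and at query time so that both sides agree on the root. Checking each candidate root against a lexicon keeps the suffix rules from rewriting unrelated words that merely share an ending.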
Pre-requisites Basic understanding of Indic computing and some understanding of search engine functionality (indexing, searching, etc.)
Profile Raghavan, a computer science graduate, is a Sarai (http://sarai.net) Fellow working with Surekha Sastry on developing Indic tools. These tools would help to develop community knowledge bases in local languages. Prior to this, Raghavan worked with the Language Technologies Research Center group on developing a user interface for "Anusaaraka", an English-Hindi machine translation system.


 

Copyright © 2001-2005 Linux Bangalore