Madan Puraskar Pustakalaya, PatanDhoka, Lalitpur, Nepal
Natural Language Processing (NLP) has only recently begun as a scientific activity for Nepali, the national language of the Himalayan country of Nepal, which is spoken by about 45 million people around the world. Nepali is written in the Devanagari script, which has its roots in Sanskrit and developed from the Brahmi script in the 11th century A.D. It is a highly inflectional and derivational language: verbs in Nepali inflect for tense, aspect, mood and, to some extent, honorificity, whereas nouns inflect for number and case. Lately, several NLP applications and tools have been developed for Nepali, such as a spell checker, a rule- and transfer-based machine translation system, a stemmer, a part-of-speech tagger and a font converter. Here we focus on the stemmer and the part-of-speech tagger. The stemmer follows a hybrid approach that combines brute-force search with suffix-stripping algorithms. The part-of-speech tagger, on the other hand, is stochastic: it takes into account the word-count statistics for a particular POS tag in the corpus as well as the tags of the neighbouring words in the sentence, for which we have been investigating the n-gram approach.
Keywords: brute force, suffix stripping, n-grams.
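
The hybrid stemming idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "brute-force search" means an exact lookup in a table of known word forms, and that suffix stripping is tried only when that lookup fails. The names LOOKUP_TABLE, SUFFIXES and stem are hypothetical, and the entries are romanized toy examples; a real system would operate on Devanagari text with a much larger lexicon and suffix list.

```python
# Illustrative sketch only: toy, romanized data (assumed, not from the paper).
# LOOKUP_TABLE maps known word forms to their roots (the "brute-force" part).
LOOKUP_TABLE = {
    "ghar": "ghar",      # house
    "gharharu": "ghar",  # houses (stored form, found by lookup alone)
    "kitab": "kitab",    # book
    "keti": "keti",      # girl
}

# A small hypothetical suffix list: plural, genitive, dative, ergative, locative.
SUFFIXES = ["haru", "ko", "lai", "le", "ma"]


def stem(word, table=LOOKUP_TABLE, suffixes=SUFFIXES):
    """Return a stem for `word`: table lookup first, then suffix stripping."""
    # Step 1: brute-force search, an exact match against the known forms.
    if word in table:
        return table[word]

    # Step 2: strip one suffix at a time, trying longer suffixes first,
    # and re-check the table after every strip.
    ordered = sorted(suffixes, key=len, reverse=True)
    current = word
    stripped = True
    while stripped:
        stripped = False
        for suffix in ordered:
            # Never strip a suffix that would consume the whole word.
            if current.endswith(suffix) and len(current) > len(suffix):
                current = current[: -len(suffix)]
                stripped = True
                break
        if current in table:
            return table[current]

    # No known root was reached: return whatever remains after stripping.
    return current


if __name__ == "__main__":
    for w in ["ghar", "gharharu", "kitabma", "ketiharulai", "xyz"]:
        print(w, "->", stem(w))
```

In this sketch the lookup handles stored forms directly, while stacked suffixes are peeled off one at a time until a known root is reached. A word with no match in the table and no strippable suffix is returned unchanged.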