Applied Problems of Functional Homonymy Disambiguation for Russian Language

Olga Nevzorova, Kazan State Pedagogical University

There is no consensus in the linguistic literature on the phenomenon of homonymy. Its content, the principles of its classification, and the classification schemes themselves remain under discussion. The most common classification divides homonyms into lexical ones, which belong to the same part of speech, and grammatical (functional) ones, which belong to different parts of speech. This work examines applied problems of describing functional homonymy and methods for the automatic disambiguation of functional homonymy of different types.

The success of applied research in computational linguistics depends to a large extent on the availability of appropriate linguistic resources, lexicographic ones being the most important. In recent years, several dictionaries of Russian homonyms by different authors have been published. These dictionaries represent the phenomenon of homonymy with varying degrees of completeness. The major problem is that the grammatical descriptions of homonyms in these dictionaries do not match. For example, a comparison of the grammatical descriptions of 560 homonyms ending in the letter 'o' in [1-4] showed that only three homonyms are described with the same grammatical features.
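
A comparison of this kind can be sketched as follows. This is only an illustration under stated assumptions: each dictionary is modelled as a mapping from a word form to a set of part-of-speech labels, and the names `compare_descriptions`, `dict_a`, `dict_b`, `dict_c`, the placeholder forms, and the label set are all hypothetical. The toy entries are invented and do not reproduce the content of [1-4].

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Assumption: a grammatical description is reduced to a set of POS labels.
Description = FrozenSet[str]
Dictionary = Dict[str, Description]


def compare_descriptions(
    dictionaries: List[Dictionary],
) -> Tuple[Set[str], Set[str]]:
    """Return (forms present in every dictionary, forms described identically)."""
    shared = set(dictionaries[0])
    for dictionary in dictionaries[1:]:
        shared &= set(dictionary)
    # A form counts as matching only if all dictionaries give the same label set.
    identical = {
        form
        for form in shared
        if len({dictionary[form] for dictionary in dictionaries}) == 1
    }
    return shared, identical


# Invented toy entries with placeholder forms (not taken from any real dictionary).
dict_a = {
    "form_1": frozenset({"NOUN", "ADV"}),
    "form_2": frozenset({"ADV", "ADJ_SHORT"}),
    "form_3": frozenset({"NOUN", "VERB"}),
}
dict_b = {
    "form_1": frozenset({"NOUN", "ADV", "PRED"}),
    "form_2": frozenset({"ADV", "ADJ_SHORT"}),
    "form_3": frozenset({"NOUN", "VERB"}),
    "form_4": frozenset({"NOUN", "ADV"}),
}
dict_c = {
    "form_1": frozenset({"NOUN", "ADV"}),
    "form_2": frozenset({"ADV", "ADJ_SHORT"}),
    "form_3": frozenset({"NOUN"}),
}

shared, identical = compare_descriptions([dict_a, dict_b, dict_c])
print(len(shared), sorted(identical))
```

In this toy setting three forms occur in all three dictionaries, but only one of them receives the same description everywhere, which mirrors on a small scale the kind of mismatch reported above.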

There are also essential differences between existing classifications of functional homonym types. Kobzareva & Afanasiev (2002) give a classification of 58 homonymy types. The authors have extended this classification using corpus methods. Applied research clearly requires a new dictionary of functional homonyms built on a representative corpus of Russian texts. For the applied task of functional homonym disambiguation, we have developed a method based on contextual rules. The method includes:
1. Establishing a full classification of functional homonymy types.
2. Selecting the minimal set of resolving contexts (SRC) for each type.
3. Building a control structure over the SRC that maximizes resolution accuracy.
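
The three steps above can be illustrated with a minimal sketch. It is not the implementation described in this work: the homonymy type is keyed by the set of candidate parts of speech, the resolving contexts are simple tests on the left neighbour, and the control structure is assumed to be an ordered rule list in which the first matching context wins. All names (`Token`, `ContextRule`, `RULES_BY_TYPE`, `disambiguate`), the tag labels, and the three rules are hypothetical and deliberately crude.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class Token:
    form: str
    tags: FrozenSet[str]  # candidate parts of speech from a dictionary lookup


@dataclass(frozen=True)
class ContextRule:
    name: str
    test: Callable[[Sequence[Token], int], bool]
    result: str  # part of speech assigned when the context matches


def has_single_tag(tokens: Sequence[Token], i: int, tag: str) -> bool:
    """True if position i exists and holds an unambiguous token with this tag."""
    return 0 <= i < len(tokens) and tokens[i].tags == frozenset({tag})


# Step 1 (assumed): a homonymy type is identified by its set of candidate tags.
NOUN_VERB = frozenset({"NOUN", "VERB"})

# Steps 2 and 3 (assumed): resolving contexts for one type, tried in order.
RULES_BY_TYPE: Dict[FrozenSet[str], List[ContextRule]] = {
    NOUN_VERB: [
        ContextRule("preposition on the left",
                    lambda t, i: has_single_tag(t, i - 1, "PREP"), "NOUN"),
        ContextRule("adjective on the left",
                    lambda t, i: has_single_tag(t, i - 1, "ADJ"), "NOUN"),
        ContextRule("noun on the left",
                    lambda t, i: has_single_tag(t, i - 1, "NOUN"), "VERB"),
    ],
}


def disambiguate(tokens: Sequence[Token], i: int) -> Optional[str]:
    """Return the resolved tag for tokens[i], or None if no context applies."""
    token = tokens[i]
    if len(token.tags) == 1:
        return next(iter(token.tags))
    for rule in RULES_BY_TYPE.get(token.tags, []):
        if rule.test(tokens, i):
            return rule.result
    return None  # left unresolved for later stages of analysis


# Transliterated toy examples: "moloko steklo na pol", "razbitoe steklo".
steklo = Token("steklo", NOUN_VERB)
sentence_verb = [Token("moloko", frozenset({"NOUN"})), steklo,
                 Token("na", frozenset({"PREP"})), Token("pol", frozenset({"NOUN"}))]
sentence_noun = [Token("razbitoe", frozenset({"ADJ"})), steklo]

print(disambiguate(sentence_verb, 1), disambiguate(sentence_noun, 1))
```

Returning `None` when no resolving context matches keeps the homonym ambiguous instead of forcing a guess, so that a later analysis stage can still resolve it.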

The software implementation of the presyntactic processing module of a system for analyzing technical texts is nearing completion. Testing the functional homonymy disambiguation module on the corpus "Moschkov's library" (http://www.aot.ru) has given good results for the types implemented so far. For some types the disambiguation accuracy is 99%, and in the worst cases it is no lower than 95%. Errors arise from accidental agreement in the analyzed context, insufficient context, or insufficient resources (the absence of a case frame dictionary for different parts of speech). Some resolution errors can be corrected in the course of further analysis.
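
Per-type accuracy figures such as those quoted above can be computed as the share of correctly resolved occurrences within each homonymy type. The sketch below assumes that each test occurrence is recorded as a (type, predicted tag, reference tag) triple; the function name `accuracy_by_type`, the type labels, and the toy records are hypothetical and are not the results of the testing described here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def accuracy_by_type(results: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Map each homonymy type to the fraction of correctly resolved occurrences."""
    totals: Counter = Counter()
    correct: Counter = Counter()
    for homonymy_type, predicted, gold in results:
        totals[homonymy_type] += 1
        if predicted == gold:
            correct[homonymy_type] += 1
    return {t: correct[t] / totals[t] for t in totals}


# Invented toy records: (homonymy type, predicted tag, reference tag).
toy_results = [
    ("NOUN/VERB", "NOUN", "NOUN"),
    ("NOUN/VERB", "VERB", "VERB"),
    ("NOUN/VERB", "VERB", "NOUN"),
    ("NOUN/VERB", "NOUN", "NOUN"),
    ("ADV/ADJ_SHORT", "ADV", "ADV"),
    ("ADV/ADJ_SHORT", "ADV", "ADJ_SHORT"),
]

scores = accuracy_by_type(toy_results)
print(scores)
```

Reporting accuracy separately for each type, instead of a single pooled figure, makes it visible which types fall toward the lower end of the range.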

References

1. Anoshkina J.G. Slovar omonimichnyh slovoform russkogo jazyka. M: Mashinnyj fond russkogo jazyka Instituta russkogo jazyka RAN, 2001. (In Russian) (http://irlras-cfrl.rema.ru:8100/homoforms/index.htm).
2. Kim O.M., Ostrovkina I.E. Slovar grammaticheskih omonimov russkogo jazyka. M., 2004.
3. Russian Corpus (http://www.ruscorpora.ru/).
4. The tower of Babel (http://starling.rinet.ru/).
5. Kobzareva T.U., Afanasiev R.N. 2002. Universalnyj modul predsintaksicheskogo analiza omonimii chastej rechi na osnove slovarja diagnosticheskih situacij // Trudy mezhdunar. konferencii Dialog’2002. Moscow, 258-268.