The relationship between the Czech standard code and various competing non-standard varieties presents definitional problems for linguists. On the one hand, the terms “standard” and “non-standard” are a convenient shorthand for explaining what “should” and “should not” be used in formal prose and speech. On the other hand, linguists have long recognized that the boundary between the standard and other varieties is blurred, and that the standard is not so much a monolithic construct as a collection of prescriptions and tendencies of varying strengths.
Special difficulties arise with forms undergoing standardization, that is, admission to the canon of standard forms by authorities of varying degrees of recognition. In this study, I examine two such points in Czech morphology -- the 1 sg. and 3 pl. of the non-past tense (e.g. kupuji/kupuju, kupují/kupujou) -- to see whether the labels used in handbooks (“standard”, “non-standard”, “colloquial”, “bookish”, etc.) are useful, adequate, and reflective of the forms actually attested in written texts.
My approach in this talk is primarily quantitative rather than qualitative: I look at the frequency of a variety of verb forms in the SYN2000 corpus of the Czech National Corpus (http://ucnk.ff.cuni.cz) and at their distribution by text genre. SYN2000 is a 100-million-word searchable corpus of contemporary written Czech built on the principle of proportional representativity, meaning that the formula for text inclusion rests on empirical research into what the Czech populace reads and in what amounts. I examine data on four potential forms across the three major verb classes in which they are found, and in twenty-three sample verbs, to test the validity of the intuitive generalizations found in grammar handbooks.
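The kind of tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for the study: it assumes the corpus hits have already been exported as (word form, genre label) pairs, the function name `variant_shares` and the genre labels are hypothetical, and the sample hits are invented placeholders rather than SYN2000 data. The four forms are those of the example verb cited earlier.

```python
from collections import Counter, defaultdict

# The four competing forms of the example verb, mapped to
# (grammatical slot, ending variant). Only this one verb is covered here.
VARIANTS = {
    "kupuji": ("1sg", "-i"),
    "kupuju": ("1sg", "-u"),
    "kupují": ("3pl", "-í"),
    "kupujou": ("3pl", "-ou"),
}


def variant_shares(hits):
    """Return, for each (genre, slot), the share of each ending variant.

    `hits` is an iterable of (word_form, genre) pairs; the export format
    is an assumption made for this sketch.
    """
    counts = defaultdict(Counter)
    for form, genre in hits:
        key = form.lower()
        if key in VARIANTS:
            slot, ending = VARIANTS[key]
            counts[(genre, slot)][ending] += 1

    shares = {}
    for group, tally in counts.items():
        total = sum(tally.values())
        shares[group] = {ending: n / total for ending, n in tally.items()}
    return shares


# Invented placeholder hits, for illustration only (not corpus results).
sample_hits = [
    ("kupuji", "journalism"),
    ("kupuju", "fiction"),
    ("kupuju", "fiction"),
    ("kupuji", "fiction"),
]
result = variant_shares(sample_hits)
```

Grouping by genre and grammatical slot before computing shares keeps the two variables of interest, ending and text type, separate, so that the same tally can later be repeated per verb class or per lexeme.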
Analysis shows that substantial differences in usage exist between the verb classes, far greater than the handbooks suggest. The 1986 Academy grammar's contention that there are significant acceptability differences at the level of individual lexemes is supported by the corpus data. Recent handbooks have taken a permissive attitude towards use of the 3 pl. Common Czech (CC) ending in writing, but the "standardization" of this form does not seem to be widely reflected in printed texts.
This research forms part of a larger cooperative international project, Exploring the Core and Limits of the Czech National Corpus.