Charles University
Faculty of Arts
Institute of Theoretical and Computational Linguistics
Czech syntactic lexicon
Hana Skoumalova
2001
Supervisor: Prof. PhDr. Jarmila Panevova, DrSc.
You can download the whole file here: .ps.gz,
.pdf.gz, or individual parts from the
Table of Contents below.
You can look at slides from my lecture on
the lexicon (here in PDF -- both in Czech).
Users who know the password can browse the
finished lexicons.
Abstract
In this work, an electronic lexicon of Czech verbs is presented. The
lexicon contains valency frames of ca. 15,000 Czech verbs, and its
purpose is to enrich the information contained in other electronic
dictionaries. The trend of recent years is to build large-scale
reusable sources which can be combined with other sources. This work
shows how the lexicon cooperates with an existing morphological
lexicon and how it can be used in various NLP systems.
Chapter 2 discusses several theoretical approaches in
comparison with Functional Generative Description (FGD), which is used
for the dictionary. The exposition concentrates especially on the
structure of the lexicon in the individual theories. A lexicon usually conforms
to certain preconditions resulting from the use of a given theoretical
framework, and so the possibility of creating a lexicon which would be
transferable to another theoretical framework is explored.
Chapter 3 discusses the possibility of using existing
sources, with respect to the desired result and the theoretical
framework adopted for the work. Several Czech syntactic lexicons
have already been created in the past, but unfortunately their reuse
would be rather difficult. This chapter mentions several such
attempts and describes in detail the lexicon which is used as the source.
Chapter 4 describes the verb frame. First, the format of the lexical
entry is described; then the various types of reflexive constructions in
Czech and their encoding in the lexicon are discussed. The next
section shows the possible diatheses of the basic (active) frame and
discusses which of these diatheses can be added to the
dictionary on a regular basis and which have to be treated as
exceptions. The last section describes so-called equi and raising
verbs.
In Chapter 5, the procedure for automatic conversion of the source
dictionary to the proposed format is shown. For this conversion, an
algorithm was created which assigns functors (semantic roles) to the
individual members of a frame. The output of this procedure will serve as
input for an editor. The chapter discusses how much of the source data
can be completed by this procedure and how much needs post-editing.
It is also shown how the resulting lexicon can be used in NLP systems.
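The idea of assigning functors to the members of a surface frame can be illustrated with a minimal sketch. It is not the algorithm of Chapter 5: the mapping table, the case labels (`nom`, `acc`, `dat`, `ins`) and the function name are hypothetical, and the defaults only follow the common FGD convention that, in the active voice, the nominative typically realizes the Actor (ACT), the accusative the Patient (PAT) and the dative the Addressee (ADDR). Frame members for which the sketch has no unique default are left unassigned and the frame is flagged for post-editing.

```python
# Hypothetical default mapping from surface case to functor.
# This is an assumption for illustration, not the tables used in the thesis.
PROTOTYPICAL_FUNCTORS = {"nom": "ACT", "acc": "PAT", "dat": "ADDR"}


def assign_functors(surface_frame):
    """Assign a functor to each member of a surface frame.

    surface_frame is a list of surface-form labels, e.g. ["nom", "acc"].
    Returns (assignments, needs_post_editing), where assignments is a list
    of (form, functor) pairs and functor is None when no default applies.
    """
    assignments = []
    used = set()
    needs_post_editing = False
    for form in surface_frame:
        functor = PROTOTYPICAL_FUNCTORS.get(form)
        if functor is None or functor in used:
            # No default, or the default functor is already taken:
            # leave the slot open for a human editor.
            assignments.append((form, None))
            needs_post_editing = True
        else:
            assignments.append((form, functor))
            used.add(functor)
    return assignments, needs_post_editing


if __name__ == "__main__":
    print(assign_functors(["nom", "acc"]))
    print(assign_functors(["nom", "ins"]))
```

In this sketch a frame with a nominative and an accusative member is completed automatically, while a frame containing an instrumental member is only partly filled in and marked as needing post-editing.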
Chapter 6 sums up. In Section 6.1, verbs are sorted into groups
according to their frames, and the results are compared with the results of
other researchers. Section 6.2 discusses the prospects of language
processing based on symbolic methods and the possible
use of the lexicon in corpus linguistics.
- Acknowledgments . . . ii
- 1. Introduction . . . 1 (.ps, .pdf)
- 1.1. Terminological remarks . . . 2
- 2. Theoretical background . . . 3
- 2.1. An overview of FGD . . . 3
- 2.2. Comparing FGD with other theories . . . 6
- 2.2.1. Government Binding Theory . . . 6
- 2.2.2. Lexical Functional Grammar . . . 7
- 2.2.3. Head-Driven Phrase Structure Grammar . . . 7
- 2.2.4. Comparison with FGD . . . 9
- 3. Using existing sources . . . 10 (.ps, .pdf)
- 3.1. Source data . . . 11
- 3.1.1. The attributes used in the lexicon and their values . . . 11
- 4. Content of the lexicon . . . 14 (.ps, .pdf)
- 4.1. Format of a lexical entry . . . 14
- 4.1.1. Voice . . . 15
- 4.1.2. Reflexivity . . . 16
- 4.1.3. Subject . . . 16
- 4.1.4. Functor . . . 17
- 4.1.5. Grammatemes . . . 17
- 4.1.6. Diatheses . . . 18
- 4.2. Reflexivity . . . 21
- 4.2.1. True reflexive with se . . . 21
- 4.2.2. True reflexive with si . . . 23
- 4.2.3. Reciprocal verbs with se . . . 23
- 4.2.4. Reciprocal verbs with si . . . 27
- 4.2.5. Reflexive tantum with se . . . 28
- 4.2.6. Derived reflexive verbs with se . . . 28
- 4.2.7. Reflexive tantum with si . . . 28
- 4.2.8. Derived reflexive verbs with si . . . 29
- 4.2.9. Reflexive with optional se . . . 29
- 4.2.10. Reflexive with optional si . . . 30
- 4.2.11. Reflexive passive . . . 31
- 4.2.12. Mediopassive . . . 31
- 4.2.13. Homonymy of reflexive verbs . . . 31
- 4.3. Diatheses . . . 33
- 4.3.1. Diatheses encoded in the lexicon . . . 40
- 4.3.2. Periphrastic passive . . . 41
- 4.3.3. Reflexive passive . . . 44
- 4.3.4. Mediopassive . . . 46
- 4.3.5. Constructions with mít and dostat . . . 47
- 4.3.6. Resultative construction with mít . . . 49
- 4.4. Verbs with the infinitive in their frames . . . 49
- 4.4.1. Raising verbs . . . 55
- 4.4.2. Equi verbs . . . 59
- 5. Algorithm for processing the surface frames . . . 66 (.ps, .pdf)
- 5.1. Identifying and merging frames, marking the obligatority . . . 66
- 5.2. Assigning functors . . . 68
- 5.3. Marking diatheses . . . 73
- 5.4. Usage of the final lexicon . . . 73
- 5.4.1. Generating frame instances from frames . . . 74
- 5.4.2. Extracting subcat lists . . . 76
- 6. Conclusions . . . 78 (.ps, .pdf)
- 6.1. Verb grouping . . . 78
- 6.2. Further perspectives . . . 80
- Bibliography . . . 81
- Subject index . . . 86
- Verbs used in examples . . . 88
- A. Abbreviations . . . 90 (.ps, .pdf)
- B. Symbols used in the dictionary . . . 92
- B.1. Voice . . . 92
- B.2. Reflexivity . . . 92
- B.3. Subject . . . 93
- B.4. Functors . . . 93
- B.5. Grammatemes . . . 94
- B.6. Obligatority . . . 96
- B.7. Passive and other diathesis . . . 96
- C. Possible functors assigned to grammatemes . . . 97
- C.1. Abbreviations used in lists of possible functors . . . 97
- C.2. Lists of functors attached to every surface realization . . . 98
- D. Algorithm for assigning functors . . . 102
- D.1. Prototypical and less typical surface forms . . . 102
- D.2. Assigning non prototypical frame . . . 103
- D.3. Results . . . 103
- D.3.1. Verbs processed fully automatically . . . 103
- D.3.2. Verbs with ambiguous frames . . . 108
- E. Classification of Czech frames . . . 115
- E.1. Automatically processed frames . . . 115
- E.2. Ambiguous frames . . . 116
- F. Experiment with LFG . . . 121 (.ps, .pdf)
- F.1. Verb lexicon . . . 121
- F.2. Templates . . . 122
- F.3. Lexical rules . . . 123
- F.4. Grammar . . . 125
- F.5. Test sentences . . . 126
- G. Web interface to the lexicon . . . 132 (.ps, .pdf)
- 2nd part (.ps, .pdf)
- 3rd part (.ps, .pdf)
List of Tables
- 4.1. Taxonomy of reflexive verbs . . . 21
- 4.2. Three types of reciprocal verbs . . . 24
- 4.3. Reciprocal verbs with si . . . 27
- 4.4. Subject diatheses . . . 39
- 4.5. Subject diatheses revisited . . . 40
- 5.1. Identifying single frames . . . 67
- 5.2. Merging frame variants . . . 67
- 5.3. Prototypical frames . . . 70
- 5.4. Non prototypical frames . . . 70
- 5.5. Merging frame of the verb čertit se (be angry) . . . 71
- 6.1. Classification of verbs . . . 78
- 6.2. Classification of verbs with adjuncts simplified . . . 79
List of Figures
- 4.1. Three level system . . . 36
- 4.2. Three level system revisited . . . 37
- 5.1. Mapping between TL and ML in active voice . . . 69
- 5.2. Mapping between TL and ML for verbs with at least three actants . . . 69
- D.1. The algorithm for assigning functors to non prototypical frame . . . 104
- F.1. Simple grammar in LFG . . . 125
- F.2. Testing sentences . . . 126
- F.3. C structure of sentence 140a . . . 127
- F.4. F structure of sentence 140a . . . 127
- F.5. C structure of sentence 140b . . . 128
- F.6. F structure of sentence 140b . . . 128
- F.7. C structure of sentence 140c . . . 129
- F.8. F structure of sentence 140c . . . 129
- F.9. C structure of sentence 140d . . . 130