Differences
Here you can see the differences between the selected revision and the current version of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
czesl:czesl [2020/12/10 17:56] rosen [CzeSL – a Learner Corpus of Czech] |
czesl:czesl [2021/04/06 12:18] (current) rosen [Available versions] |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | {{ :czesl:logolink_op_vvv_hor_barva_eng.jpg?600 |}} | ||
+ | |||
======= CzeSL – a Learner Corpus of Czech ======= | ======= CzeSL – a Learner Corpus of Czech ======= | ||
Line 9: | Line 11: | ||
* 2012–2016: Ministry of Education, Youth and Sports – //Czech National Corpus//, no. LM2011023 | * 2012–2016: Ministry of Education, Youth and Sports – //Czech National Corpus//, no. LM2011023 | ||
* 2016–2018 (extended to mid-2020): Grant Agency of the Czech Republic – [[https://ufal.mff.cuni.cz/czesl|Non-native Czech from the Theoretical and Computational Perspective]], no. 16-10185S | * 2016–2018 (extended to mid-2020): Grant Agency of the Czech Republic – [[https://ufal.mff.cuni.cz/czesl|Non-native Czech from the Theoretical and Computational Perspective]], no. 16-10185S | ||
- | * 2018–2022: KREAS, Faculty of Arts, Charles University; Structural and Investment Funds of the European Union –[[https://kreas.ff.cuni.cz/en/]] | + | * 2018–2022: [[https://kreas.ff.cuni.cz/en/|KREAS]], Faculty of Arts, Charles University; Structural and Investment Funds of the European Union |
* Alternative address of this site: [[http://utkl.ff.cuni.cz/learncorp/]] | * Alternative address of this site: [[http://utkl.ff.cuni.cz/learncorp/]] | ||
Line 15: | Line 17: | ||
===== Available versions ===== | ===== Available versions ===== | ||
- | | ^ Thousands of tokens in ^^^^ annotation ^^^^^ Metadata ^ Access ^ Year ^ | + | | ^ Thousands of tokens in ^^^^ Annotation ^^^^^ Metadata ^ Access ^ Year ^ |
- | | ::: ^ non-native ^^ ethnolect ^ 𝚺 ^ Error ^^ Linguistic ^^^:::^:::^:::^ | + | | ::: ^ non-native ^^ ethnolect ^ 𝚺 ^ error ^^ linguistic ^^^:::^:::^:::^ |
| ::: ^ essays ^ theses ^ ::: ^ ::: ^ Tags ^ TH ^ T0 ^ T1 ^ T2 ^ ::: ^ ::: ^ ::: ^ | | ::: ^ essays ^ theses ^ ::: ^ ::: ^ Tags ^ TH ^ T0 ^ T1 ^ T2 ^ ::: ^ ::: ^ ::: ^ | ||
^ CzeSL-plain | 1,315 | 732 | 428 | 2,475 | -- | -- | -- | -- | -- | -- | SD | 2012 | | ^ CzeSL-plain | 1,315 | 732 | 428 | 2,475 | -- | -- | -- | -- | -- | -- | SD | 2012 | | ||
Line 27: | Line 29: | ||
^ CzeSL-MD | 12 | -- | -- | 12 | MD | T2 | -- | -- | -- | -- | D | 2018 | | ^ CzeSL-MD | 12 | -- | -- | 12 | MD | T2 | -- | -- | -- | -- | D | 2018 | | ||
^ CzeSL-UD | 10 | -- | -- | 10 | -- | -- | M+S | -- | -- | -- | D | 2018 | | ^ CzeSL-UD | 10 | -- | -- | 10 | -- | -- | M+S | -- | -- | -- | D | 2018 | | ||
- | ^ CzeSL-GEC | ? | ? | -- | 20 | -- | 2T | -- | -- | -- | -- | D | 2017 | | + | ^ CzeSL-GEC | ? | ? | -- | 108 | -- | 2T | -- | -- | -- | -- | D | 2017 | |
^ AKCES-GEC | 336 | -- | 168 | 504 | G | 2T | -- | -- | -- | -- | D | 2019 | | ^ AKCES-GEC | 336 | -- | 168 | 504 | G | 2T | -- | -- | -- | -- | D | 2019 | | ||
^ CzeSL in TEITOK | 299 | -- | -- | 299 | F+I | 2T+ | M | M | M+S | yes | S | 2020 | | ^ CzeSL in TEITOK | 299 | -- | -- | 299 | F+I | 2T+ | M | M | M+S | yes | S | 2020 | | ||
Line 89: | Line 91: | ||
* Each text with its annotation consists of several related files. | * Each text with its annotation consists of several related files. | ||
* Some of the texts are independently annotated twice. | * Some of the texts are independently annotated twice. | ||
+ | * Also includes a flat version (files named *.vert); see CzeSL-man v2 below. | ||
* **CzeSL-man v1 searchable**: | * **CzeSL-man v1 searchable**: | ||
* Searchable by KonText: https://kontext.korpus.cz/first_form?corpname=czesl-man | * Searchable by KonText: https://kontext.korpus.cz/first_form?corpname=czesl-man | ||
Line 104: | Line 107: | ||
* Apart from the error annotation, the content and metadata are the same as in CzeSL-man v1. | * Apart from the error annotation, the content and metadata are the same as in CzeSL-man v1. | ||
* Linguistic annotation (tags and lemmas) is provided for all tokens at Tier 0 and Tier 2. | * Linguistic annotation (tags and lemmas) is provided for all tokens at Tier 0 and Tier 2. | ||
+ | * Downloadable from https://bitbucket.org/czesl/czesl-man/ (files named *.vert). | ||
=== CzeSL-TH === | === CzeSL-TH === | ||
Line 153: | Line 157: | ||
* Multi-level concordancer [[http://utkl.ff.cuni.cz/czesl/selaq.html|SeLaQ]], used for basic searching in CzeSL-man | * Multi-level concordancer [[http://utkl.ff.cuni.cz/czesl/selaq.html|SeLaQ]], used for basic searching in CzeSL-man | ||
* Standard concordancer [[http://wiki.korpus.cz/doku.php/en:manualy:kontext:index|Manatee/KonText]], used for searching in CzeSL-plain and CzeSL-SGT | * Standard concordancer [[http://wiki.korpus.cz/doku.php/en:manualy:kontext:index|Manatee/KonText]], used for searching in CzeSL-plain and CzeSL-SGT | ||
- | * General corpus tool [[http://beta.clul.ul.pt/teitok/site/|TEITOK]], currently used for building, editing and viewing learner corpora hosted by the Institute of Theoretical and Computational linguistics (see [[http://utkl.ff.cuni.cz/teitok/|Learner corpora at ICTL]]) | + | * General corpus tool [[http://www.teitok.org|TEITOK]], currently used for building, editing and viewing learner corpora hosted by the Institute of Theoretical and Computational linguistics (see [[http://utkl.ff.cuni.cz/teitok/|Learner corpora at ICTL]]) |
===== Bibliography ===== | ===== Bibliography ===== | ||
Line 164: | Line 168: | ||
//Compiling and annotating a learner corpus for a morphologically rich language – CzeSL, a corpus of non-native Czech.// [[https://karolinum.cz|Karolinum, Charles University Press, Praha]]. [[https://karolinum.cz/knihy/rosen-compiling-and-annotating-a-learner-corpus-for-a-morphologically-rich-language-23802|Print copy, e-book]] [[https://dspace.cuni.cz/handle/20.500.11956/123103|CU Digital Repository]] | //Compiling and annotating a learner corpus for a morphologically rich language – CzeSL, a corpus of non-native Czech.// [[https://karolinum.cz|Karolinum, Charles University Press, Praha]]. [[https://karolinum.cz/knihy/rosen-compiling-and-annotating-a-learner-corpus-for-a-morphologically-rich-language-23802|Print copy, e-book]] [[https://dspace.cuni.cz/handle/20.500.11956/123103|CU Digital Repository]] | ||
+ | ===== Acknowledgement ===== | ||
+ | This work was supported by the European Regional Development Fund project “Creativity and Adaptability as Conditions of the Success of Europe in an Interrelated World” (reg. no.: CZ.02.1.01/0.0/0.0/16_019/0000734). |