====== Tagging texts for Ortofon ======

  * We receive the texts with markup:
    tak taky můžeš jít na ryby

===== Workflow =====

  * Each text is annotated by **one** human annotator and by the merged results of the **hybrid** vs. **MorphoDiTa**.
  * We have ten directories of texts from David: ''/store/corp/Ortofon/ortofon-data/01--05'' (first batch) and ''/store/corp/Ortofon/ortofon-data/06--10'' (second batch).
  * Manual annotation is done in the directories ''/store/corp/Ortofon/ortofon-manual/davka-?''.
  * Automatic annotation is done in the directories ''/store/corp/Ortofon/ortofon-hybrid/davka-?'' and ''/store/corp/Ortofon/ortofon-morphodita/davka-?''.
  * The final merge of the manual and automatic annotation is done in the directory ''/store/corp/Ortofon/ortofon-etalon/davka-?''.

===== Manual annotation =====

==== Preparing texts for annotators ====

  * Put all files into the directory ''chunks''.
  * Create ''csts'':
    parallel-filter.sh -C "cut -f1 | perl -pe 's/\.\.\./&thellip;/g' \
      | perl -pe 's/\.\./&dhellip;/g' | replace_spaces.pl \
      | perl -pe 's/::' \
      | vert_csts.pl | perl -pe 'undef $/; s/\n
  * **We tag with the hybrid!**
  * Run the morphology:
    make-corp.sh -s csts -t csts-morf -Eucs2 -A1 -B1 -M -p45 -v
  * Resolve //vole// and //von//:
    cd csts-morf
    for ff in *; do echo $ff; \
      perl -i -pe 's/vole<.*/volevůlNNMS5-----A----/' $ff; done
    for ff in *; do echo $ff; \
      perl -i -pe 's/(]*>von)<.*/$1onPPYS1--3------6/' $ff; done
  * Apply the rules and phrasemes:
    make-whole-corp-csts.sh -Eucs2 -M -v -p45 -trules -Tfrazrl
  * Adjust the tags:
    parallel-filter.sh -C "normalize-anot-csts.pl \
      | simplify-tags-csts-utf.pl| remove-dupl-csts-mark.pl X" -p45 \
      -s csts-rules-frazrl -t csts-import -v
  * In addition, simplify the tags and handle background sounds.
  * Create links for the individual annotators.
  * The remaining steps are done **on jakobson**.
  * Import a file:
    /usr/local/annotate/bin/csts-import-utkl.pl --force 05-16X001N_2-HS
  * Edit ''/usr/local/annotate/users''.
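The first two substitutions in the ''csts'' pipeline above turn runs of dots into the entities ''&thellip;'' and ''&dhellip;''. The order matters: replacing two dots first would break up every three-dot run. Below is a minimal sketch of just that step, assuming one token per line on standard input. The function name ''protect_ellipses'' is hypothetical, and ''sed'' is used here in place of the original ''perl -pe'' calls.

```shell
#!/usr/bin/env bash
# Sketch only: protect_ellipses is a hypothetical helper name, and sed stands in
# for the two perl -pe substitutions used in the pipeline.
protect_ellipses() {
  # Three dots first, then two; the reverse order would turn "..." into
  # "&dhellip;" followed by a stray dot.
  sed -e 's/\.\.\./\&thellip;/g' -e 's/\.\./\&dhellip;/g'
}

# Example: one token per line, as in the first column of a vertical file.
printf '%s\n' 'no...' 'tak..' 'jo.' | protect_ellipses
# prints:
#   no&thellip;
#   tak&dhellip;
#   jo.
```

A single dot is left untouched, so ordinary sentence-final punctuation passes through unchanged.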
==== Anotátoři a přidělené soubory (davka-1) ==== === Michal Havrda - MH === * xaf (4541): **hotovo** -rw-rw-r-- 1 skoumal users 169887 Nov 7 17:40 04-13B019N_0-MH * -rw-rw-r-- 1 skoumal users 147849 Nov 7 17:40 04-13B025N_0-MH * -rw-rw-r-- 1 skoumal users 136246 Nov 7 17:40 04-13O007N_1-MH * -rw-rw-r-- 1 skoumal users 133560 Nov 7 17:40 04-13O010N_0-MH * -rw-rw-r-- 1 skoumal users 165228 Nov 7 17:40 04-13O013N_0-MH * -rw-rw-r-- 1 skoumal users 119827 Nov 7 17:40 04-13P004N_2-MH ** -rw-rw-r-- 1 skoumal users 129590 Nov 7 17:40 04-14P007N_0-MH * -rw-rw-r-- 1 skoumal users 166905 Nov 7 17:40 04-14T003N_0-MH * -rw-rw-r-- 1 skoumal users 152270 Nov 7 17:40 04-14T010N_1-MH * -rw-rw-r-- 1 skoumal users 141300 Nov 7 17:40 04-14T013N_1-MH * -rw-rw-r-- 1 skoumal users 148744 Nov 7 17:40 04-14X016N_3-MH * -rw-rw-r-- 1 skoumal users 131885 Nov 7 17:40 04-15O010N_0-MH * -rw-rw-r-- 1 skoumal users 153940 Nov 7 17:40 04-15P001N_2-MH * -rw-rw-r-- 1 skoumal users 119987 Nov 7 17:40 04-15P006N_0-MH ** -rw-rw-r-- 1 skoumal users 163310 Nov 7 17:40 05-16X001N_2-MH ** === Václav Horký - VH === * xaa (4238): -rw-rw-r-- 1 skoumal users 97843 Nov 7 17:40 01-12A005N_1-VH -rw-rw-r-- 1 skoumal users 124148 Nov 7 17:40 01-13A014N_1-VH -rw-rw-r-- 1 skoumal users 139676 Nov 7 17:40 01-13A028N_1-VH -rw-rw-r-- 1 skoumal users 162599 Nov 7 17:40 01-13B031N_1-VH -rw-rw-r-- 1 skoumal users 128254 Nov 7 17:40 01-13H005N_1-VH -rw-rw-r-- 1 skoumal users 176088 Nov 7 17:40 01-13O009N_3-VH -rw-rw-r-- 1 skoumal users 128760 Nov 7 17:40 01-13P004N_1-VH -rw-rw-r-- 1 skoumal users 139368 Nov 7 17:40 01-13P009N_2-VH -rw-rw-r-- 1 skoumal users 135085 Nov 7 17:40 01-14T003N_6-VH -rw-rw-r-- 1 skoumal users 152015 Nov 7 17:40 01-14T007N_1-VH -rw-rw-r-- 1 skoumal users 153372 Nov 7 17:40 01-14T010N_0-VH -rw-rw-r-- 1 skoumal users 166194 Nov 7 17:40 01-14X016N_6-VH -rw-rw-r-- 1 skoumal users 168697 Nov 7 17:40 01-14X019N_0-VH -rw-rw-r-- 1 skoumal users 160336 Nov 7 17:40 01-15O001N_0-VH === Šárka Kadavá - SK === * 
xad (4521): **hotovo** -rw-rw-r-- 1 skoumal users 173330 Nov 7 17:40 02-16A005N_5-SK -rw-rw-r-- 1 skoumal users 149878 Nov 7 17:40 02-16P002N_0-SK -rw-rw-r-- 1 skoumal users 118822 Nov 7 17:40 02-16X003N_5-SK -rw-rw-r-- 1 skoumal users 138030 Nov 7 17:40 03-12A035N_3-SK -rw-rw-r-- 1 skoumal users 130944 Nov 7 17:40 03-13A014N_4-SK -rw-rw-r-- 1 skoumal users 138479 Nov 7 17:40 03-13O009N_2-SK -rw-rw-r-- 1 skoumal users 132373 Nov 7 17:40 03-13P010N_1-SK -rw-rw-r-- 1 skoumal users 159964 Nov 7 17:40 03-14A011N_3-SK -rw-rw-r-- 1 skoumal users 122871 Nov 7 17:40 03-14A016N_4-SK -rw-rw-r-- 1 skoumal users 137852 Nov 7 17:40 03-14P007N_2-SK -rw-rw-r-- 1 skoumal users 186207 Nov 7 17:40 03-14T010N_2-SK -rw-rw-r-- 1 skoumal users 151313 Nov 7 17:40 03-14T013N_0-SK -rw-rw-r-- 1 skoumal users 162548 Nov 7 17:40 03-14T020N_0-SK -rw-rw-r-- 1 skoumal users 172522 Nov 7 17:40 03-14X019N_4-SK === Pavel Kopřiva - PK === * xag (4601): **hotovo** -rw-rw-r-- 1 skoumal users 128575 Nov 7 17:40 04-15X030N_3-PK -rw-rw-r-- 1 skoumal users 152738 Nov 7 17:40 04-15X043N_2-PK -rw-rw-r-- 1 skoumal users 162024 Nov 7 17:40 04-16A009N_2-PK -rw-rw-r-- 1 skoumal users 114272 Nov 7 17:40 04-16E007N_2-PK -rw-rw-r-- 1 skoumal users 161129 Nov 7 17:40 04-16P004N_1-PK -rw-rw-r-- 1 skoumal users 145303 Nov 7 17:40 04-16X003N_2-PK -rw-rw-r-- 1 skoumal users 153704 Nov 7 17:40 05-12P004N_2-PK -rw-rw-r-- 1 skoumal users 141893 Nov 7 17:40 05-13A011N_0-PK -rw-rw-r-- 1 skoumal users 136846 Nov 7 17:40 05-13A014N_3-PK -rw-rw-r-- 1 skoumal users 117562 Nov 7 17:40 05-13A023N_3-PK -rw-rw-r-- 1 skoumal users 145722 Nov 7 17:40 05-13B005N_1-PK -rw-rw-r-- 1 skoumal users 169153 Nov 7 17:40 05-13D015N_0-PK -rw-rw-r-- 1 skoumal users 141199 Nov 7 17:40 05-13O007N_0-PK -rw-rw-r-- 1 skoumal users 172005 Nov 7 17:40 05-14A011N_2-PK === Lucie Onari Kreslová - LK === * xae (4451): **hotovo** -rw-rw-r-- 1 skoumal users 136241 Nov 7 17:40 03-14X021N_2-LK -rw-rw-r-- 1 skoumal users 180763 Nov 7 17:40 03-15E003N_0-LK 
-rw-rw-r-- 1 skoumal users 144355 Nov 7 17:40 03-15E015N_1-LK -rw-rw-r-- 1 skoumal users 135982 Nov 7 17:40 03-15O002N_1-LK -rw-rw-r-- 1 skoumal users 171807 Nov 7 17:40 03-15O007N_0-LK -rw-rw-r-- 1 skoumal users 175211 Nov 7 17:40 03-15X020N_1-LK -rw-rw-r-- 1 skoumal users 158941 Nov 7 17:40 03-15X041N_2-LK -rw-rw-r-- 1 skoumal users 170958 Nov 7 17:40 03-16A005N_4-LK -rw-rw-r-- 1 skoumal users 140487 Nov 7 17:40 03-16P004N_0-LK -rw-rw-r-- 1 skoumal users 171624 Nov 7 17:40 03-16X001N_1-LK -rw-rw-r-- 1 skoumal users 148677 Nov 7 17:40 03-16X031N_4-LK -rw-rw-r-- 1 skoumal users 115839 Nov 7 17:40 04-12A025N_0-LK -rw-rw-r-- 1 skoumal users 172208 Nov 7 17:40 04-12P004N_4-LK -rw-rw-r-- 1 skoumal users 200543 Nov 7 17:40 04-13B009N_0-LK === Tereza Marková - TM === * xab (4269): - **hotovo** -rw-rw-r-- 1 skoumal users 144719 Nov 7 17:40 01-15O004N_0-TM * -rw-rw-r-- 1 skoumal users 147429 Nov 7 17:40 01-15O007N_1-TM * -rw-rw-r-- 1 skoumal users 156462 Nov 7 17:40 01-15P001N_1-TM ** -rw-rw-r-- 1 skoumal users 188348 Nov 7 17:40 01-15T005N_0-TM ** -rw-rw-r-- 1 skoumal users 149779 Nov 7 17:40 01-15X045N_2-TM * -rw-rw-r-- 1 skoumal users 165698 Nov 7 17:40 01-16A005N_2-TM * -rw-rw-r-- 1 skoumal users 132634 Nov 7 17:40 01-16P002N_2-TM * -rw-rw-r-- 1 skoumal users 121098 Nov 7 17:40 01-16X033N_5-TM * -rw-rw-r-- 1 skoumal users 122287 Nov 7 17:40 02-12A011N_1-TM * -rw-rw-r-- 1 skoumal users 146221 Nov 7 17:40 02-13A011N_5-TM * -rw-rw-r-- 1 skoumal users 136882 Nov 7 17:40 02-13A036N_3-TM ** -rw-rw-r-- 1 skoumal users 198356 Nov 7 17:40 02-13B031N_0-TM * -rw-rw-r-- 1 skoumal users 112397 Nov 7 17:40 02-13O009N_1-TM ** -rw-rw-r-- 1 skoumal users 126312 Nov 7 17:40 02-13O010N_2-TM ** === Anna Nováková - AN === * xah (4552): -rw-rw-r-- 1 skoumal users 190480 Nov 7 17:40 05-14T010N_3-AN -rw-rw-r-- 1 skoumal users 122224 Nov 7 17:40 05-14T019N_0-AN -rw-rw-r-- 1 skoumal users 154255 Nov 7 17:40 05-14X012N_2-AN -rw-rw-r-- 1 skoumal users 189324 Nov 7 17:40 05-14X019N_2-AN -rw-rw-r-- 
1 skoumal users 183678 Nov 7 17:40 05-14X019N_3-AN -rw-rw-r-- 1 skoumal users 154400 Nov 7 17:40 05-15O001N_1-AN -rw-rw-r-- 1 skoumal users 150413 Nov 7 17:40 05-15P001N_0-AN -rw-rw-r-- 1 skoumal users 206423 Nov 7 17:40 05-15X009N_1-AN -rw-rw-r-- 1 skoumal users 181125 Nov 7 17:40 05-15X020N_3-AN -rw-rw-r-- 1 skoumal users 99237 Nov 7 17:40 05-15X043N_5-AN -rw-rw-r-- 1 skoumal users 182076 Nov 7 17:40 05-16A005N_1-AN -rw-rw-r-- 1 skoumal users 122299 Nov 7 17:40 05-16E007N_0-AN -rw-rw-r-- 1 skoumal users 120653 Nov 7 17:40 05-16P002N_1-AN -rw-rw-r-- 1 skoumal users 131582 Nov 7 17:40 05-16P007N_1-AN === Michal Zlatkovský - MZ === * xac (4393): -rw-rw-r-- 1 skoumal users 160624 Nov 7 17:40 02-13O013N_1-MZ -rw-rw-r-- 1 skoumal users 119673 Nov 7 17:40 02-13P004N_3-MZ -rw-rw-r-- 1 skoumal users 120363 Nov 7 17:40 02-13T029N_6-MZ -rw-rw-r-- 1 skoumal users 117594 Nov 7 17:40 02-14A016N_2-MZ -rw-rw-r-- 1 skoumal users 156603 Nov 7 17:40 02-14E003N_0-MZ -rw-rw-r-- 1 skoumal users 153488 Nov 7 17:40 02-14T003N_5-MZ -rw-rw-r-- 1 skoumal users 147516 Nov 7 17:40 02-14T013N_2-MZ -rw-rw-r-- 1 skoumal users 178036 Nov 7 17:40 02-14T020N_3-MZ -rw-rw-r-- 1 skoumal users 172908 Nov 7 17:40 02-15O002N_0-MZ -rw-rw-r-- 1 skoumal users 122828 Nov 7 17:40 02-15O004N_1-MZ -rw-rw-r-- 1 skoumal users 127983 Nov 7 17:40 02-15O009N_1-MZ -rw-rw-r-- 1 skoumal users 129864 Nov 7 17:40 02-15P001N_3-MZ -rw-rw-r-- 1 skoumal users 129648 Nov 7 17:40 02-15X041N_1-MZ -rw-rw-r-- 1 skoumal users 150095 Nov 7 17:40 02-16A001N_0-MZ ==== Anotátoři a přidělené soubory (davka-2) ==== === Václav Horký - VH === * (7862) -- hotovo: -rw-rw-r-- 1 skoumal users 142827 May 27 17:40 06-12A011N_0-VH -rw-rw-r-- 1 skoumal users 166457 May 27 17:40 06-12P004N_3-VH -rw-rw-r-- 1 skoumal users 131514 May 27 17:40 06-13A003N_1-VH -rw-rw-r-- 1 skoumal users 128413 May 27 17:40 06-13A014N_2-VH -rw-rw-r-- 1 skoumal users 144623 May 27 17:40 06-13A028N_2-VH -rw-rw-r-- 1 skoumal users 150403 May 27 17:40 06-13A074N_1-VH 
-rw-rw-r-- 1 skoumal users 139190 May 27 17:40 06-13B005N_0-VH -rw-rw-r-- 1 skoumal users 189188 May 27 17:40 06-13B028N_1-VH -rw-rw-r-- 1 skoumal users 155333 May 27 17:40 06-13O007N_2-VH -rw-rw-r-- 1 skoumal users 189392 May 27 17:40 06-14A006N_0-VH -rw-rw-r-- 1 skoumal users 143619 May 27 17:40 06-14A008N_3-VH -rw-rw-r-- 1 skoumal users 184286 May 27 17:40 06-14E001N_0-VH -rw-rw-r-- 1 skoumal users 149792 May 27 17:40 06-14P007N_3-VH -rw-rw-r-- 1 skoumal users 147049 May 27 17:40 06-14T007N_0-VH -rw-rw-r-- 1 skoumal users 164936 May 27 17:40 06-14T020N_1-VH -rw-rw-r-- 1 skoumal users 170219 May 27 17:40 06-14X016N_1-VH -rw-rw-r-- 1 skoumal users 128950 May 27 17:40 06-15E017N_5-VH -rw-rw-r-- 1 skoumal users 166806 May 27 17:40 06-15O010N_2-VH -rw-rw-r-- 1 skoumal users 137835 May 27 17:40 06-16A001N_3-VH -rw-rw-r-- 1 skoumal users 131549 May 27 17:40 06-16P002N_3-VH -rw-rw-r-- 1 skoumal users 106379 May 27 17:40 06-16P007N_5-VH -rw-rw-r-- 1 skoumal users 141727 May 27 17:40 06-16X003N_1-VH -rw-rw-r-- 1 skoumal users 169034 May 27 17:40 06-16X030N_1-VH * (7028) -- hotovo: -rw-r--r-- 1 skoumal users 91649 Jun 24 18:06 07-12A037N_4-VH -rw-r--r-- 1 skoumal users 161669 Jun 24 18:06 07-12O002N_0-VH -rw-r--r-- 1 skoumal users 144382 Jun 24 18:06 07-13A003N_2-VH -rw-r--r-- 1 skoumal users 148109 Jun 24 18:06 07-13A014N_0-VH -rw-r--r-- 1 skoumal users 106736 Jun 24 18:06 07-13A028N_4-VH -rw-r--r-- 1 skoumal users 142377 Jun 24 18:06 07-13A036N_5-VH -rw-r--r-- 1 skoumal users 149087 Jun 24 18:06 07-13A050N_0-VH -rw-r--r-- 1 skoumal users 166318 Jun 24 18:06 07-13E004N_6-VH -rw-r--r-- 1 skoumal users 154101 Jun 24 18:06 07-13O004N_0-VH -rw-r--r-- 1 skoumal users 144107 Jun 24 18:06 07-14T003N_4-VH -rw-r--r-- 1 skoumal users 121502 Jun 24 18:06 07-14T007N_2-VH -rw-r--r-- 1 skoumal users 149152 Jun 24 18:06 07-14T013N_3-VH -rw-r--r-- 1 skoumal users 152439 Jun 24 18:06 07-14X016N_4-VH -rw-r--r-- 1 skoumal users 131964 Jun 24 18:06 07-15C004N_0-VH -rw-r--r-- 1 skoumal users 
134751 Jun 24 18:06 07-15O009N_0-VH -rw-r--r-- 1 skoumal users 129301 Jun 24 18:06 07-15P002N_0-VH -rw-r--r-- 1 skoumal users 180038 Jun 24 18:06 07-16A005N_0-VH -rw-r--r-- 1 skoumal users 133426 Jun 24 18:06 07-16A009N_3-VH -rw-r--r-- 1 skoumal users 125528 Jun 24 18:06 07-16P007N_2-VH -rw-r--r-- 1 skoumal users 128922 Jun 24 18:06 07-16X003N_3-VH -rw-r--r-- 1 skoumal users 140285 Jun 24 18:06 07-16X031N_0-VH -rw-r--r-- 1 skoumal users 129726 Jun 24 18:06 07-16X033N_3-VH === Anna Nováková - AN === * (7634) -- hotovo: -rw-r--r-- 1 skoumal users 136812 Jun 24 18:06 08-12A009N_0-AN -rw-r--r-- 1 skoumal users 162292 Jun 24 18:06 08-12A031N_0-AN -rw-r--r-- 1 skoumal users 174507 Jun 24 18:06 08-13A018N_0-AN -rw-r--r-- 1 skoumal users 150144 Jun 24 18:06 08-13A036N_0-AN -rw-r--r-- 1 skoumal users 147524 Jun 24 18:06 08-13A090N_4-AN -rw-r--r-- 1 skoumal users 159163 Jun 24 18:06 08-13B019N_1-AN -rw-r--r-- 1 skoumal users 121300 Jun 24 18:06 08-13B028N_0-AN -rw-r--r-- 1 skoumal users 117897 Jun 24 18:06 08-13O009N_0-AN -rw-r--r-- 1 skoumal users 122809 Jun 24 18:06 08-13P004N_0-AN -rw-r--r-- 1 skoumal users 159431 Jun 24 18:06 08-14C006N_0-AN -rw-r--r-- 1 skoumal users 153240 Jun 24 18:06 08-14T003N_1-AN -rw-r--r-- 1 skoumal users 132491 Jun 24 18:06 08-14T014N_4-AN -rw-r--r-- 1 skoumal users 157464 Jun 24 18:06 08-14X016N_5-AN -rw-r--r-- 1 skoumal users 164278 Jun 24 18:06 08-15E010N_5-AN -rw-r--r-- 1 skoumal users 132281 Jun 24 18:06 08-15O010N_1-AN -rw-r--r-- 1 skoumal users 162485 Jun 24 18:06 08-15X020N_2-AN -rw-r--r-- 1 skoumal users 177612 Jun 24 18:06 08-15X041N_3-AN -rw-r--r-- 1 skoumal users 181697 Jun 24 18:06 08-16A005N_3-AN -rw-r--r-- 1 skoumal users 139212 Jun 24 18:06 08-16E005N_4-AN -rw-r--r-- 1 skoumal users 121915 Jun 24 18:06 08-16E007N_4-AN -rw-r--r-- 1 skoumal users 137518 Jun 24 18:06 08-16X003N_4-AN -rw-r--r-- 1 skoumal users 176342 Jun 24 18:06 08-16X026N_1-AN -rw-r--r-- 1 skoumal users 141328 Jun 24 18:06 08-16X031N_2-AN === Michal Havrda - MH === 
* (7117) -- hotovo: -rw-r--r-- 1 skoumal users 160033 Jun 24 18:06 10-13A005N_4-MH -rw-r--r-- 1 skoumal users 137114 Jun 24 18:06 10-13A011N_3-MH -rw-r--r-- 1 skoumal users 112974 Jun 24 18:06 10-13A018N_2-MH -rw-r--r-- 1 skoumal users 157987 Jun 24 18:06 10-13A074N_5-MH -rw-r--r-- 1 skoumal users 129620 Jun 24 18:06 10-13B016N_1-MH -rw-r--r-- 1 skoumal users 152048 Jun 24 18:06 10-13O003N_0-MH -rw-r--r-- 1 skoumal users 126483 Jun 24 18:06 10-13P009N_0-MH -rw-r--r-- 1 skoumal users 157568 Jun 24 18:06 10-14A011N_1-MH -rw-r--r-- 1 skoumal users 154505 Jun 24 18:06 10-14C009N_2-MH -rw-r--r-- 1 skoumal users 181761 Jun 24 18:06 10-14O007N_0-MH -rw-r--r-- 1 skoumal users 135186 Jun 24 18:06 10-14P006N_1-MH -rw-r--r-- 1 skoumal users 143875 Jun 24 18:06 10-15O011N_0-MH -rw-r--r-- 1 skoumal users 122224 Jun 24 18:06 10-15O012N_0-MH -rw-r--r-- 1 skoumal users 166988 Jun 24 18:06 10-15P004N_0-MH -rw-r--r-- 1 skoumal users 122777 Jun 24 18:06 10-15T002N_2-MH -rw-r--r-- 1 skoumal users 202198 Jun 24 18:06 10-15T003N_1-MH -rw-r--r-- 1 skoumal users 190825 Jun 24 18:06 10-15T011N_4-MH -rw-r--r-- 1 skoumal users 181748 Jun 24 18:06 10-15X020N_0-MH -rw-r--r-- 1 skoumal users 161548 Jun 24 18:06 10-16A002N_0-MH -rw-r--r-- 1 skoumal users 114890 Jun 24 18:06 10-16E007N_5-MH -rw-r--r-- 1 skoumal users 128041 Jun 24 18:06 10-16P004N_3-MH -rw-r--r-- 1 skoumal users 131846 Jun 24 18:06 10-16X003N_0-MH === Pavel Kopřiva === * (7328) -- hotovo: -rw-r--r-- 1 skoumal users 123564 Jun 24 18:06 09-12A004N_1-PK -rw-r--r-- 1 skoumal users 140637 Jun 24 18:06 09-12A034N_3-PK -rw-r--r-- 1 skoumal users 137184 Jun 24 18:06 09-12H004N_1-PK -rw-r--r-- 1 skoumal users 118561 Jun 24 18:06 09-13A003N_0-PK -rw-r--r-- 1 skoumal users 151930 Jun 24 18:06 09-13A074N_4-PK -rw-r--r-- 1 skoumal users 137469 Jun 24 18:06 09-13A090N_2-PK -rw-r--r-- 1 skoumal users 115312 Jun 24 18:06 09-13B011N_0-PK -rw-r--r-- 1 skoumal users 161409 Jun 24 18:06 09-13B027N_0-PK -rw-r--r-- 1 skoumal users 156513 Jun 24 18:06 
09-13O007N_3-PK -rw-r--r-- 1 skoumal users 143758 Jun 24 18:06 09-13P008N_1-PK -rw-r--r-- 1 skoumal users 147253 Jun 24 18:06 09-13T029N_3-PK -rw-r--r-- 1 skoumal users 218778 Jun 24 18:06 09-13X003N_0-PK -rw-r--r-- 1 skoumal users 121754 Jun 24 18:06 09-14A016N_0-PK -rw-r--r-- 1 skoumal users 132630 Jun 24 18:06 09-14C006N_3-PK -rw-r--r-- 1 skoumal users 98129 Jun 24 18:06 09-14T024N_4-PK -rw-r--r-- 1 skoumal users 171839 Jun 24 18:06 09-14X016N_2-PK -rw-r--r-- 1 skoumal users 143302 Jun 24 18:06 09-15O004N_2-PK -rw-r--r-- 1 skoumal users 170311 Jun 24 18:06 09-15X041N_0-PK -rw-r--r-- 1 skoumal users 148747 Jun 24 18:06 09-15X044N_1-PK -rw-r--r-- 1 skoumal users 169442 Jun 24 18:06 09-16A002N_1-PK -rw-r--r-- 1 skoumal users 104915 Jun 24 18:06 09-16E007N_1-PK -rw-r--r-- 1 skoumal users 145808 Jun 24 18:06 09-16X030N_0-PK

==== Checking and accepting the texts ====

  * Same procedure as for [[wiki:user:skoumal:anotace|Annotation]].
  * We work on **jakobson**.
  * First check the texts:
    cd /net/grimm/store/corp/ortofon-etalon/csts-import
    for ff in 04-15X030N_3-PK 04-15X043N_2-PK 04-16A009N_2-PK; \
      do echo $ff; \
      /usr/local/annotate/bin/csts-export.pl --verbose $ff > /dev/null; done
  * If everything is in order, save the files.

==== Converting back to the vertical format, corrections ====

  * Create the directory ''vert-export'' and convert the files into it:
    cd csts-export
    for ff in *; do echo $ff; oral-csts-vert.pl < $ff > ../vert-export/$ff; done
  * Entities are also repaired here, so the first column should match the original:
    cd ../chunks
    for ff in *; do echo $ff; sdiff -s <(cut -f1 $ff) <(cut -f1 ../vert-export/${ff%.vrt}-??); done
  * Fix ''invalid'' and ''X@''.

==== Horký's problematic corrections -- questions for MK and DL ====

  * Tokenization:
    * dvěstě ---> dvě stě
    * v o ---> vo
    * od tamaď ---> odtamaď
    * třinácet ---> třináct set
    * takovýty ---> takový ty
    * tyjo ---> ty jo
    * napohodu ---> na pohodu
    * ježíš maria ---> ježíšmarja
    * v spára ---> V - spára (shouldn't the transcription be vé?)
    * ježíši maria ---> ježíšimarja
    * devatenácet ---> devatenáct set
    * osmnáctset ---> osmnáct set

===== Preparing data for automatic annotation (hybrid vs. MorphoDiTa) =====

  * The data are on grimm in the directory ''/store/corp/Ortofon''.
  * We work with the vertical from the files in ''/store/corp/Ortofon/ortofon-etalon/Verze/2/1''.
  * Preparing the shared data from ''ortofon-etalon/davka-?'':
    cd ortofon-hybrid/davka-1
    mkdir vert
    cd ../../ortofon-etalon/davka-1/Verze/2/1
    for ff in *; do echo $ff; cut -f1 $ff | perl -pe 's/^<.*>$//' \
      | cat -s > ../../../../../ortofon-hybrid/davka-1/vert/$ff; done
    cd ../../../../../ortofon-hybrid/davka-1
    make-corp.sh -s vert -t csts -v -p45
    make-corp.sh -A1 -B1 -Eucs2 -M -p45 -s csts -t csts-morf -v

==== ortofon-hybrid ====

  * The data are run through our whole hybrid and, at the end, adjusted to the needs of Ortofon.
  * Preparing the data:
    cd .../ortofon-hybrid/davka-?
    rsync -avz ../../ortofon-manual/davka-?/csts-morf .
  * Honza's script ''processing_hybrid.pl'' (on the vertical):
    make-corp.sh -s csts-morf -t vert-morf -p45 -v
    cd /usr/local/corp/Perl/Ortofon
    ./processing_hybrid.pl /store/corp/Ortofon/ortofon-hybrid/davka-2/vert-morf
    cd -
    cp -pr vert-morf vert-morf.ori
    cd vert-morf
    for ff in *.out; do mv $ff ${ff%.out}; done
    cd -
    mv csts-morf csts-morf.ori
    make-corp.sh -s vert-morf -t csts-morf -p45 -v
  * Rules through to the end:
    make-whole-corp-csts.sh -C1 -Eucs2 -f -M -p45 -trules -v
  * And a few more minor corrections (//já// --> //my// etc.) using Tomáš's script ''EtalonizaceVertikaly.pl'':
    make-corp.sh -s csts-rules-frazrl-rulh1-tag-vid-corr -t vert-rules-frazrl-rulh1-tag-vid-corr -p45 -v
    parallel-filter.sh -C "/usr/local/corp/Perl/EtalonizaceVertikaly.pl" \
      -s vert-rules-frazrl-rulh1-tag-vid-corr -t vert-hybrid -p45 -v
  * Honza's script ''postprocessing16.pl'' (on the vertical):
    cd /usr/local/corp/Perl/Ortofon
    ./postprocessing16.pl /store/corp/Ortofon/ortofon-hybrid/davka-2/vert-hybrid
    cd /store/corp/Ortofon/ortofon-hybrid/davka-2/vert-hybrid
    mkdir ../vert-hybrid-out
    for ff in *.post; do mv $ff ../vert-hybrid-out/${ff%.post}; done
    cd ../vert-hybrid-out
    for ff in *; do echo $ff; sed '1{/^$/d}' $ff > ../../../ortofon-automat/davka-2/vert-hybrid/$ff; done

==== ortofon-morphodita ====

  * For MorphoDiTa we prepare a morphology, which must however be consistent with the Etalon and with David's scripts.
  * Preparing the data:
    cd .../ortofon-morphodita
    mkdir davka-?
    cd davka-?
    rsync -avz ../../ortofon-manual/davka-?/csts-morf .
  * Aspect marking:
    make-corp.sh -s csts-morf -t csts-morf-vid -v -p45
  * Aspect corrections, expansion of variables, tag simplification and duplicate removal:
    parallel-filter.sh \
      -C "corr-asp.pl | JH-wide-csts.sh | simplify-tags-csts-utf.pl | remove-dupl-csts-mark.pl X" \
      -p45 -s csts-morf-vid -t csts-morf-vid-corr -v
  * Tomáš's script:
    make-corp.sh -s csts-morf-vid-corr -t vert-morf-vid-corr -p45 -v
    parallel-filter.sh -C EtalonizaceVertikaly.pl -s vert-morf-vid-corr -t vert-morf-vid-corr-etln -p45 -v
  * Honza's script ''processing_mdita.pl'' (on the vertical). It produces ''.out'' files:
    cd /usr/local/corp/Perl/Ortofon
    ./processing_mdita.pl /store/corp/Ortofon/ortofon-morphodita/davka-?/vert-morf-vid-corr-etln
  * Convert Honza's output into input for MorphoDiTa:
    cd /store/corp/Ortofon/ortofon-morphodita/davka-?
    mkdir vert-morphodita-in
    cd vert-morf-vid-corr-etln
    for ff in *.out; do echo $ff; sed '1{/^$/d}' $ff > ../vert-morphodita-in/${ff%.out}; done
    rm *.out
  * Run MorphoDiTa and save the result to ''/store/corp/Ortofon/ortofon-morphodita/vert-morphodita-out''.
  * Honza's script ''postprocessing16.pl'' (on the vertical):
    parallel-filter.sh -C "cut -f1-3 | perl -pe 's/(\t.*)\t/\$1 /'" -s vert-morphodita-out \
      -t vert-morphodita-result -p45 -v
    cd /usr/local/corp/Perl/Ortofon
    ./postprocessing16.pl /store/corp/Ortofon/ortofon-morphodita/davka-2/vert-morphodita-result
    cd -
  * Move the results to the directory where MorphoDiTa is merged with the hybrid for manual annotation:
    cd ../../ortofon-automat
    mkdir -p davka-2/vert-morphodita
    cd ../ortofon-morphodita/davka-2/vert-morphodita-result
    for ff in *.post; do mv $ff ../../../ortofon-automat/davka-2/vert-morphodita/${ff%.post}; done

==== Merging the results and preparing the import (ortofon-automat) ====

  * Everything is in the directories ''ortofon-automat/davka-?''.
  * The directory ''vert-hybrid'' holds the hybrid results (see above).
  * The directory ''vert-morphodita'' holds the MorphoDiTa results (see above).
  * Merge MorphoDiTa and the hybrid into the directory ''vert-paste'':
    cd .../ortofon-automat/davka-?
    mkdir vert-paste
    cd vert-morphodita
    for ff in *; do paste $ff <(cut -f2- ../vert-hybrid/$ff) | perl -pe 's/^[\t\ ]+$//' > ../vert-paste/$ff; done
  * Convert to ''csts'':
    make-corp.sh -s vert-paste -t csts-paste -p45 -v
  * Remove duplicates:
    parallel-filter.sh -C remove-dupl-csts.pl -p45 -s csts-paste -t csts-import -v

==== Annotators and assigned files (davka-1) ====

=== Pavel Kopřiva ===

  * prepaid 9.800, i.e. 14.000 words; 1.
dávka 14.197 slov, **hotovo** -rw-rw-r-- 1 skoumal users 34407 May 23 14:41 01-12A005N_1-PK -rw-rw-r-- 1 skoumal users 46102 May 23 14:41 01-13A014N_1-PK -rw-rw-r-- 1 skoumal users 40307 May 23 14:41 01-13A028N_1-PK -rw-rw-r-- 1 skoumal users 41092 May 23 14:41 01-13B031N_1-PK -rw-rw-r-- 1 skoumal users 41144 May 23 14:41 01-13H005N_1-PK -rw-rw-r-- 1 skoumal users 44019 May 23 14:41 01-13O009N_3-PK -rw-rw-r-- 1 skoumal users 46809 May 23 14:41 01-13P004N_1-PK -rw-rw-r-- 1 skoumal users 43321 May 23 14:41 01-13P009N_2-PK -rw-rw-r-- 1 skoumal users 39320 May 23 14:41 01-14T003N_6-PK -rw-rw-r-- 1 skoumal users 44702 May 23 14:41 01-14T007N_1-PK -rw-rw-r-- 1 skoumal users 41621 May 23 14:41 01-14T010N_0-PK -rw-rw-r-- 1 skoumal users 44150 May 23 14:41 01-14X016N_6-PK -rw-rw-r-- 1 skoumal users 41916 May 23 14:41 01-14X019N_0-PK -rw-rw-r-- 1 skoumal users 44807 May 23 14:41 01-15O001N_0-PK -rw-rw-r-- 1 skoumal users 41851 May 23 14:41 01-15O004N_0-PK -rw-rw-r-- 1 skoumal users 44557 May 23 14:41 01-15O007N_1-PK -rw-rw-r-- 1 skoumal users 45038 May 23 14:41 01-15P001N_1-PK -rw-rw-r-- 1 skoumal users 50573 May 23 14:41 01-15T005N_0-PK -rw-rw-r-- 1 skoumal users 42191 May 23 14:41 01-15X045N_2-PK -rw-rw-r-- 1 skoumal users 42876 May 23 14:41 01-16A005N_2-PK -rw-rw-r-- 1 skoumal users 41182 May 23 14:41 01-16P002N_2-PK -rw-rw-r-- 1 skoumal users 37522 May 23 14:41 01-16X033N_5-PK -rw-rw-r-- 1 skoumal users 37307 May 23 14:41 02-12A011N_1-PK -rw-rw-r-- 1 skoumal users 43296 May 23 14:41 02-13A011N_5-PK -rw-rw-r-- 1 skoumal users 43113 May 23 14:41 02-13A036N_3-PK -rw-rw-r-- 1 skoumal users 42816 May 23 14:41 02-13B031N_0-PK -rw-rw-r-- 1 skoumal users 39465 May 23 14:41 02-13O009N_1-PK -rw-rw-r-- 1 skoumal users 45071 May 23 14:41 02-13O010N_2-PK -rw-rw-r-- 1 skoumal users 45410 May 23 14:41 02-13O013N_1-PK -rw-rw-r-- 1 skoumal users 41771 May 23 14:41 02-13P004N_3-PK -rw-rw-r-- 1 skoumal users 36768 May 23 14:41 02-13T029N_6-PK -rw-rw-r-- 1 skoumal users 39927 May 23 14:41 
02-14A016N_2-PK -rw-rw-r-- 1 skoumal users 42561 May 23 14:41 02-14E003N_0-PK -rw-rw-r-- 1 skoumal users 44630 May 23 14:41 02-14T003N_5-PK -rw-rw-r-- 1 skoumal users 41947 May 23 14:41 02-14T013N_2-PK -rw-rw-r-- 1 skoumal users 51442 May 23 14:41 02-14T020N_3-PK -rw-rw-r-- 1 skoumal users 49831 May 23 14:41 02-15O002N_0-PK -rw-rw-r-- 1 skoumal users 40571 May 23 14:41 02-15O004N_1-PK -rw-rw-r-- 1 skoumal users 43099 May 23 14:41 02-15O009N_1-PK -rw-rw-r-- 1 skoumal users 41565 May 23 14:41 02-15P001N_3-PK -rw-rw-r-- 1 skoumal users 41993 May 23 14:41 02-15X041N_1-PK -rw-rw-r-- 1 skoumal users 41233 May 23 14:41 02-16A001N_0-PK -rw-rw-r-- 1 skoumal users 42439 May 23 14:41 02-16A005N_5-PK -rw-rw-r-- 1 skoumal users 46223 May 23 14:41 02-16P002N_0-PK -rw-rw-r-- 1 skoumal users 38555 May 23 14:41 02-16X003N_5-PK -rw-rw-r-- 1 skoumal users 42220 May 23 14:41 03-12A035N_3-PK -rw-rw-r-- 1 skoumal users 44055 May 23 14:41 03-13A014N_4-PK -rw-rw-r-- 1 skoumal users 40827 May 23 14:41 03-13O009N_2-PK -rw-rw-r-- 1 skoumal users 40803 May 23 14:41 03-13P010N_1-PK -rw-rw-r-- 1 skoumal users 46769 May 23 14:41 03-14A011N_3-PK -rw-rw-r-- 1 skoumal users 41318 May 23 14:41 03-14A016N_4-PK -rw-rw-r-- 1 skoumal users 44503 May 23 14:41 03-14P007N_2-PK -rw-rw-r-- 1 skoumal users 47542 May 23 14:41 03-14T010N_2-PK -rw-rw-r-- 1 skoumal users 47136 May 23 14:41 03-14T013N_0-PK -rw-rw-r-- 1 skoumal users 47617 May 23 14:41 03-14T020N_0-PK -rw-rw-r-- 1 skoumal users 43448 May 23 14:41 03-14X019N_4-PK -rw-rw-r-- 1 skoumal users 45235 May 23 14:41 03-14X021N_2-PK -rw-rw-r-- 1 skoumal users 43916 May 23 14:41 03-15E003N_0-PK -rw-rw-r-- 1 skoumal users 43217 May 23 14:41 03-15E015N_1-PK -rw-rw-r-- 1 skoumal users 42814 May 23 14:41 03-15O002N_1-PK -rw-rw-r-- 1 skoumal users 47330 May 23 14:41 03-15O007N_0-PK -rw-rw-r-- 1 skoumal users 47496 May 23 14:41 03-15X020N_1-PK -rw-rw-r-- 1 skoumal users 44998 May 23 14:41 03-15X041N_2-PK -rw-rw-r-- 1 skoumal users 47227 May 23 14:41 03-16A005N_4-PK 
-rw-rw-r-- 1 skoumal users 43259 May 23 14:41 03-16P004N_0-PK -rw-rw-r-- 1 skoumal users 47252 May 23 14:41 03-16X001N_1-PK -rw-rw-r-- 1 skoumal users 40593 May 23 14:41 03-16X031N_4-PK -rw-rw-r-- 1 skoumal users 40675 May 23 14:41 04-12A025N_0-PK -rw-rw-r-- 1 skoumal users 46531 May 23 14:41 04-12P004N_4-PK -rw-rw-r-- 1 skoumal users 45177 May 23 14:41 04-13B009N_0-PK -rw-rw-r-- 1 skoumal users 43618 May 23 14:41 04-13B019N_0-PK -rw-rw-r-- 1 skoumal users 44753 May 23 14:41 04-13B025N_0-PK -rw-rw-r-- 1 skoumal users 43003 May 23 14:41 04-13O007N_1-PK -rw-rw-r-- 1 skoumal users 44914 May 23 14:41 04-13O010N_0-PK -rw-rw-r-- 1 skoumal users 43164 May 23 14:41 04-13O013N_0-PK -rw-rw-r-- 1 skoumal users 44184 May 23 14:41 04-13P004N_2-PK -rw-rw-r-- 1 skoumal users 42275 May 23 14:41 04-14P007N_0-PK -rw-rw-r-- 1 skoumal users 49243 May 23 14:41 04-14T003N_0-PK -rw-rw-r-- 1 skoumal users 40458 May 23 14:41 04-14T010N_1-PK -rw-rw-r-- 1 skoumal users 42344 May 23 14:41 04-14T013N_1-PK -rw-rw-r-- 1 skoumal users 40829 May 23 14:41 04-14X016N_3-PK -rw-rw-r-- 1 skoumal users 40664 May 23 14:41 04-15O010N_0-PK -rw-rw-r-- 1 skoumal users 43180 May 23 14:41 04-15P001N_2-PK -rw-rw-r-- 1 skoumal users 40232 May 23 14:41 04-15P006N_0-PK -rw-rw-r-- 1 skoumal users 48556 May 23 14:41 05-14T010N_3-PK -rw-rw-r-- 1 skoumal users 44430 May 23 14:41 05-14T019N_0-PK -rw-rw-r-- 1 skoumal users 39720 May 23 14:41 05-14X012N_2-PK -rw-rw-r-- 1 skoumal users 49071 May 23 14:41 05-14X019N_2-PK -rw-rw-r-- 1 skoumal users 49252 May 23 14:41 05-14X019N_3-PK -rw-rw-r-- 1 skoumal users 44280 May 23 14:41 05-15O001N_1-PK -rw-rw-r-- 1 skoumal users 48997 May 23 14:41 05-15P001N_0-PK -rw-rw-r-- 1 skoumal users 49363 May 23 14:41 05-15X009N_1-PK -rw-rw-r-- 1 skoumal users 48507 May 23 14:41 05-15X020N_3-PK -rw-rw-r-- 1 skoumal users 38867 May 23 14:41 05-15X043N_5-PK -rw-rw-r-- 1 skoumal users 43559 May 23 14:41 05-16A005N_1-PK -rw-rw-r-- 1 skoumal users 44935 May 23 14:41 05-16E007N_0-PK -rw-rw-r-- 1 
skoumal users 41154 May 23 14:41 05-16P002N_1-PK -rw-rw-r-- 1 skoumal users 44495 May 23 14:41 05-16P007N_1-PK -rw-rw-r-- 1 skoumal users 46831 May 23 14:41 05-16X001N_2-PK

=== Michal Havrda ===

  * available the whole time; batch 1: 1.938 words, **done**, not yet reported
    -rw-rw-r-- 1 skoumal users 42289 May 23 14:41 04-15X030N_3-MH
    -rw-rw-r-- 1 skoumal users 46559 May 23 14:41 04-15X043N_2-MH
    -rw-rw-r-- 1 skoumal users 45223 May 23 14:41 04-16A009N_2-MH
    -rw-rw-r-- 1 skoumal users 43922 May 23 14:41 04-16E007N_2-MH
    -rw-rw-r-- 1 skoumal users 48672 May 23 14:41 04-16P004N_1-MH
    -rw-rw-r-- 1 skoumal users 42605 May 23 14:41 04-16X003N_2-MH
    -rw-rw-r-- 1 skoumal users 41810 May 23 14:41 05-12P004N_2-MH
    -rw-rw-r-- 1 skoumal users 42430 May 23 14:41 05-13A011N_0-MH
    -rw-rw-r-- 1 skoumal users 46764 May 23 14:41 05-13A014N_3-MH
    -rw-rw-r-- 1 skoumal users 40185 May 23 14:41 05-13A023N_3-MH
    -rw-rw-r-- 1 skoumal users 41116 May 23 14:41 05-13B005N_1-MH
    -rw-rw-r-- 1 skoumal users 48129 May 23 14:41 05-13D015N_0-MH
    -rw-rw-r-- 1 skoumal users 44079 May 23 14:41 05-13O007N_0-MH
    -rw-rw-r-- 1 skoumal users 45164 May 23 14:41 05-14A011N_2-MH

=== Anna Nováková ===

  * available only from July

=== Šárka Kadavá ===

  * only as a last resort

=== Václav Horký ===

  * as a student research assistant (//pomvěd//)

==== Annotators and assigned files (davka-2) ====

=== Pavel Kopřiva ===

  * 2.
dávka 15.313 slov -rw-rw-r-- 1 skoumal users 42720 May 30 17:32 06-12A011N_0-PK -rw-rw-r-- 1 skoumal users 45138 May 30 17:32 06-12P004N_3-PK -rw-rw-r-- 1 skoumal users 45138 May 30 17:32 06-13A003N_1-PK -rw-rw-r-- 1 skoumal users 44326 May 30 17:32 06-13A014N_2-PK -rw-rw-r-- 1 skoumal users 45282 May 30 17:32 06-13A028N_2-PK -rw-rw-r-- 1 skoumal users 46555 May 30 17:32 06-13A074N_1-PK -rw-rw-r-- 1 skoumal users 43331 May 30 17:32 06-13B005N_0-PK -rw-rw-r-- 1 skoumal users 48089 May 30 17:32 06-13B028N_1-PK -rw-rw-r-- 1 skoumal users 40901 May 30 17:32 06-13O007N_2-PK -rw-rw-r-- 1 skoumal users 48638 May 30 17:32 06-14A006N_0-PK -rw-rw-r-- 1 skoumal users 40347 May 30 17:32 06-14A008N_3-PK -rw-rw-r-- 1 skoumal users 47879 May 30 17:32 06-14E001N_0-PK -rw-rw-r-- 1 skoumal users 44009 May 30 17:32 06-14P007N_3-PK -rw-rw-r-- 1 skoumal users 44757 May 30 17:32 06-14T007N_0-PK -rw-rw-r-- 1 skoumal users 45276 May 30 17:32 06-14T020N_1-PK -rw-rw-r-- 1 skoumal users 46545 May 30 17:32 06-14X016N_1-PK -rw-rw-r-- 1 skoumal users 42302 May 30 17:32 06-15E017N_5-PK -rw-rw-r-- 1 skoumal users 51075 May 30 17:32 06-15O010N_2-PK -rw-rw-r-- 1 skoumal users 41040 May 30 17:32 06-16A001N_3-PK -rw-rw-r-- 1 skoumal users 43143 May 30 17:32 06-16P002N_3-PK -rw-rw-r-- 1 skoumal users 35558 May 30 17:32 06-16P007N_5-PK -rw-rw-r-- 1 skoumal users 46698 May 30 17:32 06-16X003N_1-PK -rw-rw-r-- 1 skoumal users 45687 May 30 17:32 06-16X030N_1-PK -rw-rw-r-- 1 skoumal users 39012 May 30 17:32 07-12A037N_4-PK -rw-rw-r-- 1 skoumal users 44139 May 30 17:32 07-12O002N_0-PK -rw-rw-r-- 1 skoumal users 46794 May 30 17:32 07-13A003N_2-PK -rw-rw-r-- 1 skoumal users 46770 May 30 17:32 07-13A014N_0-PK -rw-rw-r-- 1 skoumal users 39791 May 30 17:32 07-13A028N_4-PK -rw-rw-r-- 1 skoumal users 43325 May 30 17:32 07-13A036N_5-PK -rw-rw-r-- 1 skoumal users 48700 May 30 17:32 07-13A050N_0-PK -rw-rw-r-- 1 skoumal users 42543 May 30 17:32 07-13E004N_6-PK -rw-rw-r-- 1 skoumal users 43506 May 30 17:32 
07-13O004N_0-PK -rw-rw-r-- 1 skoumal users 44126 May 30 17:32 07-14T003N_4-PK -rw-rw-r-- 1 skoumal users 41497 May 30 17:32 07-14T007N_2-PK -rw-rw-r-- 1 skoumal users 45303 May 30 17:32 07-14T013N_3-PK -rw-rw-r-- 1 skoumal users 46002 May 30 17:32 07-14X016N_4-PK -rw-rw-r-- 1 skoumal users 42059 May 30 17:32 07-15C004N_0-PK -rw-rw-r-- 1 skoumal users 46510 May 30 17:32 07-15O009N_0-PK -rw-rw-r-- 1 skoumal users 41922 May 30 17:32 07-15P002N_0-PK -rw-rw-r-- 1 skoumal users 47748 May 30 17:32 07-16A005N_0-PK -rw-rw-r-- 1 skoumal users 40438 May 30 17:32 07-16A009N_3-PK -rw-rw-r-- 1 skoumal users 40615 May 30 17:32 07-16P007N_2-PK -rw-rw-r-- 1 skoumal users 45292 May 30 17:32 07-16X003N_3-PK -rw-rw-r-- 1 skoumal users 41981 May 30 17:32 07-16X031N_0-PK -rw-rw-r-- 1 skoumal users 44339 May 30 17:32 07-16X033N_3-PK -rw-rw-r-- 1 skoumal users 40608 May 30 17:32 08-12A009N_0-PK -rw-rw-r-- 1 skoumal users 45404 May 30 17:32 08-12A031N_0-PK -rw-rw-r-- 1 skoumal users 50613 May 30 17:32 08-13A018N_0-PK -rw-rw-r-- 1 skoumal users 43737 May 30 17:32 08-13A036N_0-PK -rw-rw-r-- 1 skoumal users 41190 May 30 17:32 08-13A090N_4-PK -rw-rw-r-- 1 skoumal users 43500 May 30 17:32 08-13B019N_1-PK -rw-rw-r-- 1 skoumal users 42045 May 30 17:32 08-13B028N_0-PK -rw-rw-r-- 1 skoumal users 39830 May 30 17:32 08-13O009N_0-PK -rw-rw-r-- 1 skoumal users 45677 May 30 17:32 08-13P004N_0-PK -rw-rw-r-- 1 skoumal users 43908 May 30 17:32 08-14C006N_0-PK -rw-rw-r-- 1 skoumal users 46106 May 30 17:32 08-14T003N_1-PK -rw-rw-r-- 1 skoumal users 42032 May 30 17:32 08-14T014N_4-PK -rw-rw-r-- 1 skoumal users 48395 May 30 17:32 08-14X016N_5-PK -rw-rw-r-- 1 skoumal users 41217 May 30 17:32 08-15E010N_5-PK -rw-rw-r-- 1 skoumal users 42667 May 30 17:32 08-15O010N_1-PK -rw-rw-r-- 1 skoumal users 46423 May 30 17:32 08-15X020N_2-PK -rw-rw-r-- 1 skoumal users 49698 May 30 17:32 08-15X041N_3-PK -rw-rw-r-- 1 skoumal users 50216 May 30 17:32 08-16A005N_3-PK -rw-rw-r-- 1 skoumal users 47850 May 30 17:32 08-16E005N_4-PK 
-rw-rw-r-- 1 skoumal users 43570 May 30 17:32 08-16E007N_4-PK -rw-rw-r-- 1 skoumal users 45400 May 30 17:32 08-16X003N_4-PK -rw-rw-r-- 1 skoumal users 44771 May 30 17:32 08-16X026N_1-PK -rw-rw-r-- 1 skoumal users 44471 May 30 17:32 08-16X031N_2-PK -rw-rw-r-- 1 skoumal users 40769 May 30 17:32 09-12A004N_1-PK -rw-rw-r-- 1 skoumal users 44435 May 30 17:32 09-12A034N_3-PK -rw-rw-r-- 1 skoumal users 45093 May 30 17:32 09-12H004N_1-PK -rw-rw-r-- 1 skoumal users 40828 May 30 17:32 09-13A003N_0-PK -rw-rw-r-- 1 skoumal users 48448 May 30 17:32 09-13A074N_4-PK -rw-rw-r-- 1 skoumal users 42901 May 30 17:32 09-13A090N_2-PK -rw-rw-r-- 1 skoumal users 43918 May 30 17:32 09-13B011N_0-PK -rw-rw-r-- 1 skoumal users 46136 May 30 17:32 09-13B027N_0-PK -rw-rw-r-- 1 skoumal users 43410 May 30 17:32 09-13O007N_3-PK -rw-rw-r-- 1 skoumal users 44953 May 30 17:32 09-13P008N_1-PK -rw-rw-r-- 1 skoumal users 44550 May 30 17:32 09-13T029N_3-PK -rw-rw-r-- 1 skoumal users 50559 May 30 17:32 09-13X003N_0-PK -rw-rw-r-- 1 skoumal users 42449 May 30 17:32 09-14A016N_0-PK -rw-rw-r-- 1 skoumal users 39140 May 30 17:32 09-14C006N_3-PK -rw-rw-r-- 1 skoumal users 36023 May 30 17:32 09-14T024N_4-PK -rw-rw-r-- 1 skoumal users 47500 May 30 17:32 09-14X016N_2-PK -rw-rw-r-- 1 skoumal users 46232 May 30 17:32 09-15O004N_2-PK -rw-rw-r-- 1 skoumal users 49245 May 30 17:32 09-15X041N_0-PK -rw-rw-r-- 1 skoumal users 41205 May 30 17:32 09-15X044N_1-PK -rw-rw-r-- 1 skoumal users 45831 May 30 17:32 09-16A002N_1-PK -rw-rw-r-- 1 skoumal users 42869 May 30 17:32 09-16E007N_1-PK -rw-rw-r-- 1 skoumal users 43627 May 30 17:32 09-16X030N_0-PK -rw-rw-r-- 1 skoumal users 44994 May 30 17:32 10-13A005N_4-PK -rw-rw-r-- 1 skoumal users 41142 May 30 17:32 10-13A011N_3-PK -rw-rw-r-- 1 skoumal users 41662 May 30 17:32 10-13A018N_2-PK -rw-rw-r-- 1 skoumal users 47470 May 30 17:32 10-13A074N_5-PK -rw-rw-r-- 1 skoumal users 39450 May 30 17:32 10-13B016N_1-PK -rw-rw-r-- 1 skoumal users 44065 May 30 17:32 10-13O003N_0-PK -rw-rw-r-- 1 
skoumal users 43813 May 30 17:32 10-13P009N_0-PK
  -rw-rw-r-- 1 skoumal users 44046 May 30 17:32 10-14A011N_1-PK
  -rw-rw-r-- 1 skoumal users 46636 May 30 17:32 10-14C009N_2-PK
  -rw-rw-r-- 1 skoumal users 48973 May 30 17:32 10-14O007N_0-PK
  -rw-rw-r-- 1 skoumal users 40730 May 30 17:32 10-14P006N_1-PK
  -rw-rw-r-- 1 skoumal users 49089 May 30 17:32 10-15O011N_0-PK
  -rw-rw-r-- 1 skoumal users 43590 May 30 17:32 10-15O012N_0-PK
  -rw-rw-r-- 1 skoumal users 41094 May 30 17:32 10-15P004N_0-PK
  -rw-rw-r-- 1 skoumal users 39004 May 30 17:32 10-15T002N_2-PK
  -rw-rw-r-- 1 skoumal users 45379 May 30 17:32 10-15T003N_1-PK
  -rw-rw-r-- 1 skoumal users 46787 May 30 17:32 10-15T011N_4-PK
  -rw-rw-r-- 1 skoumal users 49274 May 30 17:32 10-15X020N_0-PK
  -rw-rw-r-- 1 skoumal users 43482 May 30 17:32 10-16A002N_0-PK
  -rw-rw-r-- 1 skoumal users 43649 May 30 17:32 10-16E007N_5-PK
  -rw-rw-r-- 1 skoumal users 46847 May 30 17:32 10-16P004N_3-PK
  -rw-rw-r-- 1 skoumal users 44270 May 30 17:32 10-16X003N_0-PK
===== Merging manual and automatic annotation =====
==== Data preparation ====
  * Under ''[/net/grimm]/store/corp/Ortofon'', create the subdirectory ''ortofon-merge'' and, inside it, ''davka-?/csts-import'' and ''davka-?/csts-merge''.
  * In each ''csts-merge'' directory, prepare the files for merging.
  * Copy the files from ''.../ortofon-automat/davka-?/csts-export'' and convert the suffix to lower case. Select unique tags while copying:
  parallel-filter.sh -C /net/grimm/usr/local/corp/bin/unique-tag.pl -p6 \
    -s ../../ortofon-automat/davka-?/csts-export -t csts-merge -v
  cd csts-merge
  for ff in *-PK; do gg=${ff%-PK}-pk; echo "$ff $gg"; mv $ff $gg; done
Repeat this for every suffix.
  * Copy the corresponding manually processed files from ''../../ortofon-manual/davka-?/csts-export'':
  parallel-filter.sh -C /net/grimm/usr/local/corp/bin/unique-tag.pl -p6 \
    -s ../../ortofon-manual/davka-?/csts-export -t csts-merge -v
  * Files with upper-case suffixes have '''' and a more elaborate '''' in their mark-up; both kinds of mark-up must contain the same tags:
  for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 's/

//' $ff; done
  for ff in *.bak; do echo ${ff%.bak}; sdiff -s ${ff%.bak} $ff; done
  for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 's::\n:' $ff; done
  for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 's::\n:' $ff; done
  for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 'undef $/; s:()\n:$1:' $ff; done
  for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 'undef $/; s:\n():$1:' $ff; done
  for ff in *-[A-Z][A-Z]; do echo $ff; perl -i.bak -pe 'undef $/; s:

\n\n::' $ff; done
  for ff in *-[A-Z][A-Z]; do echo $ff; perl -i.bak -pe 'undef $/; s:

\n::' $ff; done
  * Check that the files are aligned:
  for ff in *-[A-Z][A-Z]; do echo $ff; paste $ff ${ff%-??}-[a-z][a-z] | grep ""; done |\
    grep -vP "\t" | l
and fix any misalignment according to the original data in ''.../Ortofon/ortofon-data/0?''.
  * The automatic files have no aspect (//vid//). After alignment, create the directory ''csts-tag'' and copy the ''*-[A-Z][A-Z]'' files into it:
  mkdir ../csts-tag
  cp -p *-[A-Z][A-Z] ../csts-tag
Then run aspect assignment and aspect corrections on them:
  [frozen] make-asp.sh -Eucs2 -fcsts -p6 -s csts-tag -t csts-tag-vid -v
  cd /usr/local/corp/frozen-states/201910/corp/DisambiguacniSkripty/PostDisambVid-utf-csts/povinne
  parallel-filter.sh -C "./11_OpravitVid-1 | ./11_OpravitVid-2 | ./20_asp_stat.pl" \
    -s /net/grimm/store/corp/Ortofon/ortofon-merge/davka-?/csts-tag-vid \
    -t /net/grimm/store/corp/Ortofon/ortofon-merge/davka-?/csts-tag-vid-corr -v
  cd -
  for ff in *; do echo $ff; perl -i -pe 's/invalid-/invalid/' $ff; done
  * Copy the files from ''csts-tag-vid-corr'' back to ''csts-merge''.
  * Unify the mark-up:
  for ff in *; do echo $ff; perl -i.bak -pe 's/]+>//' $ff; done
  for ff in *; do echo $ff; perl -i.bak -pe 's/(]+>/$1>/g' $ff; done
  * Check and fix biaspectual verbs:
  grep "B$" * | cut -f3 -d'<' | sort -u > ../vidy.txt
For later batches, compare with the earlier ones:
  for ff in $(cat vidy.txt); do echo $ff; grep -h "$ff<" ../davka-1/csts-import/* | sort -u; done | l
and fix them:
  for ff in $(grep -l "bydlet...............B" *); do echo $ff; \
    perl -i.bak -pe 's/(bydlet...............)B/$1I/' $ff; done
  * Prepare the data for import:
  for ff in *-[A-Z][A-Z]; do gg=${ff%-??}-[a-z][a-z]; suff=$(echo $gg| cut -f3 -d'-'); \
    echo $ff-$suff; paste $ff ${ff%-??}-[a-z][a-z] | perl -pe 's/[^<]+X@--------------([^<]+[FM])/$1/' > ../csts-import/$ff-$suff; done
and adjust the lines with the tags ''F'', ''H'' and ''M'':
  for ff in *-??-??; do echo $ff; perl -i.bak -pe 's/@@Z:--------------/@/' $ff; done
  for ff in *-??-??; do echo $ff; perl -i.bak -pe 's/@@Z:--------------//' $ff; done
  for ff in *-??-??; do
  echo $ff; perl -i.bak -pe 's/emmX@--------------//' $ff; done
  for ff in *-??-??; do echo $ff; perl -i.bak -pe 's/hmmII--------------//' $ff; done
  * Run the import (on jakobson):
  cd ../csts-import
  for ff in *; do echo $ff; /usr/local/annotate/bin/csts-import-utkl.pl --force $ff; done
  * Edit ''/usr/local/annotate/users''.
  * Aspect assignment:
    * **dát** P -- **dát se** I
    * (**dokázat** P -- **dokázat** (in the sense of //umět//, "be able to") B)
    * (**dovést** P -- **dovést** (in the sense of //umět//, "be able to") B)
    * **hodit** P -- **hodit se** I
    * **jmenovat** B -- **jmenovat se** I
    * (**napovídat** P -- **napovídat** I)
    * (**orientovat** B -- **orientovat se** I)
    * **stát** I -- **stát se** P
    * **věnovat** P -- **věnovat se** I
==== Annotators and assigned files (davka-1) ====
=== Jan Henyš ===
  * (6696):
  -rw-r--r-- 1 skoumal users 74626 Nov 4 14:57 01-12A005N_1-VH-pk
  -rw-r--r-- 1 skoumal users 81870 Nov 4 14:57 01-13A014N_1-VH-pk
  -rw-r--r-- 1 skoumal users 93895 Nov 4 14:57 01-13A028N_1-VH-pk
  -rw-r--r-- 1 skoumal users 127391 Nov 4 14:57 01-13B031N_1-VH-pk
  -rw-r--r-- 1 skoumal users 108674 Nov 4 14:57 01-13H005N_1-VH-pk
  -rw-r--r-- 1 skoumal users 139978 Nov 4 14:57 01-13O009N_3-VH-pk
  -rw-r--r-- 1 skoumal users 81235 Nov 4 14:57 01-13P004N_1-VH-pk
  -rw-r--r-- 1 skoumal users 106283 Nov 4 14:57 01-13P009N_2-VH-pk
  -rw-r--r-- 1 skoumal users 104777 Nov 4 14:57 01-14T003N_6-VH-pk
  -rw-r--r-- 1 skoumal users 114342 Nov 4 14:57 01-14T007N_1-VH-pk
  -rw-r--r-- 1 skoumal users 121376 Nov 4 14:57 01-14T010N_0-VH-pk
  -rw-r--r-- 1 skoumal users 122450 Nov 4 14:57 01-14X016N_6-VH-pk
  -rw-r--r-- 1 skoumal users 126982 Nov 4 14:57 01-14X019N_0-VH-pk
  -rw-r--r-- 1 skoumal users 130419 Nov 4 14:57 01-15O001N_0-VH-pk
  -rw-r--r-- 1 skoumal users 105776 Nov 4 14:57 01-15O004N_0-TM-pk
  -rw-r--r-- 1 skoumal users 111398 Nov 4 14:57 01-15O007N_1-TM-pk
  -rw-r--r-- 1 skoumal users 115392 Nov 4 14:57 01-15P001N_1-TM-pk
  -rw-r--r-- 1 skoumal users 154103 Nov 4 14:57 01-15T005N_0-TM-pk
  -rw-r--r-- 1 skoumal users 107637 Nov 4 14:57 01-15X045N_2-TM-pk
  -rw-r--r-- 1
skoumal users 125936 Nov 4 14:57 01-16A005N_2-TM-pk -rw-r--r-- 1 skoumal users 98450 Nov 4 14:57 01-16P002N_2-TM-pk -rw-r--r-- 1 skoumal users 104393 Nov 4 14:57 01-16X033N_5-TM-pk -rw-r--r-- 1 skoumal users 101802 Nov 4 14:57 02-12A011N_1-TM-pk -rw-r--r-- 1 skoumal users 122296 Nov 4 14:57 02-13A011N_5-TM-pk -rw-r--r-- 1 skoumal users 102029 Nov 4 14:57 02-13A036N_3-TM-pk -rw-r--r-- 1 skoumal users 149647 Nov 4 14:57 02-13B031N_0-TM-pk -rw-r--r-- 1 skoumal users 88170 Nov 4 14:57 02-13O009N_1-TM-pk -rw-r--r-- 1 skoumal users 97377 Nov 4 14:57 02-13O010N_2-TM-pk -rw-r--r-- 1 skoumal users 126451 Nov 4 14:57 02-13O013N_1-MZ-pk -rw-r--r-- 1 skoumal users 79652 Nov 4 14:57 02-13P004N_3-MZ-pk -rw-r--r-- 1 skoumal users 98536 Nov 4 14:57 02-13T029N_6-MZ-pk -rw-r--r-- 1 skoumal users 88034 Nov 4 14:57 02-14A016N_2-MZ-pk -rw-r--r-- 1 skoumal users 118994 Nov 4 14:57 02-14E003N_0-MZ-pk -rw-r--r-- 1 skoumal users 123841 Nov 4 14:57 02-14T003N_5-MZ-pk -rw-r--r-- 1 skoumal users 115630 Nov 4 14:57 02-14T013N_2-MZ-pk -rw-r--r-- 1 skoumal users 134818 Nov 4 14:57 02-14T020N_3-MZ-pk -rw-r--r-- 1 skoumal users 141686 Nov 4 14:57 02-15O002N_0-MZ-pk -rw-r--r-- 1 skoumal users 95958 Nov 4 14:57 02-15O004N_1-MZ-pk -rw-r--r-- 1 skoumal users 84948 Nov 4 14:57 02-15O009N_1-MZ-pk -rw-r--r-- 1 skoumal users 101862 Nov 4 14:57 02-15P001N_3-MZ-pk -rw-r--r-- 1 skoumal users 100946 Nov 4 14:57 02-15X041N_1-MZ-pk -rw-r--r-- 1 skoumal users 115018 Nov 4 14:57 02-16A001N_0-MZ-pk -rw-r--r-- 1 skoumal users 132001 Nov 4 14:57 02-16A005N_5-SK-pk -rw-r--r-- 1 skoumal users 99541 Nov 4 14:57 02-16P002N_0-SK-pk -rw-r--r-- 1 skoumal users 79580 Nov 4 14:57 02-16X003N_5-SK-pk -rw-r--r-- 1 skoumal users 109874 Nov 4 14:57 03-12A035N_3-SK-pk -rw-r--r-- 1 skoumal users 92330 Nov 4 14:57 03-13A014N_4-SK-pk -rw-r--r-- 1 skoumal users 101868 Nov 4 14:57 03-13O009N_2-SK-pk -rw-r--r-- 1 skoumal users 102143 Nov 4 14:57 03-13P010N_1-SK-pk -rw-r--r-- 1 skoumal users 119923 Nov 4 14:57 03-14A011N_3-SK-pk 
-rw-r--r-- 1 skoumal users 89549 Nov 4 14:57 03-14A016N_4-SK-pk -rw-r--r-- 1 skoumal users 102824 Nov 4 14:57 03-14P007N_2-SK-pk -rw-r--r-- 1 skoumal users 152089 Nov 4 14:57 03-14T010N_2-SK-pk -rw-r--r-- 1 skoumal users 127088 Nov 4 14:57 03-14T013N_0-SK-pk -rw-r--r-- 1 skoumal users 131223 Nov 4 14:57 03-14T020N_0-SK-pk -rw-r--r-- 1 skoumal users 125088 Nov 4 14:57 03-14X019N_4-SK-pk -rw-r--r-- 1 skoumal users 106008 Nov 4 14:57 03-14X021N_2-LK-pk === Václav Horký === * (7020) **hotovo**: -rw-r--r-- 1 skoumal users 156567 Nov 4 14:57 03-15E003N_0-LK-pk -rw-r--r-- 1 skoumal users 112165 Nov 4 14:57 03-15E015N_1-LK-pk -rw-r--r-- 1 skoumal users 100938 Nov 4 14:57 03-15O002N_1-LK-pk -rw-r--r-- 1 skoumal users 122762 Nov 4 14:57 03-15O007N_0-LK-pk -rw-r--r-- 1 skoumal users 143197 Nov 4 14:57 03-15X020N_1-LK-pk -rw-r--r-- 1 skoumal users 115593 Nov 4 14:57 03-15X041N_2-LK-pk -rw-r--r-- 1 skoumal users 135237 Nov 4 14:57 03-16A005N_4-LK-pk -rw-r--r-- 1 skoumal users 97000 Nov 4 14:57 03-16P004N_0-LK-pk -rw-r--r-- 1 skoumal users 145176 Nov 4 14:57 03-16X001N_1-LK-pk -rw-r--r-- 1 skoumal users 118990 Nov 4 14:57 03-16X031N_4-LK-pk -rw-r--r-- 1 skoumal users 83711 Nov 4 14:57 04-12A025N_0-LK-pk -rw-r--r-- 1 skoumal users 140071 Nov 4 14:57 04-12P004N_4-LK-pk -rw-r--r-- 1 skoumal users 152345 Nov 4 14:57 04-13B009N_0-LK-pk -rw-r--r-- 1 skoumal users 126964 Nov 4 14:57 04-13B019N_0-MH-pk -rw-r--r-- 1 skoumal users 116116 Nov 4 14:57 04-13B025N_0-MH-pk -rw-r--r-- 1 skoumal users 102177 Nov 4 14:57 04-13O007N_1-MH-pk -rw-r--r-- 1 skoumal users 101580 Nov 4 14:57 04-13O010N_0-MH-pk -rw-r--r-- 1 skoumal users 139029 Nov 4 14:57 04-13O013N_0-MH-pk -rw-r--r-- 1 skoumal users 83185 Nov 4 14:57 04-13P004N_2-MH-pk -rw-r--r-- 1 skoumal users 101357 Nov 4 14:57 04-14P007N_0-MH-pk -rw-r--r-- 1 skoumal users 136504 Nov 4 14:57 04-14T003N_0-MH-pk -rw-r--r-- 1 skoumal users 125338 Nov 4 14:57 04-14T010N_1-MH-pk -rw-r--r-- 1 skoumal users 105675 Nov 4 14:57 04-14T013N_1-MH-pk -rw-r--r-- 
1 skoumal users 106963 Nov 4 14:57 04-14X016N_3-MH-pk -rw-r--r-- 1 skoumal users 92255 Nov 4 14:57 04-15O010N_0-MH-pk -rw-r--r-- 1 skoumal users 111061 Nov 4 14:57 04-15P001N_2-MH-pk -rw-r--r-- 1 skoumal users 90023 Nov 4 14:57 04-15P006N_0-MH-pk -rw-r--r-- 1 skoumal users 102256 Nov 4 14:57 04-15X030N_3-PK-mh -rw-r--r-- 1 skoumal users 96979 Nov 4 14:57 04-15X043N_2-PK-mh -rw-r--r-- 1 skoumal users 125828 Nov 4 14:57 04-16A009N_2-PK-mh -rw-r--r-- 1 skoumal users 85615 Nov 4 14:57 04-16E007N_2-PK-mh -rw-r--r-- 1 skoumal users 113460 Nov 4 14:57 04-16P004N_1-PK-mh -rw-r--r-- 1 skoumal users 103271 Nov 4 14:57 04-16X003N_2-PK-mh -rw-r--r-- 1 skoumal users 114602 Nov 4 14:57 05-12P004N_2-PK-mh -rw-r--r-- 1 skoumal users 122736 Nov 4 14:57 05-13A011N_0-PK-mh -rw-r--r-- 1 skoumal users 96925 Nov 4 14:57 05-13A014N_3-PK-mh -rw-r--r-- 1 skoumal users 80363 Nov 4 14:57 05-13A023N_3-PK-mh -rw-r--r-- 1 skoumal users 108453 Nov 4 14:57 05-13B005N_1-PK-mh -rw-r--r-- 1 skoumal users 139140 Nov 4 14:57 05-13D015N_0-PK-mh -rw-r--r-- 1 skoumal users 104929 Nov 4 14:57 05-13O007N_0-PK-mh -rw-r--r-- 1 skoumal users 125205 Nov 4 14:57 05-14A011N_2-PK-mh -rw-r--r-- 1 skoumal users 147679 Nov 4 14:57 05-14T010N_3-AN-pk -rw-r--r-- 1 skoumal users 90556 Nov 4 14:57 05-14T019N_0-AN-pk -rw-r--r-- 1 skoumal users 117725 Nov 4 14:57 05-14X012N_2-AN-pk -rw-r--r-- 1 skoumal users 146139 Nov 4 14:57 05-14X019N_2-AN-pk -rw-r--r-- 1 skoumal users 148811 Nov 4 14:57 05-14X019N_3-AN-pk -rw-r--r-- 1 skoumal users 121542 Nov 4 14:57 05-15O001N_1-AN-pk -rw-r--r-- 1 skoumal users 117876 Nov 4 14:57 05-15P001N_0-AN-pk -rw-r--r-- 1 skoumal users 149222 Nov 4 14:57 05-15X009N_1-AN-pk -rw-r--r-- 1 skoumal users 141576 Nov 4 14:57 05-15X020N_3-AN-pk -rw-r--r-- 1 skoumal users 69482 Nov 4 14:57 05-15X043N_5-AN-pk -rw-r--r-- 1 skoumal users 149294 Nov 4 14:57 05-16A005N_1-AN-pk -rw-r--r-- 1 skoumal users 85369 Nov 4 14:57 05-16E007N_0-AN-pk -rw-r--r-- 1 skoumal users 95205 Nov 4 14:57 05-16P002N_1-AN-pk 
  -rw-r--r-- 1 skoumal users 95081 Nov 4 14:57 05-16P007N_1-AN-pk
  -rw-r--r-- 1 skoumal users 133102 Nov 4 14:57 05-16X001N_2-MH-pk
==== Annotators and assigned files (davka-2) ====
=== Jan Henyš ===
  * ():
=== Václav Horký ===
  * (4675) **done**:
  -rw-r--r-- 1 skoumal staff 151492 Dec 5 15:12 08-16A005N_3-AN-pk
  -rw-r--r-- 1 skoumal staff 100204 Dec 5 15:12 08-16E005N_4-AN-pk
  -rw-r--r-- 1 skoumal staff 93187 Dec 5 15:12 08-16E007N_4-AN-pk
  -rw-r--r-- 1 skoumal staff 109275 Dec 5 15:12 08-16X003N_4-AN-pk
  -rw-r--r-- 1 skoumal staff 131994 Dec 5 15:12 08-16X026N_1-AN-pk
  -rw-r--r-- 1 skoumal staff 111495 Dec 5 15:12 08-16X031N_2-AN-pk
  -rw-r--r-- 1 skoumal staff 92992 Dec 5 15:12 09-12A004N_1-PK-pk
  -rw-r--r-- 1 skoumal staff 115075 Dec 5 15:12 09-12A034N_3-PK-pk
  -rw-r--r-- 1 skoumal staff 109999 Dec 5 15:12 09-12H004N_1-PK-pk
  -rw-r--r-- 1 skoumal staff 91469 Dec 5 15:12 09-13A003N_0-PK-pk
  -rw-r--r-- 1 skoumal staff 112768 Dec 5 15:12 09-13A074N_4-PK-pk
  -rw-r--r-- 1 skoumal staff 112259 Dec 5 15:12 09-13A090N_2-PK-pk
  -rw-r--r-- 1 skoumal staff 87195 Dec 5 15:12 09-13B011N_0-PK-pk
  -rw-r--r-- 1 skoumal staff 129299 Dec 5 15:12 09-13B027N_0-PK-pk
  -rw-r--r-- 1 skoumal staff 122598 Dec 5 15:12 09-13O007N_3-PK-pk
  -rw-r--r-- 1 skoumal staff 110178 Dec 5 15:12 09-13P008N_1-PK-pk
  -rw-r--r-- 1 skoumal staff 120714 Dec 5 15:12 09-13T029N_3-PK-pk
  -rw-r--r-- 1 skoumal staff 185624 Dec 5 15:12 09-13X003N_0-PK-pk
  -rw-r--r-- 1 skoumal staff 90067 Dec 5 15:12 09-14A016N_0-PK-pk
  -rw-r--r-- 1 skoumal staff 103615 Dec 5 15:12 09-14C006N_3-PK-pk
  -rw-r--r-- 1 skoumal staff 73062 Dec 5 15:12 09-14T024N_4-PK-pk
  -rw-r--r-- 1 skoumal staff 128943 Dec 5 15:12 09-14X016N_2-PK-pk
  -rw-r--r-- 1 skoumal staff 108095 Dec 5 15:12 09-15O004N_2-PK-pk
  -rw-r--r-- 1 skoumal staff 126265 Dec 5 15:12 09-15X041N_0-PK-pk
  -rw-r--r-- 1 skoumal staff 119361 Dec 5 15:12 09-15X044N_1-PK-pk
  -rw-r--r-- 1 skoumal staff 142155 Dec 5 15:12 09-16A002N_1-PK-pk
  -rw-r--r-- 1 skoumal staff 75463 Dec 5 15:12 09-16E007N_1-PK-pk
  -rw-r--r-- 1 skoumal staff 120311 Dec 5 15:12 09-16X030N_0-PK-pk
  -rw-r--r-- 1 skoumal staff 135473 Dec 5 15:12 10-13A005N_4-MH-pk
  -rw-r--r-- 1 skoumal staff 113915 Dec 5 15:12 10-13A011N_3-MH-pk
  -rw-r--r-- 1 skoumal staff 87897 Dec 5 15:12 10-13A018N_2-MH-pk
  -rw-r--r-- 1 skoumal staff 121284 Dec 5 15:12 10-13A074N_5-MH-pk
  -rw-r--r-- 1 skoumal staff 103486 Dec 5 15:12 10-13B016N_1-MH-pk
  -rw-r--r-- 1 skoumal staff 127402 Dec 5 15:12 10-13O003N_0-MH-pk
  -rw-r--r-- 1 skoumal staff 99716 Dec 5 15:12 10-13P009N_0-MH-pk
  -rw-r--r-- 1 skoumal staff 126373 Dec 5 15:12 10-14A011N_1-MH-pk
  -rw-r--r-- 1 skoumal staff 132420 Dec 5 15:12 10-14C009N_2-MH-pk
  -rw-r--r-- 1 skoumal staff 154287 Dec 5 15:12 10-14O007N_0-MH-pk
  -rw-r--r-- 1 skoumal staff 108656 Dec 5 15:12 10-14P006N_1-MH-pk
  -rw-r--r-- 1 skoumal staff 111987 Dec 5 15:12 10-15O011N_0-MH-pk
  -rw-r--r-- 1 skoumal staff 98207 Dec 5 15:12 10-15O012N_0-MH-pk
  -rw-r--r-- 1 skoumal staff 139761 Dec 5 15:12 10-15P004N_0-MH-pk
  -rw-r--r-- 1 skoumal staff 94261 Dec 5 15:12 10-15T002N_2-MH-pk
  -rw-r--r-- 1 skoumal staff 179656 Dec 5 15:12 10-15T003N_1-MH-pk
  -rw-r--r-- 1 skoumal staff 158985 Dec 5 15:12 10-15T011N_4-MH-pk
  -rw-r--r-- 1 skoumal staff 147464 Dec 5 15:12 10-15X020N_0-MH-pk
  -rw-r--r-- 1 skoumal staff 133827 Dec 5 15:12 10-16A002N_0-MH-pk
  -rw-r--r-- 1 skoumal staff 87534 Dec 5 15:12 10-16E007N_5-MH-pk
  -rw-r--r-- 1 skoumal staff 96142 Dec 5 15:12 10-16P004N_3-MH-pk
  -rw-r--r-- 1 skoumal staff 103483 Dec 5 15:12 10-16X003N_0-MH-pk
==== Producing the vertical with mark-up ====
  * The manually annotated files are in the directory ''csts-export''.
  * Convert them to the vertical format with the script ''ortofon-csts-vert.pl'':
  parallel-filter.sh -C "ortofon-csts-vert.pl" -p45 -s csts-export -t vert-export -v
  * To be safe, copy everything to ''vert-opravy'' and make the corrections there.
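Keeping the corrections in a separate copy (''vert-opravy'') also makes it possible to see later which files were actually touched. The following is only a sketch under assumptions: the directory names ''vert-export'' and ''vert-opravy'' come from the text, while the helper names ''make_working_copy'' and ''list_corrected_files'' are hypothetical and not part of the project's tooling.

```shell
# Sketch only: helper names are hypothetical; directory names follow the text.

# Copy all converted vertical files into a working directory,
# preserving timestamps and permissions (cp -p).
make_working_copy() {
    mkdir -p "$2"
    cp -p "$1"/* "$2"/
}

# Print the name of every file whose working copy differs from the
# exported original (a missing working copy is reported as well).
list_corrected_files() {
    for f in "$1"/*; do
        name=$(basename "$f")
        cmp -s "$f" "$2/$name" 2>/dev/null || echo "$name"
    done
}
```

Possible usage: run `make_working_copy vert-export vert-opravy` once, and `list_corrected_files vert-export vert-opravy` whenever an overview of the corrected files is needed.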
==== Checking and manually correcting the vertical ====
=== Automatic corrections ===
  * Form ''von.*'' vs. lemma ''on.*'':
  grep -P "von.*\ton" *
  * Variant ''6'' with the lemmas ''von.*'':
  grep -P "von[^\t]*\tPP.*6" *
  * Aspect for the enclitic ''s'' (form //#s//)
  * Unify ''každý''
  * Abbreviations
  * Hesitation sounds (//hmm//, //@//, //@@//)
=== Manual corrections ===
  * ''invalid''
  * ''X@''
  * Visual check of the tags:
  grep -h -v "^<" * | cut -f3 | sort -u | l
  * Check that the tags are valid:
  grep -h -v "^<" * | cut -f3 | sort -u | check-tag.pl -l16 > /dev/null
  * Check of //asterisks// and the like.
===== Comparing our lemmas and POS with MorphoDiTa (for the student Dominika) =====
  * Create the directory ''merge-csts'', where the texts for annotation will be prepared.
==== Converting chunks to csts ====
  * The vertical has to be turned into '''' with '''' tags:
  cd chunks
  for ff in *; do echo $ff; oral-vert-csts.pl < $ff > ../merge-csts/${ff%.vrt}.chunk.csts; done
==== Comparing our rules with the chunks ====
  * This is done with a diff:
  sdiff 05-16X001N_2.chunk.csts <(grep -v '' ../csts-import/05-16X001N_2.vrt | perl -pe 's/(.)[^<\n]+/$1/g' \
    | remove-dupl-csts-mark.pl | perl -pe 's/]*>//') | l
==== Converting csts-rules-frazrl to the common format ====
  * Convert as follows:
  cd csts-import
  for ff in *.vrt; do echo $ff; grep -v '' $ff | perl -pe 's/(.)[^<\n]+/$1/g' \
    | perl -pe 's/&dhellip;/../g' | perl -pe 's/&thellip;/.../g' | perl -pe 's/(\*)X/$1F/' \
    | perl -pe 's/\@+()Z/\@$1H/' | perl -pe 's/([eh]mm)[IX]/$1H/' | perl -pe 's/]+>([eh]mm<|\@+)/$1/' \
    | perl -pe 's/(\))X/$1M/' | perl -pe 's/(\&)Z/$1H/' | remove-dupl-csts-mark.pl | perl -pe 's/]*>//' \
    > ../merge-csts/${ff%.vrt}.import.csts; done
==== Merging chunk and import into merge-import ====
  * Produce the data for annotation:
  mkdir -p merge-import
  cd merge-csts
  for ff in *.chunk.csts; do echo $ff; sdiff -w 2500 ${ff%.chunk.csts}.import.csts $ff \
    | perl -pe 's/[\ \t]+\|[\ \t]+[^<]+//' | perl -pe 's/[\ \t]+<.*//' | remove-dupl-csts-mark.pl Q \
    > ../merge-import/${ff%.chunk.csts}.csts; done
  * check the tabs
  * then import into the annotation program (on jakobson):
  cd ../merge-import
  for ff in *-Dom; do echo $ff; /usr/local/annotate/bin/csts-import-utkl.pl --force $ff; done
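The tab check that precedes the import is not spelled out on this page. One possible way to do it is sketched below, under assumptions: the helper name ''check_tabs'' is hypothetical, and the expected number of tab-separated fields per line is a parameter, because the page does not state it.

```shell
# Sketch only: check_tabs is a hypothetical helper, not one of the project's scripts.
# Usage: check_tabs EXPECTED_FIELDS FILE...
# Prints "file:line:fields" for every non-empty line whose number of
# tab-separated fields differs from EXPECTED_FIELDS; prints nothing if all match.
check_tabs() {
    expected=$1
    shift
    awk -F'\t' -v n="$expected" \
        'NF > 0 && NF != n { print FILENAME ":" FNR ":" NF }' "$@"
}
```

For example, `check_tabs 2 ../merge-import/*` would list every line that does not consist of exactly two tab-separated fields; the value 2 is a placeholder and has to be replaced by whatever the import script actually expects.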