tak
taky
můžeš
jít
na
ryby
===== Workflow =====
* Each text is annotated by **one** human annotator and with the merged results of the **hybrid** tagger vs. **MorphoDiTa**.
* We have ten directories with texts from David: ''/store/corp/Ortofon/ortofon-data/01--05'' (first batch) and ''/store/corp/Ortofon/ortofon-data/06--10'' (second batch).
* Manual annotation is done in the directories ''/store/corp/Ortofon/ortofon-manual/davka-?''.
* Automatic annotation is done in the directories ''/store/corp/Ortofon/ortofon-hybrid/davka-?'' and ''/store/corp/Ortofon/ortofon-morphodita/davka-?''.
* The final merge of the manual and the automatic annotation is done in the directory ''/store/corp/Ortofon/ortofon-etalon/davka-?''.
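The layout above can be summarised in a small helper. This is only a sketch: the function name ''batch_dirs'' is made up for illustration, and it assumes that ''davka-?'' stands for ''davka-1'' or ''davka-2'' and that ''01--05'' means the five directories ''01'' to ''05''.

```shell
#!/bin/sh
# Hypothetical helper: list the directories that belong to one batch.
# The paths come from the list above; the function name is illustrative.
ROOT=/store/corp/Ortofon

batch_dirs() {
    case $1 in
        1) nums="01 02 03 04 05" ;;   # first batch of source texts
        2) nums="06 07 08 09 10" ;;   # second batch of source texts
        *) echo "unknown batch: $1" >&2; return 1 ;;
    esac
    for i in $nums; do
        echo "$ROOT/ortofon-data/$i"
    done
    for kind in manual hybrid morphodita etalon; do
        echo "$ROOT/ortofon-$kind/davka-$1"
    done
}

batch_dirs 1
```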
===== Manual annotation =====
==== Preparing texts for the annotators ====
* Put all files into the ''chunks'' directory.
* Produce ''csts'':
parallel-filter.sh -C "cut -f1 | perl -pe 's/\.\.\./&thellip;/g' \
| perl -pe 's/\.\./&dhellip;/g' | replace_spaces.pl \
| perl -pe 's/::' \
| vert_csts.pl | perl -pe 'undef $/; s/\n
* **We tag with the hybrid tagger!**
* Run the morphological analysis:
make-corp.sh -s csts -t csts-morf -Eucs2 -A1 -B1 -M -p45 -v
* Disambiguate //vole// and //von//:
cd csts-morf
for ff in *; do echo $ff; \
perl -i -pe 's/vole<.*/volevůlNNMS5-----A----/' $ff; done
for ff in *; do echo $ff; \
perl -i -pe 's/(]*>von)<.*/$1onPPYS1--3------6/' $ff; done
* Apply the rules and phrasemes:
make-whole-corp-csts.sh -Eucs2 -M -v -p45 -trules -Tfrazrl
* Adjust the tags:
parallel-filter.sh -C "normalize-anot-csts.pl \
| simplify-tags-csts-utf.pl | remove-dupl-csts-mark.pl X" -p45 \
-s csts-rules-frazrl -t csts-import -v
Still to do: simplify the tags and handle background sounds.
* Create links for the individual annotators.
* Carry out the following steps **on jakobson**.
* Import a file:
/usr/local/annotate/bin/csts-import-utkl.pl --force 05-16X001N_2-HS
* Edit ''/usr/local/annotate/users''.
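The first stage of the ''csts'' pipeline above protects ellipses by rewriting them as entities. The following is a minimal sketch of that stage alone, written with ''sed'' rather than ''perl''. The function name ''protect_ellipses'' is an assumption; the entity names are the ones used in the pipeline. Three dots have to be replaced before two, otherwise ''...'' would be consumed as ''..'' plus a stray dot.

```shell
#!/bin/sh
# Sketch only: replace "..." and ".." with the entities used above.
protect_ellipses() {
    # In a sed replacement a bare & means "the matched text",
    # so the literal ampersand must be escaped as \&.
    sed -e 's/\.\.\./\&thellip;/g' \
        -e 's/\.\./\&dhellip;/g'
}

printf 'no... tak.. jo.\n' | protect_ellipses
# prints: no&thellip; tak&dhellip; jo.
```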
==== Annotators and assigned files (davka-1) ====
=== Michal Havrda - MH ===
* xaf (4541): **done**
-rw-rw-r-- 1 skoumal users 169887 Nov 7 17:40 04-13B019N_0-MH *
-rw-rw-r-- 1 skoumal users 147849 Nov 7 17:40 04-13B025N_0-MH *
-rw-rw-r-- 1 skoumal users 136246 Nov 7 17:40 04-13O007N_1-MH *
-rw-rw-r-- 1 skoumal users 133560 Nov 7 17:40 04-13O010N_0-MH *
-rw-rw-r-- 1 skoumal users 165228 Nov 7 17:40 04-13O013N_0-MH *
-rw-rw-r-- 1 skoumal users 119827 Nov 7 17:40 04-13P004N_2-MH **
-rw-rw-r-- 1 skoumal users 129590 Nov 7 17:40 04-14P007N_0-MH *
-rw-rw-r-- 1 skoumal users 166905 Nov 7 17:40 04-14T003N_0-MH *
-rw-rw-r-- 1 skoumal users 152270 Nov 7 17:40 04-14T010N_1-MH *
-rw-rw-r-- 1 skoumal users 141300 Nov 7 17:40 04-14T013N_1-MH *
-rw-rw-r-- 1 skoumal users 148744 Nov 7 17:40 04-14X016N_3-MH *
-rw-rw-r-- 1 skoumal users 131885 Nov 7 17:40 04-15O010N_0-MH *
-rw-rw-r-- 1 skoumal users 153940 Nov 7 17:40 04-15P001N_2-MH *
-rw-rw-r-- 1 skoumal users 119987 Nov 7 17:40 04-15P006N_0-MH **
-rw-rw-r-- 1 skoumal users 163310 Nov 7 17:40 05-16X001N_2-MH **
=== Václav Horký - VH ===
* xaa (4238):
-rw-rw-r-- 1 skoumal users 97843 Nov 7 17:40 01-12A005N_1-VH
-rw-rw-r-- 1 skoumal users 124148 Nov 7 17:40 01-13A014N_1-VH
-rw-rw-r-- 1 skoumal users 139676 Nov 7 17:40 01-13A028N_1-VH
-rw-rw-r-- 1 skoumal users 162599 Nov 7 17:40 01-13B031N_1-VH
-rw-rw-r-- 1 skoumal users 128254 Nov 7 17:40 01-13H005N_1-VH
-rw-rw-r-- 1 skoumal users 176088 Nov 7 17:40 01-13O009N_3-VH
-rw-rw-r-- 1 skoumal users 128760 Nov 7 17:40 01-13P004N_1-VH
-rw-rw-r-- 1 skoumal users 139368 Nov 7 17:40 01-13P009N_2-VH
-rw-rw-r-- 1 skoumal users 135085 Nov 7 17:40 01-14T003N_6-VH
-rw-rw-r-- 1 skoumal users 152015 Nov 7 17:40 01-14T007N_1-VH
-rw-rw-r-- 1 skoumal users 153372 Nov 7 17:40 01-14T010N_0-VH
-rw-rw-r-- 1 skoumal users 166194 Nov 7 17:40 01-14X016N_6-VH
-rw-rw-r-- 1 skoumal users 168697 Nov 7 17:40 01-14X019N_0-VH
-rw-rw-r-- 1 skoumal users 160336 Nov 7 17:40 01-15O001N_0-VH
=== Šárka Kadavá - SK ===
* xad (4521): **done**
-rw-rw-r-- 1 skoumal users 173330 Nov 7 17:40 02-16A005N_5-SK
-rw-rw-r-- 1 skoumal users 149878 Nov 7 17:40 02-16P002N_0-SK
-rw-rw-r-- 1 skoumal users 118822 Nov 7 17:40 02-16X003N_5-SK
-rw-rw-r-- 1 skoumal users 138030 Nov 7 17:40 03-12A035N_3-SK
-rw-rw-r-- 1 skoumal users 130944 Nov 7 17:40 03-13A014N_4-SK
-rw-rw-r-- 1 skoumal users 138479 Nov 7 17:40 03-13O009N_2-SK
-rw-rw-r-- 1 skoumal users 132373 Nov 7 17:40 03-13P010N_1-SK
-rw-rw-r-- 1 skoumal users 159964 Nov 7 17:40 03-14A011N_3-SK
-rw-rw-r-- 1 skoumal users 122871 Nov 7 17:40 03-14A016N_4-SK
-rw-rw-r-- 1 skoumal users 137852 Nov 7 17:40 03-14P007N_2-SK
-rw-rw-r-- 1 skoumal users 186207 Nov 7 17:40 03-14T010N_2-SK
-rw-rw-r-- 1 skoumal users 151313 Nov 7 17:40 03-14T013N_0-SK
-rw-rw-r-- 1 skoumal users 162548 Nov 7 17:40 03-14T020N_0-SK
-rw-rw-r-- 1 skoumal users 172522 Nov 7 17:40 03-14X019N_4-SK
=== Pavel Kopřiva - PK ===
* xag (4601): **done**
-rw-rw-r-- 1 skoumal users 128575 Nov 7 17:40 04-15X030N_3-PK
-rw-rw-r-- 1 skoumal users 152738 Nov 7 17:40 04-15X043N_2-PK
-rw-rw-r-- 1 skoumal users 162024 Nov 7 17:40 04-16A009N_2-PK
-rw-rw-r-- 1 skoumal users 114272 Nov 7 17:40 04-16E007N_2-PK
-rw-rw-r-- 1 skoumal users 161129 Nov 7 17:40 04-16P004N_1-PK
-rw-rw-r-- 1 skoumal users 145303 Nov 7 17:40 04-16X003N_2-PK
-rw-rw-r-- 1 skoumal users 153704 Nov 7 17:40 05-12P004N_2-PK
-rw-rw-r-- 1 skoumal users 141893 Nov 7 17:40 05-13A011N_0-PK
-rw-rw-r-- 1 skoumal users 136846 Nov 7 17:40 05-13A014N_3-PK
-rw-rw-r-- 1 skoumal users 117562 Nov 7 17:40 05-13A023N_3-PK
-rw-rw-r-- 1 skoumal users 145722 Nov 7 17:40 05-13B005N_1-PK
-rw-rw-r-- 1 skoumal users 169153 Nov 7 17:40 05-13D015N_0-PK
-rw-rw-r-- 1 skoumal users 141199 Nov 7 17:40 05-13O007N_0-PK
-rw-rw-r-- 1 skoumal users 172005 Nov 7 17:40 05-14A011N_2-PK
=== Lucie Onari Kreslová - LK ===
* xae (4451): **done**
-rw-rw-r-- 1 skoumal users 136241 Nov 7 17:40 03-14X021N_2-LK
-rw-rw-r-- 1 skoumal users 180763 Nov 7 17:40 03-15E003N_0-LK
-rw-rw-r-- 1 skoumal users 144355 Nov 7 17:40 03-15E015N_1-LK
-rw-rw-r-- 1 skoumal users 135982 Nov 7 17:40 03-15O002N_1-LK
-rw-rw-r-- 1 skoumal users 171807 Nov 7 17:40 03-15O007N_0-LK
-rw-rw-r-- 1 skoumal users 175211 Nov 7 17:40 03-15X020N_1-LK
-rw-rw-r-- 1 skoumal users 158941 Nov 7 17:40 03-15X041N_2-LK
-rw-rw-r-- 1 skoumal users 170958 Nov 7 17:40 03-16A005N_4-LK
-rw-rw-r-- 1 skoumal users 140487 Nov 7 17:40 03-16P004N_0-LK
-rw-rw-r-- 1 skoumal users 171624 Nov 7 17:40 03-16X001N_1-LK
-rw-rw-r-- 1 skoumal users 148677 Nov 7 17:40 03-16X031N_4-LK
-rw-rw-r-- 1 skoumal users 115839 Nov 7 17:40 04-12A025N_0-LK
-rw-rw-r-- 1 skoumal users 172208 Nov 7 17:40 04-12P004N_4-LK
-rw-rw-r-- 1 skoumal users 200543 Nov 7 17:40 04-13B009N_0-LK
=== Tereza Marková - TM ===
* xab (4269): **done**
-rw-rw-r-- 1 skoumal users 144719 Nov 7 17:40 01-15O004N_0-TM *
-rw-rw-r-- 1 skoumal users 147429 Nov 7 17:40 01-15O007N_1-TM *
-rw-rw-r-- 1 skoumal users 156462 Nov 7 17:40 01-15P001N_1-TM **
-rw-rw-r-- 1 skoumal users 188348 Nov 7 17:40 01-15T005N_0-TM **
-rw-rw-r-- 1 skoumal users 149779 Nov 7 17:40 01-15X045N_2-TM *
-rw-rw-r-- 1 skoumal users 165698 Nov 7 17:40 01-16A005N_2-TM *
-rw-rw-r-- 1 skoumal users 132634 Nov 7 17:40 01-16P002N_2-TM *
-rw-rw-r-- 1 skoumal users 121098 Nov 7 17:40 01-16X033N_5-TM *
-rw-rw-r-- 1 skoumal users 122287 Nov 7 17:40 02-12A011N_1-TM *
-rw-rw-r-- 1 skoumal users 146221 Nov 7 17:40 02-13A011N_5-TM *
-rw-rw-r-- 1 skoumal users 136882 Nov 7 17:40 02-13A036N_3-TM **
-rw-rw-r-- 1 skoumal users 198356 Nov 7 17:40 02-13B031N_0-TM *
-rw-rw-r-- 1 skoumal users 112397 Nov 7 17:40 02-13O009N_1-TM **
-rw-rw-r-- 1 skoumal users 126312 Nov 7 17:40 02-13O010N_2-TM **
=== Anna Nováková - AN ===
* xah (4552):
-rw-rw-r-- 1 skoumal users 190480 Nov 7 17:40 05-14T010N_3-AN
-rw-rw-r-- 1 skoumal users 122224 Nov 7 17:40 05-14T019N_0-AN
-rw-rw-r-- 1 skoumal users 154255 Nov 7 17:40 05-14X012N_2-AN
-rw-rw-r-- 1 skoumal users 189324 Nov 7 17:40 05-14X019N_2-AN
-rw-rw-r-- 1 skoumal users 183678 Nov 7 17:40 05-14X019N_3-AN
-rw-rw-r-- 1 skoumal users 154400 Nov 7 17:40 05-15O001N_1-AN
-rw-rw-r-- 1 skoumal users 150413 Nov 7 17:40 05-15P001N_0-AN
-rw-rw-r-- 1 skoumal users 206423 Nov 7 17:40 05-15X009N_1-AN
-rw-rw-r-- 1 skoumal users 181125 Nov 7 17:40 05-15X020N_3-AN
-rw-rw-r-- 1 skoumal users 99237 Nov 7 17:40 05-15X043N_5-AN
-rw-rw-r-- 1 skoumal users 182076 Nov 7 17:40 05-16A005N_1-AN
-rw-rw-r-- 1 skoumal users 122299 Nov 7 17:40 05-16E007N_0-AN
-rw-rw-r-- 1 skoumal users 120653 Nov 7 17:40 05-16P002N_1-AN
-rw-rw-r-- 1 skoumal users 131582 Nov 7 17:40 05-16P007N_1-AN
=== Michal Zlatkovský - MZ ===
* xac (4393):
-rw-rw-r-- 1 skoumal users 160624 Nov 7 17:40 02-13O013N_1-MZ
-rw-rw-r-- 1 skoumal users 119673 Nov 7 17:40 02-13P004N_3-MZ
-rw-rw-r-- 1 skoumal users 120363 Nov 7 17:40 02-13T029N_6-MZ
-rw-rw-r-- 1 skoumal users 117594 Nov 7 17:40 02-14A016N_2-MZ
-rw-rw-r-- 1 skoumal users 156603 Nov 7 17:40 02-14E003N_0-MZ
-rw-rw-r-- 1 skoumal users 153488 Nov 7 17:40 02-14T003N_5-MZ
-rw-rw-r-- 1 skoumal users 147516 Nov 7 17:40 02-14T013N_2-MZ
-rw-rw-r-- 1 skoumal users 178036 Nov 7 17:40 02-14T020N_3-MZ
-rw-rw-r-- 1 skoumal users 172908 Nov 7 17:40 02-15O002N_0-MZ
-rw-rw-r-- 1 skoumal users 122828 Nov 7 17:40 02-15O004N_1-MZ
-rw-rw-r-- 1 skoumal users 127983 Nov 7 17:40 02-15O009N_1-MZ
-rw-rw-r-- 1 skoumal users 129864 Nov 7 17:40 02-15P001N_3-MZ
-rw-rw-r-- 1 skoumal users 129648 Nov 7 17:40 02-15X041N_1-MZ
-rw-rw-r-- 1 skoumal users 150095 Nov 7 17:40 02-16A001N_0-MZ
==== Annotators and assigned files (davka-2) ====
=== Václav Horký - VH ===
* (7862) -- done:
-rw-rw-r-- 1 skoumal users 142827 May 27 17:40 06-12A011N_0-VH
-rw-rw-r-- 1 skoumal users 166457 May 27 17:40 06-12P004N_3-VH
-rw-rw-r-- 1 skoumal users 131514 May 27 17:40 06-13A003N_1-VH
-rw-rw-r-- 1 skoumal users 128413 May 27 17:40 06-13A014N_2-VH
-rw-rw-r-- 1 skoumal users 144623 May 27 17:40 06-13A028N_2-VH
-rw-rw-r-- 1 skoumal users 150403 May 27 17:40 06-13A074N_1-VH
-rw-rw-r-- 1 skoumal users 139190 May 27 17:40 06-13B005N_0-VH
-rw-rw-r-- 1 skoumal users 189188 May 27 17:40 06-13B028N_1-VH
-rw-rw-r-- 1 skoumal users 155333 May 27 17:40 06-13O007N_2-VH
-rw-rw-r-- 1 skoumal users 189392 May 27 17:40 06-14A006N_0-VH
-rw-rw-r-- 1 skoumal users 143619 May 27 17:40 06-14A008N_3-VH
-rw-rw-r-- 1 skoumal users 184286 May 27 17:40 06-14E001N_0-VH
-rw-rw-r-- 1 skoumal users 149792 May 27 17:40 06-14P007N_3-VH
-rw-rw-r-- 1 skoumal users 147049 May 27 17:40 06-14T007N_0-VH
-rw-rw-r-- 1 skoumal users 164936 May 27 17:40 06-14T020N_1-VH
-rw-rw-r-- 1 skoumal users 170219 May 27 17:40 06-14X016N_1-VH
-rw-rw-r-- 1 skoumal users 128950 May 27 17:40 06-15E017N_5-VH
-rw-rw-r-- 1 skoumal users 166806 May 27 17:40 06-15O010N_2-VH
-rw-rw-r-- 1 skoumal users 137835 May 27 17:40 06-16A001N_3-VH
-rw-rw-r-- 1 skoumal users 131549 May 27 17:40 06-16P002N_3-VH
-rw-rw-r-- 1 skoumal users 106379 May 27 17:40 06-16P007N_5-VH
-rw-rw-r-- 1 skoumal users 141727 May 27 17:40 06-16X003N_1-VH
-rw-rw-r-- 1 skoumal users 169034 May 27 17:40 06-16X030N_1-VH
* (7028) -- done:
-rw-r--r-- 1 skoumal users 91649 Jun 24 18:06 07-12A037N_4-VH
-rw-r--r-- 1 skoumal users 161669 Jun 24 18:06 07-12O002N_0-VH
-rw-r--r-- 1 skoumal users 144382 Jun 24 18:06 07-13A003N_2-VH
-rw-r--r-- 1 skoumal users 148109 Jun 24 18:06 07-13A014N_0-VH
-rw-r--r-- 1 skoumal users 106736 Jun 24 18:06 07-13A028N_4-VH
-rw-r--r-- 1 skoumal users 142377 Jun 24 18:06 07-13A036N_5-VH
-rw-r--r-- 1 skoumal users 149087 Jun 24 18:06 07-13A050N_0-VH
-rw-r--r-- 1 skoumal users 166318 Jun 24 18:06 07-13E004N_6-VH
-rw-r--r-- 1 skoumal users 154101 Jun 24 18:06 07-13O004N_0-VH
-rw-r--r-- 1 skoumal users 144107 Jun 24 18:06 07-14T003N_4-VH
-rw-r--r-- 1 skoumal users 121502 Jun 24 18:06 07-14T007N_2-VH
-rw-r--r-- 1 skoumal users 149152 Jun 24 18:06 07-14T013N_3-VH
-rw-r--r-- 1 skoumal users 152439 Jun 24 18:06 07-14X016N_4-VH
-rw-r--r-- 1 skoumal users 131964 Jun 24 18:06 07-15C004N_0-VH
-rw-r--r-- 1 skoumal users 134751 Jun 24 18:06 07-15O009N_0-VH
-rw-r--r-- 1 skoumal users 129301 Jun 24 18:06 07-15P002N_0-VH
-rw-r--r-- 1 skoumal users 180038 Jun 24 18:06 07-16A005N_0-VH
-rw-r--r-- 1 skoumal users 133426 Jun 24 18:06 07-16A009N_3-VH
-rw-r--r-- 1 skoumal users 125528 Jun 24 18:06 07-16P007N_2-VH
-rw-r--r-- 1 skoumal users 128922 Jun 24 18:06 07-16X003N_3-VH
-rw-r--r-- 1 skoumal users 140285 Jun 24 18:06 07-16X031N_0-VH
-rw-r--r-- 1 skoumal users 129726 Jun 24 18:06 07-16X033N_3-VH
=== Anna Nováková - AN ===
* (7634) -- done:
-rw-r--r-- 1 skoumal users 136812 Jun 24 18:06 08-12A009N_0-AN
-rw-r--r-- 1 skoumal users 162292 Jun 24 18:06 08-12A031N_0-AN
-rw-r--r-- 1 skoumal users 174507 Jun 24 18:06 08-13A018N_0-AN
-rw-r--r-- 1 skoumal users 150144 Jun 24 18:06 08-13A036N_0-AN
-rw-r--r-- 1 skoumal users 147524 Jun 24 18:06 08-13A090N_4-AN
-rw-r--r-- 1 skoumal users 159163 Jun 24 18:06 08-13B019N_1-AN
-rw-r--r-- 1 skoumal users 121300 Jun 24 18:06 08-13B028N_0-AN
-rw-r--r-- 1 skoumal users 117897 Jun 24 18:06 08-13O009N_0-AN
-rw-r--r-- 1 skoumal users 122809 Jun 24 18:06 08-13P004N_0-AN
-rw-r--r-- 1 skoumal users 159431 Jun 24 18:06 08-14C006N_0-AN
-rw-r--r-- 1 skoumal users 153240 Jun 24 18:06 08-14T003N_1-AN
-rw-r--r-- 1 skoumal users 132491 Jun 24 18:06 08-14T014N_4-AN
-rw-r--r-- 1 skoumal users 157464 Jun 24 18:06 08-14X016N_5-AN
-rw-r--r-- 1 skoumal users 164278 Jun 24 18:06 08-15E010N_5-AN
-rw-r--r-- 1 skoumal users 132281 Jun 24 18:06 08-15O010N_1-AN
-rw-r--r-- 1 skoumal users 162485 Jun 24 18:06 08-15X020N_2-AN
-rw-r--r-- 1 skoumal users 177612 Jun 24 18:06 08-15X041N_3-AN
-rw-r--r-- 1 skoumal users 181697 Jun 24 18:06 08-16A005N_3-AN
-rw-r--r-- 1 skoumal users 139212 Jun 24 18:06 08-16E005N_4-AN
-rw-r--r-- 1 skoumal users 121915 Jun 24 18:06 08-16E007N_4-AN
-rw-r--r-- 1 skoumal users 137518 Jun 24 18:06 08-16X003N_4-AN
-rw-r--r-- 1 skoumal users 176342 Jun 24 18:06 08-16X026N_1-AN
-rw-r--r-- 1 skoumal users 141328 Jun 24 18:06 08-16X031N_2-AN
=== Michal Havrda - MH ===
* (7117) -- done:
-rw-r--r-- 1 skoumal users 160033 Jun 24 18:06 10-13A005N_4-MH
-rw-r--r-- 1 skoumal users 137114 Jun 24 18:06 10-13A011N_3-MH
-rw-r--r-- 1 skoumal users 112974 Jun 24 18:06 10-13A018N_2-MH
-rw-r--r-- 1 skoumal users 157987 Jun 24 18:06 10-13A074N_5-MH
-rw-r--r-- 1 skoumal users 129620 Jun 24 18:06 10-13B016N_1-MH
-rw-r--r-- 1 skoumal users 152048 Jun 24 18:06 10-13O003N_0-MH
-rw-r--r-- 1 skoumal users 126483 Jun 24 18:06 10-13P009N_0-MH
-rw-r--r-- 1 skoumal users 157568 Jun 24 18:06 10-14A011N_1-MH
-rw-r--r-- 1 skoumal users 154505 Jun 24 18:06 10-14C009N_2-MH
-rw-r--r-- 1 skoumal users 181761 Jun 24 18:06 10-14O007N_0-MH
-rw-r--r-- 1 skoumal users 135186 Jun 24 18:06 10-14P006N_1-MH
-rw-r--r-- 1 skoumal users 143875 Jun 24 18:06 10-15O011N_0-MH
-rw-r--r-- 1 skoumal users 122224 Jun 24 18:06 10-15O012N_0-MH
-rw-r--r-- 1 skoumal users 166988 Jun 24 18:06 10-15P004N_0-MH
-rw-r--r-- 1 skoumal users 122777 Jun 24 18:06 10-15T002N_2-MH
-rw-r--r-- 1 skoumal users 202198 Jun 24 18:06 10-15T003N_1-MH
-rw-r--r-- 1 skoumal users 190825 Jun 24 18:06 10-15T011N_4-MH
-rw-r--r-- 1 skoumal users 181748 Jun 24 18:06 10-15X020N_0-MH
-rw-r--r-- 1 skoumal users 161548 Jun 24 18:06 10-16A002N_0-MH
-rw-r--r-- 1 skoumal users 114890 Jun 24 18:06 10-16E007N_5-MH
-rw-r--r-- 1 skoumal users 128041 Jun 24 18:06 10-16P004N_3-MH
-rw-r--r-- 1 skoumal users 131846 Jun 24 18:06 10-16X003N_0-MH
=== Pavel Kopřiva ===
* (7328) -- done:
-rw-r--r-- 1 skoumal users 123564 Jun 24 18:06 09-12A004N_1-PK
-rw-r--r-- 1 skoumal users 140637 Jun 24 18:06 09-12A034N_3-PK
-rw-r--r-- 1 skoumal users 137184 Jun 24 18:06 09-12H004N_1-PK
-rw-r--r-- 1 skoumal users 118561 Jun 24 18:06 09-13A003N_0-PK
-rw-r--r-- 1 skoumal users 151930 Jun 24 18:06 09-13A074N_4-PK
-rw-r--r-- 1 skoumal users 137469 Jun 24 18:06 09-13A090N_2-PK
-rw-r--r-- 1 skoumal users 115312 Jun 24 18:06 09-13B011N_0-PK
-rw-r--r-- 1 skoumal users 161409 Jun 24 18:06 09-13B027N_0-PK
-rw-r--r-- 1 skoumal users 156513 Jun 24 18:06 09-13O007N_3-PK
-rw-r--r-- 1 skoumal users 143758 Jun 24 18:06 09-13P008N_1-PK
-rw-r--r-- 1 skoumal users 147253 Jun 24 18:06 09-13T029N_3-PK
-rw-r--r-- 1 skoumal users 218778 Jun 24 18:06 09-13X003N_0-PK
-rw-r--r-- 1 skoumal users 121754 Jun 24 18:06 09-14A016N_0-PK
-rw-r--r-- 1 skoumal users 132630 Jun 24 18:06 09-14C006N_3-PK
-rw-r--r-- 1 skoumal users 98129 Jun 24 18:06 09-14T024N_4-PK
-rw-r--r-- 1 skoumal users 171839 Jun 24 18:06 09-14X016N_2-PK
-rw-r--r-- 1 skoumal users 143302 Jun 24 18:06 09-15O004N_2-PK
-rw-r--r-- 1 skoumal users 170311 Jun 24 18:06 09-15X041N_0-PK
-rw-r--r-- 1 skoumal users 148747 Jun 24 18:06 09-15X044N_1-PK
-rw-r--r-- 1 skoumal users 169442 Jun 24 18:06 09-16A002N_1-PK
-rw-r--r-- 1 skoumal users 104915 Jun 24 18:06 09-16E007N_1-PK
-rw-r--r-- 1 skoumal users 145808 Jun 24 18:06 09-16X030N_0-PK
==== Checking and taking over the texts ====
* In the same way as for [[wiki:user:skoumal:anotace|Annotation]].
* We work on **jakobson**.
* First check the texts:
cd /net/grimm/store/corp/ortofon-etalon/csts-import
for ff in 04-15X030N_3-PK 04-15X043N_2-PK 04-16A009N_2-PK; \
do echo $ff; \
/usr/local/annotate/bin/csts-export.pl --verbose $ff > /dev/null; done
* If everything is in order, save the files.
==== Conversion back to the vertical format, corrections ====
* Create the ''vert-export'' directory and convert the files into it:
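The check loop above discards the exporter's output and relies on whatever it prints as diagnostics. A variant that collects the failing files could look like the sketch below. It assumes that the checker signals problems through a non-zero exit status, which the steps above do not state; ''check_files'' is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: run a checker command on each file and report the failures.
# Usage: check_files CHECKER FILE...
check_files() {
    checker=$1
    shift
    failed=0
    for f in "$@"; do
        # The exported text itself is not needed here, only the status.
        if ! "$checker" "$f" > /dev/null; then
            echo "FAILED: $f"
            failed=1
        fi
    done
    return "$failed"
}
```

Under the same assumption, the loop above could then be written as ''check_files /usr/local/annotate/bin/csts-export.pl 04-15X030N_3-PK 04-15X043N_2-PK 04-16A009N_2-PK''; the ''--verbose'' flag would need a small wrapper.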
cd csts-export
for ff in *; do echo $ff; oral-csts-vert.pl < $ff > ../vert-export/$ff; done
Entities are restored in this step as well, so the first column should match the original:
cd ../chunks
for ff in *; do echo $ff; sdiff -s <(cut -f1 $ff) <(cut -f1 ../vert-export/${ff%.vrt}-??); done
* Fix ''invalid'' and ''X@''.
==== Horký's problematic corrections -- questions for MK and DL ====
* Tokenization:
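The comparison of first columns can also be done without process substitution. The sketch below uses plain ''diff'' instead of ''sdiff -s'' and a temporary file; the function name ''compare_first_column'' is an assumption. It returns 0 when the token columns are identical, whatever the other columns contain.

```shell
#!/bin/sh
# Sketch only: compare the first (token) column of two vertical files.
# Usage: compare_first_column ORIGINAL EXPORTED
compare_first_column() {
    tmp=$(mktemp) || return 1
    cut -f1 "$2" > "$tmp"
    # diff prints the differing tokens and sets the exit status.
    cut -f1 "$1" | diff - "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```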
* dvěstě ---> dvě stě
* v o ---> vo
* od tamaď ---> odtamaď
* třinácet ---> třináct set
* takovýty ---> takový ty
* tyjo ---> ty jo
* napohodu ---> na pohodu
* ježíš maria ---> ježíšmarja
* v spára ---> V - spára (shouldn't the transcription be //vé//?)
* ježíši maria ---> ježíšimarja
* devatenácet ---> devatenáct set
* osmnáctset ---> osmnáct set
===== Preparing data for automatic annotation (hybrid vs. MorphoDiTa) =====
* The data are on grimm in the directory ''/store/corp/Ortofon''.
* We work with the vertical taken from the files in ''/store/corp/Ortofon/ortofon-etalon/Verze/2/1''.
* Preparing the shared data from ''ortofon-etalon/davka-?'':
cd ortofon-hybrid/davka-1
mkdir vert
cd ../../ortofon-etalon/davka-1/Verze/2/1
for ff in *; do echo $ff; cut -f1 $ff | perl -pe 's/^<.*>$//' \
| cat -s > ../../../../../ortofon-hybrid/davka-1/vert/$ff; done
cd ../../../../../ortofon-hybrid/davka-1
make-corp.sh -s vert -t csts -v -p45
make-corp.sh -A1 -B1 -Eucs2 -M -p45 -s csts -t csts-morf -v
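What the ''cut | perl | cat -s'' filter above does to one file can be seen on a toy input. This is a sketch with ''sed'' standing in for ''perl''; the function name and the sample markup are made up for illustration.

```shell
#!/bin/sh
# Sketch only: keep the token column, blank out structural tags,
# and squeeze runs of blank lines into one.
tokens_only() {
    # cut passes lines without a tab through unchanged, so the
    # structural lines survive until sed empties them.
    cut -f1 | sed 's/^<.*>$//' | cat -s
}

# Made-up sample; the real files carry the project's own markup.
printf '<doc id="x">\n<sp>\ntak\tX\ntaky\tY\n</sp>\n<sp>\nryby\tZ\n</sp>\n</doc>\n' \
    | tokens_only
```

On this input the result is six lines: an empty line, ''tak'', ''taky'', an empty line, ''ryby'' and a final empty line. A leading empty line of this kind is presumably what the later ''sed '1{/^$/d}''' calls remove.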
==== ortofon-hybrid ====
* The data are run through our whole hybrid pipeline and finally adjusted to the needs of ORTOFON.
* Data preparation:
cd .../ortofon-hybrid/davka-?
rsync -avz ../../ortofon-manual/davka-?/csts-morf .
* Honza's script ''processing_hybrid.pl'' (runs on the vertical):
make-corp.sh -s csts-morf -t vert-morf -p45 -v
cd /usr/local/corp/Perl/Ortofon
./processing_hybrid.pl /store/corp/Ortofon/ortofon-hybrid/davka-2/vert-morf
cd -
cp -pr vert-morf vert-morf.ori
cd vert-morf
for ff in *.out; do mv $ff ${ff%.out}; done
cd -
mv csts-morf csts-morf.ori
make-corp.sh -s vert-morf -t csts-morf -p45 -v
* Run the rules through to the end:
make-whole-corp-csts.sh -C1 -Eucs2 -f -M -p45 -trules -v
* Then some minor corrections (//já// --> //my// etc.) with Tomáš's script ''EtalonizaceVertikaly.pl'':
make-corp.sh -s csts-rules-frazrl-rulh1-tag-vid-corr -t vert-rules-frazrl-rulh1-tag-vid-corr -p45 -v
parallel-filter.sh -C "/usr/local/corp/Perl/EtalonizaceVertikaly.pl" \
-s vert-rules-frazrl-rulh1-tag-vid-corr -t vert-hybrid -p45 -v
* Honza's script ''postprocessing16.pl'' (runs on the vertical):
cd /usr/local/corp/Perl/Ortofon
./postprocessing16.pl /store/corp/Ortofon/ortofon-hybrid/davka-2/vert-hybrid
cd /store/corp/Ortofon/ortofon-hybrid/davka-2/vert-hybrid
mkdir ../vert-hybrid-out
for ff in *.post; do mv $ff ../vert-hybrid-out/${ff%.post}; done
cd ../vert-hybrid-out
for ff in *; do echo $ff; sed '1{/^$/d}' $ff > ../../../ortofon-automat/davka-2/vert-hybrid/$ff; done
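Two small idioms recur in the loops above: stripping a suffix such as ''.out'' or ''.post'' with ''${ff%...}'', and deleting the first line only when it is empty. The sketch below isolates both. The function names are assumptions, and the ''sed'' expression is written with a ''; '' before the closing brace so that it also works with non-GNU ''sed''.

```shell
#!/bin/sh
# Sketch only: the two idioms used in the renaming and copying loops.

# Remove SUFFIX from NAME if present, e.g. "x.post" -> "x".
strip_suffix() {
    echo "${1%"$2"}"
}

# Delete line 1 only if it is empty; later blank lines are kept.
drop_leading_blank() {
    sed '1{/^$/d;}'
}

strip_suffix 06-12A011N_0.post .post
# prints: 06-12A011N_0
```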
==== ortofon-morphodita ====
* For MorphoDiTa we prepare the morphology, which must, however, be consistent with the Etalon and with David's scripts.
* Data preparation:
cd .../ortofon-morphodita
mkdir davka-?
cd davka-?
rsync -avz ../../ortofon-manual/davka-?/csts-morf .
* Adding verbal aspect:
make-corp.sh -s csts-morf -t csts-morf-vid -v -p45
* Aspect corrections, expansion of variables, tag simplification and removal of duplicates:
parallel-filter.sh \
-C "corr-asp.pl | JH-wide-csts.sh | simplify-tags-csts-utf.pl | remove-dupl-csts-mark.pl X" \
-p45 -s csts-morf-vid -t csts-morf-vid-corr -v
* Tomáš's script:
make-corp.sh -s csts-morf-vid-corr -t vert-morf-vid-corr -p45 -v
parallel-filter.sh -C EtalonizaceVertikaly.pl -s vert-morf-vid-corr -t vert-morf-vid-corr-etln -p45 -v
* Honza's script ''processing_mdita.pl'' (runs on the vertical). It produces ''.out'' files:
cd /usr/local/corp/Perl/Ortofon
./processing_mdita.pl /store/corp/Ortofon/ortofon-morphodita/davka-?/vert-morf-vid-corr-etln
* Convert Honza's output into input for MorphoDiTa:
cd /store/corp/Ortofon/ortofon-morphodita/davka-?
mkdir vert-morphodita-in
cd vert-morf-vid-corr-etln
for ff in *.out; do echo $ff; sed '1{/^$/d}' $ff > ../vert-morphodita-in/${ff%.out}; done
rm *.out
* Run MorphoDiTa and save the result to ''/store/corp/Ortofon/ortofon-morphodita/vert-morphodita-out''.
* Honza's script ''postprocessing16.pl'' (runs on the vertical):
parallel-filter.sh -C "cut -f1-3 | perl -pe 's/(\t.*)\t/\$1 /'" -s vert-morphodita-out \
-t vert-morphodita-result -p45 -v
cd /usr/local/corp/Perl/Ortofon
./postprocessing16.pl /store/corp/Ortofon/ortofon-morphodita/davka-2/vert-morphodita-result
cd -
* Move the result to the directory where MorphoDiTa is merged with the hybrid for manual annotation:
cd ../../ortofon-automat
mkdir -p davka-2/vert-morphodita
cd ../ortofon-morphodita/davka-2/vert-morphodita-result
for ff in *.post; do mv $ff ../../../ortofon-automat/davka-2/vert-morphodita/${ff%.post}; done
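The filter ''cut -f1-3 | perl -pe 's/(\t.*)\t/$1 /''' used before ''postprocessing16.pl'' keeps the first three columns and then replaces the last remaining tab with a space, so that the second and third columns end up in one field. An equivalent sketch in ''awk'' follows; the function name and the sample values are assumptions.

```shell
#!/bin/sh
# Sketch only: keep columns 1-3 and join columns 2 and 3 with a space.
join_cols_2_3() {
    cut -f1-3 | awk -F '\t' '
        NF >= 3 { print $1 "\t" $2 " " $3; next }  # token lines
                { print }                          # everything else unchanged
    '
}

# Made-up sample line with a fourth column that gets dropped.
printf 'ryby\tryba\tTAG\textra\n' | join_cols_2_3
```

Lines with fewer than three columns, such as structural tags, pass through unchanged, which matches the behaviour of the original regular expression.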
==== Merging the results and preparing the import (ortofon-automat) ====
* Everything is in the directories ''ortofon-automat/davka-?''.
* The ''vert-hybrid'' directory holds the hybrid results (see above).
* The ''vert-morphodita'' directory holds the MorphoDiTa results (see above).
* Merge MorphoDiTa and the hybrid into the ''vert-paste'' directory:
cd .../ortofon-automat/davka-?
mkdir vert-paste
cd vert-morphodita
for ff in *; do paste $ff <(cut -f2- ../vert-hybrid/$ff) | perl -pe 's/^[\t\ ]+$//' > ../vert-paste/$ff; done
* Convert to ''csts'':
make-corp.sh -s vert-paste -t csts-paste -p45 -v
* Remove duplicates:
parallel-filter.sh -C remove-dupl-csts.pl -p45 -s csts-paste -t csts-import -v
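The ''paste'' step above puts the MorphoDiTa columns and the hybrid columns side by side, dropping the duplicate token column from the hybrid file and emptying lines that contain only whitespace. A sketch of the same idea without process substitution is shown below. It assumes both files have the same tokens in the same order; the function name and the tag values in the usage comment are made up.

```shell
#!/bin/sh
# Sketch only: merge two verticals line by line.
# Usage: merge_columns MORPHODITA_FILE HYBRID_FILE
merge_columns() {
    # Take everything but the token column from the hybrid file,
    # append it to the MorphoDiTa line, and empty whitespace-only lines.
    cut -f2- "$2" | paste "$1" - | sed 's/^[[:blank:]]*$//'
}

# Example with invented values:
#   MorphoDiTa line:  ryby<TAB>ryba<TAB>TAG_M
#   hybrid line:      ryby<TAB>ryba<TAB>TAG_H
#   merged line:      ryby<TAB>ryba<TAB>TAG_M<TAB>ryba<TAB>TAG_H
```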
==== Annotators and assigned files (davka-1) ====
=== Pavel Kopřiva ===
* prepaid 9,800, i.e. 14,000 words; 1st batch 14,197 words, **done**
-rw-rw-r-- 1 skoumal users 34407 May 23 14:41 01-12A005N_1-PK
-rw-rw-r-- 1 skoumal users 46102 May 23 14:41 01-13A014N_1-PK
-rw-rw-r-- 1 skoumal users 40307 May 23 14:41 01-13A028N_1-PK
-rw-rw-r-- 1 skoumal users 41092 May 23 14:41 01-13B031N_1-PK
-rw-rw-r-- 1 skoumal users 41144 May 23 14:41 01-13H005N_1-PK
-rw-rw-r-- 1 skoumal users 44019 May 23 14:41 01-13O009N_3-PK
-rw-rw-r-- 1 skoumal users 46809 May 23 14:41 01-13P004N_1-PK
-rw-rw-r-- 1 skoumal users 43321 May 23 14:41 01-13P009N_2-PK
-rw-rw-r-- 1 skoumal users 39320 May 23 14:41 01-14T003N_6-PK
-rw-rw-r-- 1 skoumal users 44702 May 23 14:41 01-14T007N_1-PK
-rw-rw-r-- 1 skoumal users 41621 May 23 14:41 01-14T010N_0-PK
-rw-rw-r-- 1 skoumal users 44150 May 23 14:41 01-14X016N_6-PK
-rw-rw-r-- 1 skoumal users 41916 May 23 14:41 01-14X019N_0-PK
-rw-rw-r-- 1 skoumal users 44807 May 23 14:41 01-15O001N_0-PK
-rw-rw-r-- 1 skoumal users 41851 May 23 14:41 01-15O004N_0-PK
-rw-rw-r-- 1 skoumal users 44557 May 23 14:41 01-15O007N_1-PK
-rw-rw-r-- 1 skoumal users 45038 May 23 14:41 01-15P001N_1-PK
-rw-rw-r-- 1 skoumal users 50573 May 23 14:41 01-15T005N_0-PK
-rw-rw-r-- 1 skoumal users 42191 May 23 14:41 01-15X045N_2-PK
-rw-rw-r-- 1 skoumal users 42876 May 23 14:41 01-16A005N_2-PK
-rw-rw-r-- 1 skoumal users 41182 May 23 14:41 01-16P002N_2-PK
-rw-rw-r-- 1 skoumal users 37522 May 23 14:41 01-16X033N_5-PK
-rw-rw-r-- 1 skoumal users 37307 May 23 14:41 02-12A011N_1-PK
-rw-rw-r-- 1 skoumal users 43296 May 23 14:41 02-13A011N_5-PK
-rw-rw-r-- 1 skoumal users 43113 May 23 14:41 02-13A036N_3-PK
-rw-rw-r-- 1 skoumal users 42816 May 23 14:41 02-13B031N_0-PK
-rw-rw-r-- 1 skoumal users 39465 May 23 14:41 02-13O009N_1-PK
-rw-rw-r-- 1 skoumal users 45071 May 23 14:41 02-13O010N_2-PK
-rw-rw-r-- 1 skoumal users 45410 May 23 14:41 02-13O013N_1-PK
-rw-rw-r-- 1 skoumal users 41771 May 23 14:41 02-13P004N_3-PK
-rw-rw-r-- 1 skoumal users 36768 May 23 14:41 02-13T029N_6-PK
-rw-rw-r-- 1 skoumal users 39927 May 23 14:41 02-14A016N_2-PK
-rw-rw-r-- 1 skoumal users 42561 May 23 14:41 02-14E003N_0-PK
-rw-rw-r-- 1 skoumal users 44630 May 23 14:41 02-14T003N_5-PK
-rw-rw-r-- 1 skoumal users 41947 May 23 14:41 02-14T013N_2-PK
-rw-rw-r-- 1 skoumal users 51442 May 23 14:41 02-14T020N_3-PK
-rw-rw-r-- 1 skoumal users 49831 May 23 14:41 02-15O002N_0-PK
-rw-rw-r-- 1 skoumal users 40571 May 23 14:41 02-15O004N_1-PK
-rw-rw-r-- 1 skoumal users 43099 May 23 14:41 02-15O009N_1-PK
-rw-rw-r-- 1 skoumal users 41565 May 23 14:41 02-15P001N_3-PK
-rw-rw-r-- 1 skoumal users 41993 May 23 14:41 02-15X041N_1-PK
-rw-rw-r-- 1 skoumal users 41233 May 23 14:41 02-16A001N_0-PK
-rw-rw-r-- 1 skoumal users 42439 May 23 14:41 02-16A005N_5-PK
-rw-rw-r-- 1 skoumal users 46223 May 23 14:41 02-16P002N_0-PK
-rw-rw-r-- 1 skoumal users 38555 May 23 14:41 02-16X003N_5-PK
-rw-rw-r-- 1 skoumal users 42220 May 23 14:41 03-12A035N_3-PK
-rw-rw-r-- 1 skoumal users 44055 May 23 14:41 03-13A014N_4-PK
-rw-rw-r-- 1 skoumal users 40827 May 23 14:41 03-13O009N_2-PK
-rw-rw-r-- 1 skoumal users 40803 May 23 14:41 03-13P010N_1-PK
-rw-rw-r-- 1 skoumal users 46769 May 23 14:41 03-14A011N_3-PK
-rw-rw-r-- 1 skoumal users 41318 May 23 14:41 03-14A016N_4-PK
-rw-rw-r-- 1 skoumal users 44503 May 23 14:41 03-14P007N_2-PK
-rw-rw-r-- 1 skoumal users 47542 May 23 14:41 03-14T010N_2-PK
-rw-rw-r-- 1 skoumal users 47136 May 23 14:41 03-14T013N_0-PK
-rw-rw-r-- 1 skoumal users 47617 May 23 14:41 03-14T020N_0-PK
-rw-rw-r-- 1 skoumal users 43448 May 23 14:41 03-14X019N_4-PK
-rw-rw-r-- 1 skoumal users 45235 May 23 14:41 03-14X021N_2-PK
-rw-rw-r-- 1 skoumal users 43916 May 23 14:41 03-15E003N_0-PK
-rw-rw-r-- 1 skoumal users 43217 May 23 14:41 03-15E015N_1-PK
-rw-rw-r-- 1 skoumal users 42814 May 23 14:41 03-15O002N_1-PK
-rw-rw-r-- 1 skoumal users 47330 May 23 14:41 03-15O007N_0-PK
-rw-rw-r-- 1 skoumal users 47496 May 23 14:41 03-15X020N_1-PK
-rw-rw-r-- 1 skoumal users 44998 May 23 14:41 03-15X041N_2-PK
-rw-rw-r-- 1 skoumal users 47227 May 23 14:41 03-16A005N_4-PK
-rw-rw-r-- 1 skoumal users 43259 May 23 14:41 03-16P004N_0-PK
-rw-rw-r-- 1 skoumal users 47252 May 23 14:41 03-16X001N_1-PK
-rw-rw-r-- 1 skoumal users 40593 May 23 14:41 03-16X031N_4-PK
-rw-rw-r-- 1 skoumal users 40675 May 23 14:41 04-12A025N_0-PK
-rw-rw-r-- 1 skoumal users 46531 May 23 14:41 04-12P004N_4-PK
-rw-rw-r-- 1 skoumal users 45177 May 23 14:41 04-13B009N_0-PK
-rw-rw-r-- 1 skoumal users 43618 May 23 14:41 04-13B019N_0-PK
-rw-rw-r-- 1 skoumal users 44753 May 23 14:41 04-13B025N_0-PK
-rw-rw-r-- 1 skoumal users 43003 May 23 14:41 04-13O007N_1-PK
-rw-rw-r-- 1 skoumal users 44914 May 23 14:41 04-13O010N_0-PK
-rw-rw-r-- 1 skoumal users 43164 May 23 14:41 04-13O013N_0-PK
-rw-rw-r-- 1 skoumal users 44184 May 23 14:41 04-13P004N_2-PK
-rw-rw-r-- 1 skoumal users 42275 May 23 14:41 04-14P007N_0-PK
-rw-rw-r-- 1 skoumal users 49243 May 23 14:41 04-14T003N_0-PK
-rw-rw-r-- 1 skoumal users 40458 May 23 14:41 04-14T010N_1-PK
-rw-rw-r-- 1 skoumal users 42344 May 23 14:41 04-14T013N_1-PK
-rw-rw-r-- 1 skoumal users 40829 May 23 14:41 04-14X016N_3-PK
-rw-rw-r-- 1 skoumal users 40664 May 23 14:41 04-15O010N_0-PK
-rw-rw-r-- 1 skoumal users 43180 May 23 14:41 04-15P001N_2-PK
-rw-rw-r-- 1 skoumal users 40232 May 23 14:41 04-15P006N_0-PK
-rw-rw-r-- 1 skoumal users 48556 May 23 14:41 05-14T010N_3-PK
-rw-rw-r-- 1 skoumal users 44430 May 23 14:41 05-14T019N_0-PK
-rw-rw-r-- 1 skoumal users 39720 May 23 14:41 05-14X012N_2-PK
-rw-rw-r-- 1 skoumal users 49071 May 23 14:41 05-14X019N_2-PK
-rw-rw-r-- 1 skoumal users 49252 May 23 14:41 05-14X019N_3-PK
-rw-rw-r-- 1 skoumal users 44280 May 23 14:41 05-15O001N_1-PK
-rw-rw-r-- 1 skoumal users 48997 May 23 14:41 05-15P001N_0-PK
-rw-rw-r-- 1 skoumal users 49363 May 23 14:41 05-15X009N_1-PK
-rw-rw-r-- 1 skoumal users 48507 May 23 14:41 05-15X020N_3-PK
-rw-rw-r-- 1 skoumal users 38867 May 23 14:41 05-15X043N_5-PK
-rw-rw-r-- 1 skoumal users 43559 May 23 14:41 05-16A005N_1-PK
-rw-rw-r-- 1 skoumal users 44935 May 23 14:41 05-16E007N_0-PK
-rw-rw-r-- 1 skoumal users 41154 May 23 14:41 05-16P002N_1-PK
-rw-rw-r-- 1 skoumal users 44495 May 23 14:41 05-16P007N_1-PK
-rw-rw-r-- 1 skoumal users 46831 May 23 14:41 05-16X001N_2-PK
=== Michal Havrda ===
* available the whole time; 1st batch 1,938 words, **done**, not yet reported
-rw-rw-r-- 1 skoumal users 42289 May 23 14:41 04-15X030N_3-MH
-rw-rw-r-- 1 skoumal users 46559 May 23 14:41 04-15X043N_2-MH
-rw-rw-r-- 1 skoumal users 45223 May 23 14:41 04-16A009N_2-MH
-rw-rw-r-- 1 skoumal users 43922 May 23 14:41 04-16E007N_2-MH
-rw-rw-r-- 1 skoumal users 48672 May 23 14:41 04-16P004N_1-MH
-rw-rw-r-- 1 skoumal users 42605 May 23 14:41 04-16X003N_2-MH
-rw-rw-r-- 1 skoumal users 41810 May 23 14:41 05-12P004N_2-MH
-rw-rw-r-- 1 skoumal users 42430 May 23 14:41 05-13A011N_0-MH
-rw-rw-r-- 1 skoumal users 46764 May 23 14:41 05-13A014N_3-MH
-rw-rw-r-- 1 skoumal users 40185 May 23 14:41 05-13A023N_3-MH
-rw-rw-r-- 1 skoumal users 41116 May 23 14:41 05-13B005N_1-MH
-rw-rw-r-- 1 skoumal users 48129 May 23 14:41 05-13D015N_0-MH
-rw-rw-r-- 1 skoumal users 44079 May 23 14:41 05-13O007N_0-MH
-rw-rw-r-- 1 skoumal users 45164 May 23 14:41 05-14A011N_2-MH
=== Anna Nováková ===
* available only from July
=== Šárka Kadavá ===
* only as a last resort
=== Václav Horký ===
* as a student research assistant
==== Annotators and assigned files (davka-2) ====
=== Pavel Kopřiva ===
* 2nd batch 15,313 words
-rw-rw-r-- 1 skoumal users 42720 May 30 17:32 06-12A011N_0-PK
-rw-rw-r-- 1 skoumal users 45138 May 30 17:32 06-12P004N_3-PK
-rw-rw-r-- 1 skoumal users 45138 May 30 17:32 06-13A003N_1-PK
-rw-rw-r-- 1 skoumal users 44326 May 30 17:32 06-13A014N_2-PK
-rw-rw-r-- 1 skoumal users 45282 May 30 17:32 06-13A028N_2-PK
-rw-rw-r-- 1 skoumal users 46555 May 30 17:32 06-13A074N_1-PK
-rw-rw-r-- 1 skoumal users 43331 May 30 17:32 06-13B005N_0-PK
-rw-rw-r-- 1 skoumal users 48089 May 30 17:32 06-13B028N_1-PK
-rw-rw-r-- 1 skoumal users 40901 May 30 17:32 06-13O007N_2-PK
-rw-rw-r-- 1 skoumal users 48638 May 30 17:32 06-14A006N_0-PK
-rw-rw-r-- 1 skoumal users 40347 May 30 17:32 06-14A008N_3-PK
-rw-rw-r-- 1 skoumal users 47879 May 30 17:32 06-14E001N_0-PK
-rw-rw-r-- 1 skoumal users 44009 May 30 17:32 06-14P007N_3-PK
-rw-rw-r-- 1 skoumal users 44757 May 30 17:32 06-14T007N_0-PK
-rw-rw-r-- 1 skoumal users 45276 May 30 17:32 06-14T020N_1-PK
-rw-rw-r-- 1 skoumal users 46545 May 30 17:32 06-14X016N_1-PK
-rw-rw-r-- 1 skoumal users 42302 May 30 17:32 06-15E017N_5-PK
-rw-rw-r-- 1 skoumal users 51075 May 30 17:32 06-15O010N_2-PK
-rw-rw-r-- 1 skoumal users 41040 May 30 17:32 06-16A001N_3-PK
-rw-rw-r-- 1 skoumal users 43143 May 30 17:32 06-16P002N_3-PK
-rw-rw-r-- 1 skoumal users 35558 May 30 17:32 06-16P007N_5-PK
-rw-rw-r-- 1 skoumal users 46698 May 30 17:32 06-16X003N_1-PK
-rw-rw-r-- 1 skoumal users 45687 May 30 17:32 06-16X030N_1-PK
-rw-rw-r-- 1 skoumal users 39012 May 30 17:32 07-12A037N_4-PK
-rw-rw-r-- 1 skoumal users 44139 May 30 17:32 07-12O002N_0-PK
-rw-rw-r-- 1 skoumal users 46794 May 30 17:32 07-13A003N_2-PK
-rw-rw-r-- 1 skoumal users 46770 May 30 17:32 07-13A014N_0-PK
-rw-rw-r-- 1 skoumal users 39791 May 30 17:32 07-13A028N_4-PK
-rw-rw-r-- 1 skoumal users 43325 May 30 17:32 07-13A036N_5-PK
-rw-rw-r-- 1 skoumal users 48700 May 30 17:32 07-13A050N_0-PK
-rw-rw-r-- 1 skoumal users 42543 May 30 17:32 07-13E004N_6-PK
-rw-rw-r-- 1 skoumal users 43506 May 30 17:32 07-13O004N_0-PK
-rw-rw-r-- 1 skoumal users 44126 May 30 17:32 07-14T003N_4-PK
-rw-rw-r-- 1 skoumal users 41497 May 30 17:32 07-14T007N_2-PK
-rw-rw-r-- 1 skoumal users 45303 May 30 17:32 07-14T013N_3-PK
-rw-rw-r-- 1 skoumal users 46002 May 30 17:32 07-14X016N_4-PK
-rw-rw-r-- 1 skoumal users 42059 May 30 17:32 07-15C004N_0-PK
-rw-rw-r-- 1 skoumal users 46510 May 30 17:32 07-15O009N_0-PK
-rw-rw-r-- 1 skoumal users 41922 May 30 17:32 07-15P002N_0-PK
-rw-rw-r-- 1 skoumal users 47748 May 30 17:32 07-16A005N_0-PK
-rw-rw-r-- 1 skoumal users 40438 May 30 17:32 07-16A009N_3-PK
-rw-rw-r-- 1 skoumal users 40615 May 30 17:32 07-16P007N_2-PK
-rw-rw-r-- 1 skoumal users 45292 May 30 17:32 07-16X003N_3-PK
-rw-rw-r-- 1 skoumal users 41981 May 30 17:32 07-16X031N_0-PK
-rw-rw-r-- 1 skoumal users 44339 May 30 17:32 07-16X033N_3-PK
-rw-rw-r-- 1 skoumal users 40608 May 30 17:32 08-12A009N_0-PK
-rw-rw-r-- 1 skoumal users 45404 May 30 17:32 08-12A031N_0-PK
-rw-rw-r-- 1 skoumal users 50613 May 30 17:32 08-13A018N_0-PK
-rw-rw-r-- 1 skoumal users 43737 May 30 17:32 08-13A036N_0-PK
-rw-rw-r-- 1 skoumal users 41190 May 30 17:32 08-13A090N_4-PK
-rw-rw-r-- 1 skoumal users 43500 May 30 17:32 08-13B019N_1-PK
-rw-rw-r-- 1 skoumal users 42045 May 30 17:32 08-13B028N_0-PK
-rw-rw-r-- 1 skoumal users 39830 May 30 17:32 08-13O009N_0-PK
-rw-rw-r-- 1 skoumal users 45677 May 30 17:32 08-13P004N_0-PK
-rw-rw-r-- 1 skoumal users 43908 May 30 17:32 08-14C006N_0-PK
-rw-rw-r-- 1 skoumal users 46106 May 30 17:32 08-14T003N_1-PK
-rw-rw-r-- 1 skoumal users 42032 May 30 17:32 08-14T014N_4-PK
-rw-rw-r-- 1 skoumal users 48395 May 30 17:32 08-14X016N_5-PK
-rw-rw-r-- 1 skoumal users 41217 May 30 17:32 08-15E010N_5-PK
-rw-rw-r-- 1 skoumal users 42667 May 30 17:32 08-15O010N_1-PK
-rw-rw-r-- 1 skoumal users 46423 May 30 17:32 08-15X020N_2-PK
-rw-rw-r-- 1 skoumal users 49698 May 30 17:32 08-15X041N_3-PK
-rw-rw-r-- 1 skoumal users 50216 May 30 17:32 08-16A005N_3-PK
-rw-rw-r-- 1 skoumal users 47850 May 30 17:32 08-16E005N_4-PK
-rw-rw-r-- 1 skoumal users 43570 May 30 17:32 08-16E007N_4-PK
-rw-rw-r-- 1 skoumal users 45400 May 30 17:32 08-16X003N_4-PK
-rw-rw-r-- 1 skoumal users 44771 May 30 17:32 08-16X026N_1-PK
-rw-rw-r-- 1 skoumal users 44471 May 30 17:32 08-16X031N_2-PK
-rw-rw-r-- 1 skoumal users 40769 May 30 17:32 09-12A004N_1-PK
-rw-rw-r-- 1 skoumal users 44435 May 30 17:32 09-12A034N_3-PK
-rw-rw-r-- 1 skoumal users 45093 May 30 17:32 09-12H004N_1-PK
-rw-rw-r-- 1 skoumal users 40828 May 30 17:32 09-13A003N_0-PK
-rw-rw-r-- 1 skoumal users 48448 May 30 17:32 09-13A074N_4-PK
-rw-rw-r-- 1 skoumal users 42901 May 30 17:32 09-13A090N_2-PK
-rw-rw-r-- 1 skoumal users 43918 May 30 17:32 09-13B011N_0-PK
-rw-rw-r-- 1 skoumal users 46136 May 30 17:32 09-13B027N_0-PK
-rw-rw-r-- 1 skoumal users 43410 May 30 17:32 09-13O007N_3-PK
-rw-rw-r-- 1 skoumal users 44953 May 30 17:32 09-13P008N_1-PK
-rw-rw-r-- 1 skoumal users 44550 May 30 17:32 09-13T029N_3-PK
-rw-rw-r-- 1 skoumal users 50559 May 30 17:32 09-13X003N_0-PK
-rw-rw-r-- 1 skoumal users 42449 May 30 17:32 09-14A016N_0-PK
-rw-rw-r-- 1 skoumal users 39140 May 30 17:32 09-14C006N_3-PK
-rw-rw-r-- 1 skoumal users 36023 May 30 17:32 09-14T024N_4-PK
-rw-rw-r-- 1 skoumal users 47500 May 30 17:32 09-14X016N_2-PK
-rw-rw-r-- 1 skoumal users 46232 May 30 17:32 09-15O004N_2-PK
-rw-rw-r-- 1 skoumal users 49245 May 30 17:32 09-15X041N_0-PK
-rw-rw-r-- 1 skoumal users 41205 May 30 17:32 09-15X044N_1-PK
-rw-rw-r-- 1 skoumal users 45831 May 30 17:32 09-16A002N_1-PK
-rw-rw-r-- 1 skoumal users 42869 May 30 17:32 09-16E007N_1-PK
-rw-rw-r-- 1 skoumal users 43627 May 30 17:32 09-16X030N_0-PK
-rw-rw-r-- 1 skoumal users 44994 May 30 17:32 10-13A005N_4-PK
-rw-rw-r-- 1 skoumal users 41142 May 30 17:32 10-13A011N_3-PK
-rw-rw-r-- 1 skoumal users 41662 May 30 17:32 10-13A018N_2-PK
-rw-rw-r-- 1 skoumal users 47470 May 30 17:32 10-13A074N_5-PK
-rw-rw-r-- 1 skoumal users 39450 May 30 17:32 10-13B016N_1-PK
-rw-rw-r-- 1 skoumal users 44065 May 30 17:32 10-13O003N_0-PK
-rw-rw-r-- 1 skoumal users 43813 May 30 17:32 10-13P009N_0-PK
-rw-rw-r-- 1 skoumal users 44046 May 30 17:32 10-14A011N_1-PK
-rw-rw-r-- 1 skoumal users 46636 May 30 17:32 10-14C009N_2-PK
-rw-rw-r-- 1 skoumal users 48973 May 30 17:32 10-14O007N_0-PK
-rw-rw-r-- 1 skoumal users 40730 May 30 17:32 10-14P006N_1-PK
-rw-rw-r-- 1 skoumal users 49089 May 30 17:32 10-15O011N_0-PK
-rw-rw-r-- 1 skoumal users 43590 May 30 17:32 10-15O012N_0-PK
-rw-rw-r-- 1 skoumal users 41094 May 30 17:32 10-15P004N_0-PK
-rw-rw-r-- 1 skoumal users 39004 May 30 17:32 10-15T002N_2-PK
-rw-rw-r-- 1 skoumal users 45379 May 30 17:32 10-15T003N_1-PK
-rw-rw-r-- 1 skoumal users 46787 May 30 17:32 10-15T011N_4-PK
-rw-rw-r-- 1 skoumal users 49274 May 30 17:32 10-15X020N_0-PK
-rw-rw-r-- 1 skoumal users 43482 May 30 17:32 10-16A002N_0-PK
-rw-rw-r-- 1 skoumal users 43649 May 30 17:32 10-16E007N_5-PK
-rw-rw-r-- 1 skoumal users 46847 May 30 17:32 10-16P004N_3-PK
-rw-rw-r-- 1 skoumal users 44270 May 30 17:32 10-16X003N_0-PK
===== Merging manual and automatic annotation =====
==== Data preparation ====
* Under the directory ''[/net/grimm]/store/corp/Ortofon'' we create a subdirectory ''ortofon-merge'' and in it ''davka-?/csts-import'' and ''davka-?/csts-merge''.
* In each ''csts-merge'' directory we prepare the files for merging.
* From the directory ''.../ortofon-automat/davka-?/csts-export'' we copy the files and convert the suffix to lowercase. While copying we select unique tags right away:
parallel-filter.sh -C /net/grimm/usr/local/corp/bin/unique-tag.pl -p6 \
-s ../../ortofon-automat/davka-?/csts-export -t csts-merge -v
cd csts-merge
for ff in *-PK; do gg=${ff%-PK}-pk; echo "$ff $gg"; mv $ff $gg; done
We do this for every suffix.
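A minimal sketch of doing the rename for every suffix in one pass, assuming each file name ends in a hyphen followed by a two-letter annotator suffix such as ''-PK''. The helper names ''lowercase_suffix'' and ''lowercase_all_suffixes'' are hypothetical and not part of the original tooling:

```shell
#!/bin/sh
# Hypothetical helper: print NAME with the part after its last hyphen lowercased.
lowercase_suffix() {
    base=${1%-*}      # everything before the last hyphen
    suff=${1##*-}     # everything after the last hyphen
    printf '%s-%s\n' "$base" "$(printf '%s' "$suff" | tr '[:upper:]' '[:lower:]')"
}

# Hypothetical helper: rename every file in directory $1 whose name ends in
# "-??" so that the suffix is lowercase; names that are already lowercase are skipped.
lowercase_all_suffixes() {
    for ff in "$1"/*-??; do
        [ -e "$ff" ] || continue            # pattern matched nothing
        gg=$(lowercase_suffix "$ff")
        [ "$ff" = "$gg" ] && continue       # nothing to change
        echo "$ff $gg"
        mv -- "$ff" "$gg"
    done
}
```

Calling ''lowercase_all_suffixes csts-merge'' right after copying the automatic files would then cover all suffixes without repeating the loop by hand.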
* From the directory ''../../ortofon-manual/davka-?/csts-export'' we copy the corresponding manually processed files:
parallel-filter.sh -C /net/grimm/usr/local/corp/bin/unique-tag.pl -p6 \
-s ../../ortofon-manual/davka-?/csts-export -t csts-merge -v
* Files with an uppercase suffix have '''' in the mark-up and a more elaborate ''''; both kinds of mark-up must contain the same tags:
for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 's///' $ff; done
for ff in *.bak; do echo ${ff%.bak}; sdiff -s ${ff%.bak} $ff; done
for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 's:: \n:' $ff; done
for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 's::\n:' $ff; done
for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 'undef $/; s:()\n :$1:' $ff; done
for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 'undef $/; s:\n(
):$1:' $ff; done
for ff in *-[A-Z][A-Z]; do echo $ff; perl -i.bak -pe 'undef $/; s:\n\n::' $ff; done
for ff in *-[A-Z][A-Z]; do echo $ff; perl -i.bak -pe 'undef $/; s:
\n::' $ff; done
* We have to check whether they are aligned:
for ff in *-[A-Z][A-Z]; do echo $ff; paste $ff ${ff%-??}-[a-z][a-z] | grep ""; done |\
grep -vP "\t" | l
and fix them according to the original data in ''.../Ortofon/ortofon-data/0?''.
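Before the line-by-line comparison, a coarser pre-check can catch grossly misaligned pairs. The sketch below only assumes that two aligned files must have the same number of lines; the function name ''same_line_count'' is hypothetical:

```shell
#!/bin/sh
# Hypothetical pre-check: succeed only if both files have the same number of lines.
same_line_count() {
    a=$(($(wc -l < "$1")))   # arithmetic expansion strips any padding from wc
    b=$(($(wc -l < "$2")))
    [ "$a" -eq "$b" ]
}

# Possible use in csts-merge (not executed here):
# for ff in *-[A-Z][A-Z]; do
#     same_line_count "$ff" ${ff%-??}-[a-z][a-z] || echo "MISALIGNED $ff"
# done
```

Equal line counts do not prove alignment, so the ''paste'' check above is still needed for the pairs that pass.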
* The automatic files have no aspect. After alignment we create a directory ''csts-tag'' and copy the ''*-[A-Z][A-Z]'' files into it:
mkdir ../csts-tag
cp -p *-[A-Z][A-Z] ../csts-tag
We then run aspect assignment and aspect corrections on them:
[frozen]
make-asp.sh -Eucs2 -fcsts -p6 -s csts-tag -t csts-tag-vid -v
cd /usr/local/corp/frozen-states/201910/corp/DisambiguacniSkripty/PostDisambVid-utf-csts/povinne
parallel-filter.sh -C "./11_OpravitVid-1 | ./11_OpravitVid-2 | ./20_asp_stat.pl" \
-s /net/grimm/store/corp/Ortofon/ortofon-merge/davka-?/csts-tag-vid \
-t /net/grimm/store/corp/Ortofon/ortofon-merge/davka-?/csts-tag-vid-corr -v
cd -
for ff in *; do echo $ff; perl -i -pe 's/invalid-/invalid/' $ff; done
* We copy the files from ''csts-tag-vid-corr'' back to ''csts-merge''.
* We unify the mark-up:
for ff in *; do echo $ff; perl -i.bak -pe 's/]+>//' $ff; done
for ff in *; do echo $ff; perl -i.bak -pe 's/(]+>/$1>/g' $ff; done
* We check and fix biaspectual verbs:
grep "B$" * | cut -f3 -d'<' | sort -u > ../vidy.txt
For subsequent batches we compare with the previous ones:
for ff in $(cat vidy.txt); do echo $ff; grep -h "$ff<" ../davka-1/csts-import/* | sort -u; done | l
And we fix them:
for ff in $(grep -l "bydlet...............B" *); do echo $ff; \
perl -i.bak -pe 's/(bydlet...............)B/$1I/' $ff; done
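The same substitution can be parameterized for other verbs. This is only a sketch: it mirrors the pattern shown above (lemma, 15 arbitrary characters, then the aspect letter), the function name ''fix_aspect'' is hypothetical, and the lemma is inserted into the regular expression unescaped, so it must not contain characters special to ''sed'':

```shell
#!/bin/sh
# Hypothetical filter: on stdin, replace aspect letter $2 by $3 where it
# follows lemma $1 plus exactly 15 characters (the pattern used for "bydlet").
fix_aspect() {
    sed "s/\\($1.\\{15\\}\\)$2/\\1$3/"
}

# Possible use (not executed here):
# fix_aspect bydlet B I < "$ff" > "$ff.new"
```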
* We prepare the data for import:
for ff in *-[A-Z][A-Z]; do gg=${ff%-??}-[a-z][a-z]; suff=$(echo $gg| cut -f3 -d'-'); \
echo $ff-$suff; paste $ff ${ff%-??}-[a-z][a-z] | perl -pe 's/[^<]+X@--------------([^<]+[FM])/$1/' > ../csts-import/$ff-$suff; done
and adjust the lines with the tags ''F'', ''H'' and ''M'':
for ff in *-??-??; do echo $ff; perl -i.bak -pe 's/@@Z:--------------/@/' $ff; done
for ff in *-??-??; do echo $ff; perl -i.bak -pe 's/@@Z:--------------//' $ff; done
for ff in *-??-??; do echo $ff; perl -i.bak -pe 's/emmX@--------------//' $ff; done
for ff in *-??-??; do echo $ff; perl -i.bak -pe 's/hmmII--------------//' $ff; done
* We run the import (on jakobson):
cd ../csts-import
for ff in *; do echo $ff; /usr/local/annotate/bin/csts-import-utkl.pl --force $ff; done
* We update ''/usr/local/annotate/users''
* Determining aspect:
* **dát** P -- **dát se** I
* (**dokázat** P -- **dokázat** (umět, "to be able to") B)
* (**dovést** P -- **dovést** (umět, "to be able to") B)
* **hodit** P -- **hodit se** I
* **jmenovat** B -- **jmenovat se** I
* (**napovídat** P -- **napovídat** I)
* (**orientovat** B -- **orientovat se** I)
* **stát** I -- **stát se** P
* **věnovat** P -- **věnovat se** I
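The unambiguous pairs from this list can be written down as a small lookup, for example to drive later corrections. This is a sketch only: the function name ''aspect_for'' is hypothetical, the letters are read as P = perfective, I = imperfective, B = biaspectual, and the parenthesized entries are left out because they depend on the meaning in context:

```shell
#!/bin/sh
# Hypothetical lookup: print the aspect letter for a verb, with or without "se".
# Returns non-zero for verbs that are not in the table.
aspect_for() {
    case "$1" in
        "dát")          echo P ;;
        "dát se")       echo I ;;
        "hodit")        echo P ;;
        "hodit se")     echo I ;;
        "jmenovat")     echo B ;;
        "jmenovat se")  echo I ;;
        "stát")         echo I ;;
        "stát se")      echo P ;;
        "věnovat")      echo P ;;
        "věnovat se")   echo I ;;
        *)              return 1 ;;
    esac
}
```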
==== Annotators and assigned files (davka-1) ====
=== Jan Henyš ===
* (6696):
-rw-r--r-- 1 skoumal users 74626 Nov 4 14:57 01-12A005N_1-VH-pk
-rw-r--r-- 1 skoumal users 81870 Nov 4 14:57 01-13A014N_1-VH-pk
-rw-r--r-- 1 skoumal users 93895 Nov 4 14:57 01-13A028N_1-VH-pk
-rw-r--r-- 1 skoumal users 127391 Nov 4 14:57 01-13B031N_1-VH-pk
-rw-r--r-- 1 skoumal users 108674 Nov 4 14:57 01-13H005N_1-VH-pk
-rw-r--r-- 1 skoumal users 139978 Nov 4 14:57 01-13O009N_3-VH-pk
-rw-r--r-- 1 skoumal users 81235 Nov 4 14:57 01-13P004N_1-VH-pk
-rw-r--r-- 1 skoumal users 106283 Nov 4 14:57 01-13P009N_2-VH-pk
-rw-r--r-- 1 skoumal users 104777 Nov 4 14:57 01-14T003N_6-VH-pk
-rw-r--r-- 1 skoumal users 114342 Nov 4 14:57 01-14T007N_1-VH-pk
-rw-r--r-- 1 skoumal users 121376 Nov 4 14:57 01-14T010N_0-VH-pk
-rw-r--r-- 1 skoumal users 122450 Nov 4 14:57 01-14X016N_6-VH-pk
-rw-r--r-- 1 skoumal users 126982 Nov 4 14:57 01-14X019N_0-VH-pk
-rw-r--r-- 1 skoumal users 130419 Nov 4 14:57 01-15O001N_0-VH-pk
-rw-r--r-- 1 skoumal users 105776 Nov 4 14:57 01-15O004N_0-TM-pk
-rw-r--r-- 1 skoumal users 111398 Nov 4 14:57 01-15O007N_1-TM-pk
-rw-r--r-- 1 skoumal users 115392 Nov 4 14:57 01-15P001N_1-TM-pk
-rw-r--r-- 1 skoumal users 154103 Nov 4 14:57 01-15T005N_0-TM-pk
-rw-r--r-- 1 skoumal users 107637 Nov 4 14:57 01-15X045N_2-TM-pk
-rw-r--r-- 1 skoumal users 125936 Nov 4 14:57 01-16A005N_2-TM-pk
-rw-r--r-- 1 skoumal users 98450 Nov 4 14:57 01-16P002N_2-TM-pk
-rw-r--r-- 1 skoumal users 104393 Nov 4 14:57 01-16X033N_5-TM-pk
-rw-r--r-- 1 skoumal users 101802 Nov 4 14:57 02-12A011N_1-TM-pk
-rw-r--r-- 1 skoumal users 122296 Nov 4 14:57 02-13A011N_5-TM-pk
-rw-r--r-- 1 skoumal users 102029 Nov 4 14:57 02-13A036N_3-TM-pk
-rw-r--r-- 1 skoumal users 149647 Nov 4 14:57 02-13B031N_0-TM-pk
-rw-r--r-- 1 skoumal users 88170 Nov 4 14:57 02-13O009N_1-TM-pk
-rw-r--r-- 1 skoumal users 97377 Nov 4 14:57 02-13O010N_2-TM-pk
-rw-r--r-- 1 skoumal users 126451 Nov 4 14:57 02-13O013N_1-MZ-pk
-rw-r--r-- 1 skoumal users 79652 Nov 4 14:57 02-13P004N_3-MZ-pk
-rw-r--r-- 1 skoumal users 98536 Nov 4 14:57 02-13T029N_6-MZ-pk
-rw-r--r-- 1 skoumal users 88034 Nov 4 14:57 02-14A016N_2-MZ-pk
-rw-r--r-- 1 skoumal users 118994 Nov 4 14:57 02-14E003N_0-MZ-pk
-rw-r--r-- 1 skoumal users 123841 Nov 4 14:57 02-14T003N_5-MZ-pk
-rw-r--r-- 1 skoumal users 115630 Nov 4 14:57 02-14T013N_2-MZ-pk
-rw-r--r-- 1 skoumal users 134818 Nov 4 14:57 02-14T020N_3-MZ-pk
-rw-r--r-- 1 skoumal users 141686 Nov 4 14:57 02-15O002N_0-MZ-pk
-rw-r--r-- 1 skoumal users 95958 Nov 4 14:57 02-15O004N_1-MZ-pk
-rw-r--r-- 1 skoumal users 84948 Nov 4 14:57 02-15O009N_1-MZ-pk
-rw-r--r-- 1 skoumal users 101862 Nov 4 14:57 02-15P001N_3-MZ-pk
-rw-r--r-- 1 skoumal users 100946 Nov 4 14:57 02-15X041N_1-MZ-pk
-rw-r--r-- 1 skoumal users 115018 Nov 4 14:57 02-16A001N_0-MZ-pk
-rw-r--r-- 1 skoumal users 132001 Nov 4 14:57 02-16A005N_5-SK-pk
-rw-r--r-- 1 skoumal users 99541 Nov 4 14:57 02-16P002N_0-SK-pk
-rw-r--r-- 1 skoumal users 79580 Nov 4 14:57 02-16X003N_5-SK-pk
-rw-r--r-- 1 skoumal users 109874 Nov 4 14:57 03-12A035N_3-SK-pk
-rw-r--r-- 1 skoumal users 92330 Nov 4 14:57 03-13A014N_4-SK-pk
-rw-r--r-- 1 skoumal users 101868 Nov 4 14:57 03-13O009N_2-SK-pk
-rw-r--r-- 1 skoumal users 102143 Nov 4 14:57 03-13P010N_1-SK-pk
-rw-r--r-- 1 skoumal users 119923 Nov 4 14:57 03-14A011N_3-SK-pk
-rw-r--r-- 1 skoumal users 89549 Nov 4 14:57 03-14A016N_4-SK-pk
-rw-r--r-- 1 skoumal users 102824 Nov 4 14:57 03-14P007N_2-SK-pk
-rw-r--r-- 1 skoumal users 152089 Nov 4 14:57 03-14T010N_2-SK-pk
-rw-r--r-- 1 skoumal users 127088 Nov 4 14:57 03-14T013N_0-SK-pk
-rw-r--r-- 1 skoumal users 131223 Nov 4 14:57 03-14T020N_0-SK-pk
-rw-r--r-- 1 skoumal users 125088 Nov 4 14:57 03-14X019N_4-SK-pk
-rw-r--r-- 1 skoumal users 106008 Nov 4 14:57 03-14X021N_2-LK-pk
=== Václav Horký ===
* (7020) **done**:
-rw-r--r-- 1 skoumal users 156567 Nov 4 14:57 03-15E003N_0-LK-pk
-rw-r--r-- 1 skoumal users 112165 Nov 4 14:57 03-15E015N_1-LK-pk
-rw-r--r-- 1 skoumal users 100938 Nov 4 14:57 03-15O002N_1-LK-pk
-rw-r--r-- 1 skoumal users 122762 Nov 4 14:57 03-15O007N_0-LK-pk
-rw-r--r-- 1 skoumal users 143197 Nov 4 14:57 03-15X020N_1-LK-pk
-rw-r--r-- 1 skoumal users 115593 Nov 4 14:57 03-15X041N_2-LK-pk
-rw-r--r-- 1 skoumal users 135237 Nov 4 14:57 03-16A005N_4-LK-pk
-rw-r--r-- 1 skoumal users 97000 Nov 4 14:57 03-16P004N_0-LK-pk
-rw-r--r-- 1 skoumal users 145176 Nov 4 14:57 03-16X001N_1-LK-pk
-rw-r--r-- 1 skoumal users 118990 Nov 4 14:57 03-16X031N_4-LK-pk
-rw-r--r-- 1 skoumal users 83711 Nov 4 14:57 04-12A025N_0-LK-pk
-rw-r--r-- 1 skoumal users 140071 Nov 4 14:57 04-12P004N_4-LK-pk
-rw-r--r-- 1 skoumal users 152345 Nov 4 14:57 04-13B009N_0-LK-pk
-rw-r--r-- 1 skoumal users 126964 Nov 4 14:57 04-13B019N_0-MH-pk
-rw-r--r-- 1 skoumal users 116116 Nov 4 14:57 04-13B025N_0-MH-pk
-rw-r--r-- 1 skoumal users 102177 Nov 4 14:57 04-13O007N_1-MH-pk
-rw-r--r-- 1 skoumal users 101580 Nov 4 14:57 04-13O010N_0-MH-pk
-rw-r--r-- 1 skoumal users 139029 Nov 4 14:57 04-13O013N_0-MH-pk
-rw-r--r-- 1 skoumal users 83185 Nov 4 14:57 04-13P004N_2-MH-pk
-rw-r--r-- 1 skoumal users 101357 Nov 4 14:57 04-14P007N_0-MH-pk
-rw-r--r-- 1 skoumal users 136504 Nov 4 14:57 04-14T003N_0-MH-pk
-rw-r--r-- 1 skoumal users 125338 Nov 4 14:57 04-14T010N_1-MH-pk
-rw-r--r-- 1 skoumal users 105675 Nov 4 14:57 04-14T013N_1-MH-pk
-rw-r--r-- 1 skoumal users 106963 Nov 4 14:57 04-14X016N_3-MH-pk
-rw-r--r-- 1 skoumal users 92255 Nov 4 14:57 04-15O010N_0-MH-pk
-rw-r--r-- 1 skoumal users 111061 Nov 4 14:57 04-15P001N_2-MH-pk
-rw-r--r-- 1 skoumal users 90023 Nov 4 14:57 04-15P006N_0-MH-pk
-rw-r--r-- 1 skoumal users 102256 Nov 4 14:57 04-15X030N_3-PK-mh
-rw-r--r-- 1 skoumal users 96979 Nov 4 14:57 04-15X043N_2-PK-mh
-rw-r--r-- 1 skoumal users 125828 Nov 4 14:57 04-16A009N_2-PK-mh
-rw-r--r-- 1 skoumal users 85615 Nov 4 14:57 04-16E007N_2-PK-mh
-rw-r--r-- 1 skoumal users 113460 Nov 4 14:57 04-16P004N_1-PK-mh
-rw-r--r-- 1 skoumal users 103271 Nov 4 14:57 04-16X003N_2-PK-mh
-rw-r--r-- 1 skoumal users 114602 Nov 4 14:57 05-12P004N_2-PK-mh
-rw-r--r-- 1 skoumal users 122736 Nov 4 14:57 05-13A011N_0-PK-mh
-rw-r--r-- 1 skoumal users 96925 Nov 4 14:57 05-13A014N_3-PK-mh
-rw-r--r-- 1 skoumal users 80363 Nov 4 14:57 05-13A023N_3-PK-mh
-rw-r--r-- 1 skoumal users 108453 Nov 4 14:57 05-13B005N_1-PK-mh
-rw-r--r-- 1 skoumal users 139140 Nov 4 14:57 05-13D015N_0-PK-mh
-rw-r--r-- 1 skoumal users 104929 Nov 4 14:57 05-13O007N_0-PK-mh
-rw-r--r-- 1 skoumal users 125205 Nov 4 14:57 05-14A011N_2-PK-mh
-rw-r--r-- 1 skoumal users 147679 Nov 4 14:57 05-14T010N_3-AN-pk
-rw-r--r-- 1 skoumal users 90556 Nov 4 14:57 05-14T019N_0-AN-pk
-rw-r--r-- 1 skoumal users 117725 Nov 4 14:57 05-14X012N_2-AN-pk
-rw-r--r-- 1 skoumal users 146139 Nov 4 14:57 05-14X019N_2-AN-pk
-rw-r--r-- 1 skoumal users 148811 Nov 4 14:57 05-14X019N_3-AN-pk
-rw-r--r-- 1 skoumal users 121542 Nov 4 14:57 05-15O001N_1-AN-pk
-rw-r--r-- 1 skoumal users 117876 Nov 4 14:57 05-15P001N_0-AN-pk
-rw-r--r-- 1 skoumal users 149222 Nov 4 14:57 05-15X009N_1-AN-pk
-rw-r--r-- 1 skoumal users 141576 Nov 4 14:57 05-15X020N_3-AN-pk
-rw-r--r-- 1 skoumal users 69482 Nov 4 14:57 05-15X043N_5-AN-pk
-rw-r--r-- 1 skoumal users 149294 Nov 4 14:57 05-16A005N_1-AN-pk
-rw-r--r-- 1 skoumal users 85369 Nov 4 14:57 05-16E007N_0-AN-pk
-rw-r--r-- 1 skoumal users 95205 Nov 4 14:57 05-16P002N_1-AN-pk
-rw-r--r-- 1 skoumal users 95081 Nov 4 14:57 05-16P007N_1-AN-pk
-rw-r--r-- 1 skoumal users 133102 Nov 4 14:57 05-16X001N_2-MH-pk
==== Annotators and assigned files (davka-2) ====
=== Jan Henyš ===
* ():
=== Václav Horký ===
* (4675) **done**:
-rw-r--r-- 1 skoumal staff 151492 Dec 5 15:12 08-16A005N_3-AN-pk
-rw-r--r-- 1 skoumal staff 100204 Dec 5 15:12 08-16E005N_4-AN-pk
-rw-r--r-- 1 skoumal staff 93187 Dec 5 15:12 08-16E007N_4-AN-pk
-rw-r--r-- 1 skoumal staff 109275 Dec 5 15:12 08-16X003N_4-AN-pk
-rw-r--r-- 1 skoumal staff 131994 Dec 5 15:12 08-16X026N_1-AN-pk
-rw-r--r-- 1 skoumal staff 111495 Dec 5 15:12 08-16X031N_2-AN-pk
-rw-r--r-- 1 skoumal staff 92992 Dec 5 15:12 09-12A004N_1-PK-pk
-rw-r--r-- 1 skoumal staff 115075 Dec 5 15:12 09-12A034N_3-PK-pk
-rw-r--r-- 1 skoumal staff 109999 Dec 5 15:12 09-12H004N_1-PK-pk
-rw-r--r-- 1 skoumal staff 91469 Dec 5 15:12 09-13A003N_0-PK-pk
-rw-r--r-- 1 skoumal staff 112768 Dec 5 15:12 09-13A074N_4-PK-pk
-rw-r--r-- 1 skoumal staff 112259 Dec 5 15:12 09-13A090N_2-PK-pk
-rw-r--r-- 1 skoumal staff 87195 Dec 5 15:12 09-13B011N_0-PK-pk
-rw-r--r-- 1 skoumal staff 129299 Dec 5 15:12 09-13B027N_0-PK-pk
-rw-r--r-- 1 skoumal staff 122598 Dec 5 15:12 09-13O007N_3-PK-pk
-rw-r--r-- 1 skoumal staff 110178 Dec 5 15:12 09-13P008N_1-PK-pk
-rw-r--r-- 1 skoumal staff 120714 Dec 5 15:12 09-13T029N_3-PK-pk
-rw-r--r-- 1 skoumal staff 185624 Dec 5 15:12 09-13X003N_0-PK-pk
-rw-r--r-- 1 skoumal staff 90067 Dec 5 15:12 09-14A016N_0-PK-pk
-rw-r--r-- 1 skoumal staff 103615 Dec 5 15:12 09-14C006N_3-PK-pk
-rw-r--r-- 1 skoumal staff 73062 Dec 5 15:12 09-14T024N_4-PK-pk
-rw-r--r-- 1 skoumal staff 128943 Dec 5 15:12 09-14X016N_2-PK-pk
-rw-r--r-- 1 skoumal staff 108095 Dec 5 15:12 09-15O004N_2-PK-pk
-rw-r--r-- 1 skoumal staff 126265 Dec 5 15:12 09-15X041N_0-PK-pk
-rw-r--r-- 1 skoumal staff 119361 Dec 5 15:12 09-15X044N_1-PK-pk
-rw-r--r-- 1 skoumal staff 142155 Dec 5 15:12 09-16A002N_1-PK-pk
-rw-r--r-- 1 skoumal staff 75463 Dec 5 15:12 09-16E007N_1-PK-pk
-rw-r--r-- 1 skoumal staff 120311 Dec 5 15:12 09-16X030N_0-PK-pk
-rw-r--r-- 1 skoumal staff 135473 Dec 5 15:12 10-13A005N_4-MH-pk
-rw-r--r-- 1 skoumal staff 113915 Dec 5 15:12 10-13A011N_3-MH-pk
-rw-r--r-- 1 skoumal staff 87897 Dec 5 15:12 10-13A018N_2-MH-pk
-rw-r--r-- 1 skoumal staff 121284 Dec 5 15:12 10-13A074N_5-MH-pk
-rw-r--r-- 1 skoumal staff 103486 Dec 5 15:12 10-13B016N_1-MH-pk
-rw-r--r-- 1 skoumal staff 127402 Dec 5 15:12 10-13O003N_0-MH-pk
-rw-r--r-- 1 skoumal staff 99716 Dec 5 15:12 10-13P009N_0-MH-pk
-rw-r--r-- 1 skoumal staff 126373 Dec 5 15:12 10-14A011N_1-MH-pk
-rw-r--r-- 1 skoumal staff 132420 Dec 5 15:12 10-14C009N_2-MH-pk
-rw-r--r-- 1 skoumal staff 154287 Dec 5 15:12 10-14O007N_0-MH-pk
-rw-r--r-- 1 skoumal staff 108656 Dec 5 15:12 10-14P006N_1-MH-pk
-rw-r--r-- 1 skoumal staff 111987 Dec 5 15:12 10-15O011N_0-MH-pk
-rw-r--r-- 1 skoumal staff 98207 Dec 5 15:12 10-15O012N_0-MH-pk
-rw-r--r-- 1 skoumal staff 139761 Dec 5 15:12 10-15P004N_0-MH-pk
-rw-r--r-- 1 skoumal staff 94261 Dec 5 15:12 10-15T002N_2-MH-pk
-rw-r--r-- 1 skoumal staff 179656 Dec 5 15:12 10-15T003N_1-MH-pk
-rw-r--r-- 1 skoumal staff 158985 Dec 5 15:12 10-15T011N_4-MH-pk
-rw-r--r-- 1 skoumal staff 147464 Dec 5 15:12 10-15X020N_0-MH-pk
-rw-r--r-- 1 skoumal staff 133827 Dec 5 15:12 10-16A002N_0-MH-pk
-rw-r--r-- 1 skoumal staff 87534 Dec 5 15:12 10-16E007N_5-MH-pk
-rw-r--r-- 1 skoumal staff 96142 Dec 5 15:12 10-16P004N_3-MH-pk
-rw-r--r-- 1 skoumal staff 103483 Dec 5 15:12 10-16X003N_0-MH-pk
==== Producing the vertical with mark-up ====
* The manually annotated files are in the directory ''csts-export''
* We convert them to the vertical format with the script ''ortofon-csts-vert.pl'':
parallel-filter.sh -C "ortofon-csts-vert.pl" -p45 -s csts-export -t vert-export -v
* To be safe, we copy everything to ''vert-opravy'' and make the corrections there.
==== Checking and manual corrections of the vertical ====
=== Automatic corrections ===
* Form ''von.*'' vs. lemma ''on.*'':
grep -P "von.*\ton" *
* Variant ''6'' for lemmas ''von.*'':
grep -P "von[^\t]*\tPP.*6" *
* Aspect of the enclitic ''s'' (form //#s//)
* Unify ''každý''
* Abbreviations
* Hesitation sounds (//hmm//, //@//, //@@//)
=== Manual corrections ===
* ''invalid''
* ''X@''
* Visual check of the tags:
grep -h -v "^<" * | cut -f3 | sort -u | l
* Checking tag correctness:
grep -h -v "^<" * | cut -f3 | sort -u | check-tag.pl -l16 > /dev/null
* Checking //asterisks// etc.
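One part of the tag check can be approximated without ''check-tag.pl''. The sketch below assumes that the tag is the third tab-separated column and that a valid tag is exactly 16 characters long (which is presumably what ''-l16'' refers to); the function name ''bad_length_tags'' is hypothetical:

```shell
#!/bin/sh
# Hypothetical check: print every distinct value of the third tab-separated
# column whose length is not 16, skipping mark-up lines that start with "<".
bad_length_tags() {
    grep -h -v '^<' "$@" | cut -f3 | sort -u | awk 'length($0) != 16'
}

# Possible use in vert-opravy (not executed here):
# bad_length_tags *
```

This only checks the length; whether the individual positions hold valid values still has to be verified by the real checker or by eye.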
===== Comparison of our lemmas and POS vs. MorphoDiTa (for the student Dominika) =====
* We create a directory ''merge-csts'' where we will prepare the texts for annotation.
==== Converting chunks to csts ====
* The vertical needs to be converted to csts:
cd chunks
for ff in *; do echo $ff; oral-vert-csts.pl < $ff > ../merge-csts/${ff%.vrt}.chunk.csts; done
==== Comparing our rules with the chunks ====
* We do this with diff:
sdiff 05-16X001N_2.chunk.csts <(grep -v '' ../csts-import/05-16X001N_2.vrt | perl -pe 's/(.)[^<\n]+/$1/g' \
| remove-dupl-csts-mark.pl | perl -pe 's/]*>//') | l
==== Converting csts-rules-frazrl to the common format ====
* We convert as follows:
cd csts-import
for ff in *.vrt; do echo $ff; grep -v '' $ff | perl -pe 's/(.)[^<\n]+/$1/g' \
| perl -pe 's/&dhellip;/../g' | perl -pe 's/&thellip;/.../g' | perl -pe 's/(\*)X/$1F/' \
| perl -pe 's/\@+()Z/\@$1H/' | perl -pe 's/([eh]mm)[IX]/$1H/' | perl -pe 's/]+>([eh]mm<|\@+)/$1/' \
| perl -pe 's/(\))X/$1M/' | perl -pe 's/(\&)Z/$1H/' | remove-dupl-csts-mark.pl | perl -pe 's/]*>//' \
> ../merge-csts/${ff%.vrt}.import.csts; done
==== Merging chunk and import into merge-import ====
* We produce the data for annotation:
mkdir -p merge-import
cd merge-csts
for ff in *.chunk.csts; do echo $ff; sdiff -w 2500 ${ff%.chunk.csts}.import.csts $ff \
| perl -pe 's/[\ \t]+\|[\ \t]+[^<]+//' | perl -pe 's/[\ \t]+<.*//' | remove-dupl-csts-mark.pl Q \
> ../merge-import/${ff%.chunk.csts}.csts; done
* We check the tabs
* and then import into the annotation program (on jakobson):
cd ../merge-import
for ff in *-Dom; do echo $ff; /usr/local/annotate/bin/csts-import-utkl.pl --force $ff; done
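The tab check that precedes the import is not spelled out here. One way to sketch it, assuming every line of a merged file should have the same known number of tab-separated fields (the expected count and the function name ''check_tabs'' are both hypothetical):

```shell
#!/bin/sh
# Hypothetical check: report lines whose number of tab-separated fields
# differs from the expected count given as the first argument.
check_tabs() {
    expected=$1
    shift
    awk -F'\t' -v n="$expected" \
        'NF != n { print FILENAME ":" FNR ": " NF " fields" }' "$@"
}

# Possible use in merge-import (not executed here; 2 is only an example):
# check_tabs 2 *
```

If mark-up lines legitimately have a different field count, they would need to be filtered out first, for example with ''grep -v "^<"''.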