Wiki launched 24 July 2025

Tagging texts for Ortofon

  • We receive the texts with markup:
    <chunk id="12A005N_1" year="2012" month="8" location="Praha" situation="hovor doma" speakers="2" genders="smíšené" generations="1" relationship="partnerský" length="10:20" tokens="1567" doc_id="12A005N" position_in_text="end">
    <sp id="79f" nickname="Lenka_24" gender="Z: žena" confederate="ano" age_binary="I: do 35 let" age="33" edu_binary="A: vysokoškolské" edu_level="VŠ" edu_field="Pedagogika, učitelství a sociální péče" occupation="lektor AJ" occupation_category="23 pedagog" reg_childhood="středočeská" loc_childhood="Praha" locsize_childhood="město nad 100 tisíc" reg_longest="středočeská" locsize_longest="město nad 100 tisíc" reg_current="středočeská" locsize_current="město nad 100 tisíc" proportion="55 %" soundfile="4/4/18692a8d.mp3">
    tak
    taky
    můžeš
    jít
    na
    ryby
    </sp>
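
In the sample above, the tokens are the lines that are not markup. A minimal sketch for counting them, assuming one token per line and every tag on a line of its own (the helper name count_tokens is hypothetical, not part of the pipeline):

```shell
# count_tokens FILE: count the non-empty lines that are not markup tags.
# Assumption: every tag (<chunk ...>, <sp ...>, </sp>) sits on its own line.
count_tokens() {
    grep -v '^[[:space:]]*<' "$1" | grep -c -v '^[[:space:]]*$'
}
```

The result can be compared with the tokens attribute of <chunk>, although this page does not say whether that attribute counts tokens in the same way.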

Workflow

  • Each text is annotated by one human annotator and by the merged results of the hybrid tagger and MorphoDiTa.
  • We have ten directories of texts from David: /store/corp/Ortofon/ortofon-data/01–05 (first batch) and /store/corp/Ortofon/ortofon-data/06–10 (second batch).
  • Manual annotation is done in the directories /store/corp/Ortofon/ortofon-manual/davka-?.
  • Automatic annotation is done in the directories /store/corp/Ortofon/ortofon-hybrid/davka-? and /store/corp/Ortofon/ortofon-morphodita/davka-?.
  • The final merge of manual and automatic annotation is done in the directory /store/corp/Ortofon/ortofon-etalon/davka-?.
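
The split of the ten data directories into two batches can be written down as a small lookup. This is only a sketch of the mapping stated in the list above; the function name batch_for_dir is hypothetical:

```shell
# batch_for_dir NN: print the batch directory name for a data directory number.
# Mapping taken from the list above: 01-05 -> davka-1, 06-10 -> davka-2.
batch_for_dir() {
    case "$1" in
        01|02|03|04|05) echo "davka-1" ;;
        06|07|08|09|10) echo "davka-2" ;;
        *) echo "unknown data directory: $1" >&2; return 1 ;;
    esac
}
```

For example, batch_for_dir 07 prints davka-2, which matches the two-digit prefix of the file names listed further down.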

Manual annotation

Preparing texts for annotators

  • Put all files into the chunks directory.
  • Produce csts:
    parallel-filter.sh -C "cut -f1 | perl -pe 's/\.\.\./&thellip;/g' \
    | perl -pe 's/\.\./&dhellip;/g' | replace_spaces.pl \
    | perl -pe 's/<sp /<s /' | perl -pe 's:</sp>:</s>:' \
    | vert_csts.pl | perl -pe 'undef $/; s/<s>\n<chunk/<chunk/'" -p45 -s chunks -t csts -v
  • We tag with the hybrid!
  • Run morphology:
    make-corp.sh -s csts -t csts-morf -Eucs2 -A1 -B1 -M -p45 -v
  • Disambiguate vole and von:
    cd csts-morf
    for ff in *; do echo $ff; \
    perl -i -pe 's/<f>vole<.*/<f src="M">vole<MMl>vůl<MMt>NNMS5-----A----/' $ff; done
    for ff in *; do echo $ff; \
    perl -i -pe 's/(<f[^>]*>von)<.*/$1<MMl>on<MMt>PPYS1--3------6/' $ff; done
  • Apply rules and phrasemes:
    make-whole-corp-csts.sh -Eucs2 -M -v -p45 -trules -Tfrazrl
  • Adjust the tags:
    parallel-filter.sh -C "normalize-anot-csts.pl \
    | simplify-tags-csts-utf.pl| remove-dupl-csts-mark.pl X" -p45 \
    -s csts-rules-frazrl -t csts-import -v

    Still to do: simplify the tags and handle background sounds.

  • Create links for the individual annotators.
  • The remaining steps are run on jakobson.
  • Import a file:
    /usr/local/annotate/bin/csts-import-utkl.pl --force 05-16X001N_2-HS
  • Edit /usr/local/annotate/users.
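
In the csts step above, the order of the two ellipsis substitutions matters: three dots have to be replaced before two dots, otherwise the two-dot rule would consume part of every three-dot ellipsis. A minimal sketch of just that step, using sed in place of the perl one-liners (the function name encode_ellipses is hypothetical; the entity names are the ones used in the pipeline):

```shell
# encode_ellipses: replace "..." and ".." on stdin with the entities used
# in the csts step. The three-dot rule must run first.
# In a sed replacement a bare & means "the matched text", hence \&.
encode_ellipses() {
    sed -e 's/\.\.\./\&thellip;/g' -e 's/\.\./\&dhellip;/g'
}
```

With the rules swapped, an input such as no... would come out as no&dhellip;. instead of no&thellip;.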

Annotators and assigned files (davka-1)

Michal Havrda - MH

  • xaf (4541): done
    -rw-rw-r-- 1 skoumal users 169887 Nov  7 17:40 04-13B019N_0-MH  *
    -rw-rw-r-- 1 skoumal users 147849 Nov  7 17:40 04-13B025N_0-MH  *
    -rw-rw-r-- 1 skoumal users 136246 Nov  7 17:40 04-13O007N_1-MH  *
    -rw-rw-r-- 1 skoumal users 133560 Nov  7 17:40 04-13O010N_0-MH  *
    -rw-rw-r-- 1 skoumal users 165228 Nov  7 17:40 04-13O013N_0-MH  *
    -rw-rw-r-- 1 skoumal users 119827 Nov  7 17:40 04-13P004N_2-MH  **
    -rw-rw-r-- 1 skoumal users 129590 Nov  7 17:40 04-14P007N_0-MH  *
    -rw-rw-r-- 1 skoumal users 166905 Nov  7 17:40 04-14T003N_0-MH  *
    -rw-rw-r-- 1 skoumal users 152270 Nov  7 17:40 04-14T010N_1-MH  *
    -rw-rw-r-- 1 skoumal users 141300 Nov  7 17:40 04-14T013N_1-MH  *
    -rw-rw-r-- 1 skoumal users 148744 Nov  7 17:40 04-14X016N_3-MH  *
    -rw-rw-r-- 1 skoumal users 131885 Nov  7 17:40 04-15O010N_0-MH  *
    -rw-rw-r-- 1 skoumal users 153940 Nov  7 17:40 04-15P001N_2-MH  *
    -rw-rw-r-- 1 skoumal users 119987 Nov  7 17:40 04-15P006N_0-MH  **
    -rw-rw-r-- 1 skoumal users 163310 Nov  7 17:40 05-16X001N_2-MH  **

Václav Horký - VH

  • xaa (4238):
    -rw-rw-r-- 1 skoumal users  97843 Nov  7 17:40 01-12A005N_1-VH
    -rw-rw-r-- 1 skoumal users 124148 Nov  7 17:40 01-13A014N_1-VH
    -rw-rw-r-- 1 skoumal users 139676 Nov  7 17:40 01-13A028N_1-VH
    -rw-rw-r-- 1 skoumal users 162599 Nov  7 17:40 01-13B031N_1-VH
    -rw-rw-r-- 1 skoumal users 128254 Nov  7 17:40 01-13H005N_1-VH
    -rw-rw-r-- 1 skoumal users 176088 Nov  7 17:40 01-13O009N_3-VH
    -rw-rw-r-- 1 skoumal users 128760 Nov  7 17:40 01-13P004N_1-VH
    -rw-rw-r-- 1 skoumal users 139368 Nov  7 17:40 01-13P009N_2-VH
    -rw-rw-r-- 1 skoumal users 135085 Nov  7 17:40 01-14T003N_6-VH
    -rw-rw-r-- 1 skoumal users 152015 Nov  7 17:40 01-14T007N_1-VH
    -rw-rw-r-- 1 skoumal users 153372 Nov  7 17:40 01-14T010N_0-VH
    -rw-rw-r-- 1 skoumal users 166194 Nov  7 17:40 01-14X016N_6-VH
    -rw-rw-r-- 1 skoumal users 168697 Nov  7 17:40 01-14X019N_0-VH
    -rw-rw-r-- 1 skoumal users 160336 Nov  7 17:40 01-15O001N_0-VH

Šárka Kadavá - SK

  • xad (4521): done
    -rw-rw-r-- 1 skoumal users 173330 Nov  7 17:40 02-16A005N_5-SK
    -rw-rw-r-- 1 skoumal users 149878 Nov  7 17:40 02-16P002N_0-SK
    -rw-rw-r-- 1 skoumal users 118822 Nov  7 17:40 02-16X003N_5-SK
    -rw-rw-r-- 1 skoumal users 138030 Nov  7 17:40 03-12A035N_3-SK
    -rw-rw-r-- 1 skoumal users 130944 Nov  7 17:40 03-13A014N_4-SK
    -rw-rw-r-- 1 skoumal users 138479 Nov  7 17:40 03-13O009N_2-SK
    -rw-rw-r-- 1 skoumal users 132373 Nov  7 17:40 03-13P010N_1-SK
    -rw-rw-r-- 1 skoumal users 159964 Nov  7 17:40 03-14A011N_3-SK
    -rw-rw-r-- 1 skoumal users 122871 Nov  7 17:40 03-14A016N_4-SK
    -rw-rw-r-- 1 skoumal users 137852 Nov  7 17:40 03-14P007N_2-SK
    -rw-rw-r-- 1 skoumal users 186207 Nov  7 17:40 03-14T010N_2-SK
    -rw-rw-r-- 1 skoumal users 151313 Nov  7 17:40 03-14T013N_0-SK
    -rw-rw-r-- 1 skoumal users 162548 Nov  7 17:40 03-14T020N_0-SK
    -rw-rw-r-- 1 skoumal users 172522 Nov  7 17:40 03-14X019N_4-SK

Pavel Kopřiva - PK

  • xag (4601): done
    -rw-rw-r-- 1 skoumal users 128575 Nov  7 17:40 04-15X030N_3-PK
    -rw-rw-r-- 1 skoumal users 152738 Nov  7 17:40 04-15X043N_2-PK
    -rw-rw-r-- 1 skoumal users 162024 Nov  7 17:40 04-16A009N_2-PK
    -rw-rw-r-- 1 skoumal users 114272 Nov  7 17:40 04-16E007N_2-PK
    -rw-rw-r-- 1 skoumal users 161129 Nov  7 17:40 04-16P004N_1-PK
    -rw-rw-r-- 1 skoumal users 145303 Nov  7 17:40 04-16X003N_2-PK
    -rw-rw-r-- 1 skoumal users 153704 Nov  7 17:40 05-12P004N_2-PK
    -rw-rw-r-- 1 skoumal users 141893 Nov  7 17:40 05-13A011N_0-PK
    -rw-rw-r-- 1 skoumal users 136846 Nov  7 17:40 05-13A014N_3-PK
    -rw-rw-r-- 1 skoumal users 117562 Nov  7 17:40 05-13A023N_3-PK
    -rw-rw-r-- 1 skoumal users 145722 Nov  7 17:40 05-13B005N_1-PK
    -rw-rw-r-- 1 skoumal users 169153 Nov  7 17:40 05-13D015N_0-PK
    -rw-rw-r-- 1 skoumal users 141199 Nov  7 17:40 05-13O007N_0-PK
    -rw-rw-r-- 1 skoumal users 172005 Nov  7 17:40 05-14A011N_2-PK

Lucie Onari Kreslová - LK

  • xae (4451): done
    -rw-rw-r-- 1 skoumal users 136241 Nov  7 17:40 03-14X021N_2-LK
    -rw-rw-r-- 1 skoumal users 180763 Nov  7 17:40 03-15E003N_0-LK
    -rw-rw-r-- 1 skoumal users 144355 Nov  7 17:40 03-15E015N_1-LK
    -rw-rw-r-- 1 skoumal users 135982 Nov  7 17:40 03-15O002N_1-LK
    -rw-rw-r-- 1 skoumal users 171807 Nov  7 17:40 03-15O007N_0-LK
    -rw-rw-r-- 1 skoumal users 175211 Nov  7 17:40 03-15X020N_1-LK
    -rw-rw-r-- 1 skoumal users 158941 Nov  7 17:40 03-15X041N_2-LK
    -rw-rw-r-- 1 skoumal users 170958 Nov  7 17:40 03-16A005N_4-LK
    -rw-rw-r-- 1 skoumal users 140487 Nov  7 17:40 03-16P004N_0-LK
    -rw-rw-r-- 1 skoumal users 171624 Nov  7 17:40 03-16X001N_1-LK
    -rw-rw-r-- 1 skoumal users 148677 Nov  7 17:40 03-16X031N_4-LK
    -rw-rw-r-- 1 skoumal users 115839 Nov  7 17:40 04-12A025N_0-LK
    -rw-rw-r-- 1 skoumal users 172208 Nov  7 17:40 04-12P004N_4-LK
    -rw-rw-r-- 1 skoumal users 200543 Nov  7 17:40 04-13B009N_0-LK

Tereza Marková - TM

  • xab (4269): done
    -rw-rw-r-- 1 skoumal users 144719 Nov  7 17:40 01-15O004N_0-TM  *
    -rw-rw-r-- 1 skoumal users 147429 Nov  7 17:40 01-15O007N_1-TM  *
    -rw-rw-r-- 1 skoumal users 156462 Nov  7 17:40 01-15P001N_1-TM  **
    -rw-rw-r-- 1 skoumal users 188348 Nov  7 17:40 01-15T005N_0-TM  **
    -rw-rw-r-- 1 skoumal users 149779 Nov  7 17:40 01-15X045N_2-TM  *
    -rw-rw-r-- 1 skoumal users 165698 Nov  7 17:40 01-16A005N_2-TM  *
    -rw-rw-r-- 1 skoumal users 132634 Nov  7 17:40 01-16P002N_2-TM  *
    -rw-rw-r-- 1 skoumal users 121098 Nov  7 17:40 01-16X033N_5-TM  *
    -rw-rw-r-- 1 skoumal users 122287 Nov  7 17:40 02-12A011N_1-TM  *
    -rw-rw-r-- 1 skoumal users 146221 Nov  7 17:40 02-13A011N_5-TM  *
    -rw-rw-r-- 1 skoumal users 136882 Nov  7 17:40 02-13A036N_3-TM  **
    -rw-rw-r-- 1 skoumal users 198356 Nov  7 17:40 02-13B031N_0-TM  *
    -rw-rw-r-- 1 skoumal users 112397 Nov  7 17:40 02-13O009N_1-TM  **
    -rw-rw-r-- 1 skoumal users 126312 Nov  7 17:40 02-13O010N_2-TM  **

Anna Nováková - AN

  • xah (4552):
    -rw-rw-r-- 1 skoumal users 190480 Nov  7 17:40 05-14T010N_3-AN
    -rw-rw-r-- 1 skoumal users 122224 Nov  7 17:40 05-14T019N_0-AN
    -rw-rw-r-- 1 skoumal users 154255 Nov  7 17:40 05-14X012N_2-AN
    -rw-rw-r-- 1 skoumal users 189324 Nov  7 17:40 05-14X019N_2-AN
    -rw-rw-r-- 1 skoumal users 183678 Nov  7 17:40 05-14X019N_3-AN
    -rw-rw-r-- 1 skoumal users 154400 Nov  7 17:40 05-15O001N_1-AN
    -rw-rw-r-- 1 skoumal users 150413 Nov  7 17:40 05-15P001N_0-AN
    -rw-rw-r-- 1 skoumal users 206423 Nov  7 17:40 05-15X009N_1-AN
    -rw-rw-r-- 1 skoumal users 181125 Nov  7 17:40 05-15X020N_3-AN
    -rw-rw-r-- 1 skoumal users  99237 Nov  7 17:40 05-15X043N_5-AN
    -rw-rw-r-- 1 skoumal users 182076 Nov  7 17:40 05-16A005N_1-AN
    -rw-rw-r-- 1 skoumal users 122299 Nov  7 17:40 05-16E007N_0-AN
    -rw-rw-r-- 1 skoumal users 120653 Nov  7 17:40 05-16P002N_1-AN
    -rw-rw-r-- 1 skoumal users 131582 Nov  7 17:40 05-16P007N_1-AN

Michal Zlatkovský - MZ

  • xac (4393):
    -rw-rw-r-- 1 skoumal users 160624 Nov  7 17:40 02-13O013N_1-MZ
    -rw-rw-r-- 1 skoumal users 119673 Nov  7 17:40 02-13P004N_3-MZ
    -rw-rw-r-- 1 skoumal users 120363 Nov  7 17:40 02-13T029N_6-MZ
    -rw-rw-r-- 1 skoumal users 117594 Nov  7 17:40 02-14A016N_2-MZ
    -rw-rw-r-- 1 skoumal users 156603 Nov  7 17:40 02-14E003N_0-MZ
    -rw-rw-r-- 1 skoumal users 153488 Nov  7 17:40 02-14T003N_5-MZ
    -rw-rw-r-- 1 skoumal users 147516 Nov  7 17:40 02-14T013N_2-MZ
    -rw-rw-r-- 1 skoumal users 178036 Nov  7 17:40 02-14T020N_3-MZ
    -rw-rw-r-- 1 skoumal users 172908 Nov  7 17:40 02-15O002N_0-MZ
    -rw-rw-r-- 1 skoumal users 122828 Nov  7 17:40 02-15O004N_1-MZ
    -rw-rw-r-- 1 skoumal users 127983 Nov  7 17:40 02-15O009N_1-MZ
    -rw-rw-r-- 1 skoumal users 129864 Nov  7 17:40 02-15P001N_3-MZ
    -rw-rw-r-- 1 skoumal users 129648 Nov  7 17:40 02-15X041N_1-MZ
    -rw-rw-r-- 1 skoumal users 150095 Nov  7 17:40 02-16A001N_0-MZ

Annotators and assigned files (davka-2)

Václav Horký - VH

  • (7862) – done:
    -rw-rw-r-- 1 skoumal users 142827 May 27 17:40 06-12A011N_0-VH
    -rw-rw-r-- 1 skoumal users 166457 May 27 17:40 06-12P004N_3-VH
    -rw-rw-r-- 1 skoumal users 131514 May 27 17:40 06-13A003N_1-VH
    -rw-rw-r-- 1 skoumal users 128413 May 27 17:40 06-13A014N_2-VH
    -rw-rw-r-- 1 skoumal users 144623 May 27 17:40 06-13A028N_2-VH
    -rw-rw-r-- 1 skoumal users 150403 May 27 17:40 06-13A074N_1-VH
    -rw-rw-r-- 1 skoumal users 139190 May 27 17:40 06-13B005N_0-VH
    -rw-rw-r-- 1 skoumal users 189188 May 27 17:40 06-13B028N_1-VH
    -rw-rw-r-- 1 skoumal users 155333 May 27 17:40 06-13O007N_2-VH
    -rw-rw-r-- 1 skoumal users 189392 May 27 17:40 06-14A006N_0-VH
    -rw-rw-r-- 1 skoumal users 143619 May 27 17:40 06-14A008N_3-VH
    -rw-rw-r-- 1 skoumal users 184286 May 27 17:40 06-14E001N_0-VH
    -rw-rw-r-- 1 skoumal users 149792 May 27 17:40 06-14P007N_3-VH
    -rw-rw-r-- 1 skoumal users 147049 May 27 17:40 06-14T007N_0-VH
    -rw-rw-r-- 1 skoumal users 164936 May 27 17:40 06-14T020N_1-VH
    -rw-rw-r-- 1 skoumal users 170219 May 27 17:40 06-14X016N_1-VH
    -rw-rw-r-- 1 skoumal users 128950 May 27 17:40 06-15E017N_5-VH
    -rw-rw-r-- 1 skoumal users 166806 May 27 17:40 06-15O010N_2-VH
    -rw-rw-r-- 1 skoumal users 137835 May 27 17:40 06-16A001N_3-VH
    -rw-rw-r-- 1 skoumal users 131549 May 27 17:40 06-16P002N_3-VH
    -rw-rw-r-- 1 skoumal users 106379 May 27 17:40 06-16P007N_5-VH
    -rw-rw-r-- 1 skoumal users 141727 May 27 17:40 06-16X003N_1-VH
    -rw-rw-r-- 1 skoumal users 169034 May 27 17:40 06-16X030N_1-VH
  • (7028) – done:
    -rw-r--r-- 1 skoumal users  91649 Jun 24 18:06 07-12A037N_4-VH
    -rw-r--r-- 1 skoumal users 161669 Jun 24 18:06 07-12O002N_0-VH
    -rw-r--r-- 1 skoumal users 144382 Jun 24 18:06 07-13A003N_2-VH
    -rw-r--r-- 1 skoumal users 148109 Jun 24 18:06 07-13A014N_0-VH
    -rw-r--r-- 1 skoumal users 106736 Jun 24 18:06 07-13A028N_4-VH
    -rw-r--r-- 1 skoumal users 142377 Jun 24 18:06 07-13A036N_5-VH
    -rw-r--r-- 1 skoumal users 149087 Jun 24 18:06 07-13A050N_0-VH
    -rw-r--r-- 1 skoumal users 166318 Jun 24 18:06 07-13E004N_6-VH
    -rw-r--r-- 1 skoumal users 154101 Jun 24 18:06 07-13O004N_0-VH
    -rw-r--r-- 1 skoumal users 144107 Jun 24 18:06 07-14T003N_4-VH
    -rw-r--r-- 1 skoumal users 121502 Jun 24 18:06 07-14T007N_2-VH
    -rw-r--r-- 1 skoumal users 149152 Jun 24 18:06 07-14T013N_3-VH
    -rw-r--r-- 1 skoumal users 152439 Jun 24 18:06 07-14X016N_4-VH
    -rw-r--r-- 1 skoumal users 131964 Jun 24 18:06 07-15C004N_0-VH
    -rw-r--r-- 1 skoumal users 134751 Jun 24 18:06 07-15O009N_0-VH
    -rw-r--r-- 1 skoumal users 129301 Jun 24 18:06 07-15P002N_0-VH
    -rw-r--r-- 1 skoumal users 180038 Jun 24 18:06 07-16A005N_0-VH
    -rw-r--r-- 1 skoumal users 133426 Jun 24 18:06 07-16A009N_3-VH
    -rw-r--r-- 1 skoumal users 125528 Jun 24 18:06 07-16P007N_2-VH
    -rw-r--r-- 1 skoumal users 128922 Jun 24 18:06 07-16X003N_3-VH
    -rw-r--r-- 1 skoumal users 140285 Jun 24 18:06 07-16X031N_0-VH
    -rw-r--r-- 1 skoumal users 129726 Jun 24 18:06 07-16X033N_3-VH

Anna Nováková - AN

  • (7634) – done:
    -rw-r--r-- 1 skoumal users 136812 Jun 24 18:06 08-12A009N_0-AN
    -rw-r--r-- 1 skoumal users 162292 Jun 24 18:06 08-12A031N_0-AN
    -rw-r--r-- 1 skoumal users 174507 Jun 24 18:06 08-13A018N_0-AN
    -rw-r--r-- 1 skoumal users 150144 Jun 24 18:06 08-13A036N_0-AN
    -rw-r--r-- 1 skoumal users 147524 Jun 24 18:06 08-13A090N_4-AN
    -rw-r--r-- 1 skoumal users 159163 Jun 24 18:06 08-13B019N_1-AN
    -rw-r--r-- 1 skoumal users 121300 Jun 24 18:06 08-13B028N_0-AN
    -rw-r--r-- 1 skoumal users 117897 Jun 24 18:06 08-13O009N_0-AN
    -rw-r--r-- 1 skoumal users 122809 Jun 24 18:06 08-13P004N_0-AN
    -rw-r--r-- 1 skoumal users 159431 Jun 24 18:06 08-14C006N_0-AN
    -rw-r--r-- 1 skoumal users 153240 Jun 24 18:06 08-14T003N_1-AN
    -rw-r--r-- 1 skoumal users 132491 Jun 24 18:06 08-14T014N_4-AN
    -rw-r--r-- 1 skoumal users 157464 Jun 24 18:06 08-14X016N_5-AN
    -rw-r--r-- 1 skoumal users 164278 Jun 24 18:06 08-15E010N_5-AN
    -rw-r--r-- 1 skoumal users 132281 Jun 24 18:06 08-15O010N_1-AN
    -rw-r--r-- 1 skoumal users 162485 Jun 24 18:06 08-15X020N_2-AN
    -rw-r--r-- 1 skoumal users 177612 Jun 24 18:06 08-15X041N_3-AN
    -rw-r--r-- 1 skoumal users 181697 Jun 24 18:06 08-16A005N_3-AN
    -rw-r--r-- 1 skoumal users 139212 Jun 24 18:06 08-16E005N_4-AN
    -rw-r--r-- 1 skoumal users 121915 Jun 24 18:06 08-16E007N_4-AN
    -rw-r--r-- 1 skoumal users 137518 Jun 24 18:06 08-16X003N_4-AN
    -rw-r--r-- 1 skoumal users 176342 Jun 24 18:06 08-16X026N_1-AN
    -rw-r--r-- 1 skoumal users 141328 Jun 24 18:06 08-16X031N_2-AN

Michal Havrda - MH

  • (7117) – done:
    -rw-r--r-- 1 skoumal users 160033 Jun 24 18:06 10-13A005N_4-MH
    -rw-r--r-- 1 skoumal users 137114 Jun 24 18:06 10-13A011N_3-MH
    -rw-r--r-- 1 skoumal users 112974 Jun 24 18:06 10-13A018N_2-MH
    -rw-r--r-- 1 skoumal users 157987 Jun 24 18:06 10-13A074N_5-MH
    -rw-r--r-- 1 skoumal users 129620 Jun 24 18:06 10-13B016N_1-MH
    -rw-r--r-- 1 skoumal users 152048 Jun 24 18:06 10-13O003N_0-MH
    -rw-r--r-- 1 skoumal users 126483 Jun 24 18:06 10-13P009N_0-MH
    -rw-r--r-- 1 skoumal users 157568 Jun 24 18:06 10-14A011N_1-MH
    -rw-r--r-- 1 skoumal users 154505 Jun 24 18:06 10-14C009N_2-MH
    -rw-r--r-- 1 skoumal users 181761 Jun 24 18:06 10-14O007N_0-MH
    -rw-r--r-- 1 skoumal users 135186 Jun 24 18:06 10-14P006N_1-MH
    -rw-r--r-- 1 skoumal users 143875 Jun 24 18:06 10-15O011N_0-MH
    -rw-r--r-- 1 skoumal users 122224 Jun 24 18:06 10-15O012N_0-MH
    -rw-r--r-- 1 skoumal users 166988 Jun 24 18:06 10-15P004N_0-MH
    -rw-r--r-- 1 skoumal users 122777 Jun 24 18:06 10-15T002N_2-MH
    -rw-r--r-- 1 skoumal users 202198 Jun 24 18:06 10-15T003N_1-MH
    -rw-r--r-- 1 skoumal users 190825 Jun 24 18:06 10-15T011N_4-MH
    -rw-r--r-- 1 skoumal users 181748 Jun 24 18:06 10-15X020N_0-MH
    -rw-r--r-- 1 skoumal users 161548 Jun 24 18:06 10-16A002N_0-MH
    -rw-r--r-- 1 skoumal users 114890 Jun 24 18:06 10-16E007N_5-MH
    -rw-r--r-- 1 skoumal users 128041 Jun 24 18:06 10-16P004N_3-MH
    -rw-r--r-- 1 skoumal users 131846 Jun 24 18:06 10-16X003N_0-MH

Pavel Kopřiva

  • (7328) – done:
    -rw-r--r-- 1 skoumal users 123564 Jun 24 18:06 09-12A004N_1-PK
    -rw-r--r-- 1 skoumal users 140637 Jun 24 18:06 09-12A034N_3-PK
    -rw-r--r-- 1 skoumal users 137184 Jun 24 18:06 09-12H004N_1-PK
    -rw-r--r-- 1 skoumal users 118561 Jun 24 18:06 09-13A003N_0-PK
    -rw-r--r-- 1 skoumal users 151930 Jun 24 18:06 09-13A074N_4-PK
    -rw-r--r-- 1 skoumal users 137469 Jun 24 18:06 09-13A090N_2-PK
    -rw-r--r-- 1 skoumal users 115312 Jun 24 18:06 09-13B011N_0-PK
    -rw-r--r-- 1 skoumal users 161409 Jun 24 18:06 09-13B027N_0-PK
    -rw-r--r-- 1 skoumal users 156513 Jun 24 18:06 09-13O007N_3-PK
    -rw-r--r-- 1 skoumal users 143758 Jun 24 18:06 09-13P008N_1-PK
    -rw-r--r-- 1 skoumal users 147253 Jun 24 18:06 09-13T029N_3-PK
    -rw-r--r-- 1 skoumal users 218778 Jun 24 18:06 09-13X003N_0-PK
    -rw-r--r-- 1 skoumal users 121754 Jun 24 18:06 09-14A016N_0-PK
    -rw-r--r-- 1 skoumal users 132630 Jun 24 18:06 09-14C006N_3-PK
    -rw-r--r-- 1 skoumal users  98129 Jun 24 18:06 09-14T024N_4-PK
    -rw-r--r-- 1 skoumal users 171839 Jun 24 18:06 09-14X016N_2-PK
    -rw-r--r-- 1 skoumal users 143302 Jun 24 18:06 09-15O004N_2-PK
    -rw-r--r-- 1 skoumal users 170311 Jun 24 18:06 09-15X041N_0-PK
    -rw-r--r-- 1 skoumal users 148747 Jun 24 18:06 09-15X044N_1-PK
    -rw-r--r-- 1 skoumal users 169442 Jun 24 18:06 09-16A002N_1-PK
    -rw-r--r-- 1 skoumal users 104915 Jun 24 18:06 09-16E007N_1-PK
    -rw-r--r-- 1 skoumal users 145808 Jun 24 18:06 09-16X030N_0-PK

Checking and accepting texts

  • Same procedure as for annotation.
  • We work on jakobson.
  • First check the texts:
    cd /net/grimm/store/corp/ortofon-etalon/csts-import
    for ff in 04-15X030N_3-PK 04-15X043N_2-PK 04-16A009N_2-PK; \
    do echo $ff; \
    /usr/local/annotate/bin/csts-export.pl --verbose $ff > /dev/null; done
  • If everything is fine, save the files.
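
The check loop above relies on reading the verbose output by eye. A minimal sketch of a variant that collects the failing files instead, assuming the checker reports problems through a non-zero exit status (this is an assumption about csts-export.pl, not something stated here; check_files is a hypothetical helper):

```shell
# check_files CMD FILE...: run CMD on each file, discard its stdout, and
# list the files for which CMD exits with a non-zero status.
# Assumption: the checker signals problems through its exit status.
check_files() {
    checker=$1
    shift
    failed=0
    for ff in "$@"; do
        if ! "$checker" "$ff" > /dev/null; then
            echo "FAILED: $ff"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

CMD is a single command name, so a call with options such as --verbose would need a small wrapper function around the real exporter.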

Conversion back to vertical format, corrections

  • Create the vert-export directory and convert the files into it:
    cd csts-export
    for ff in *; do echo $ff; oral-csts-vert.pl < $ff > ../vert-export/$ff; done

    Entities are restored in this step as well, so the first column should match the original:

    cd ../chunks
    for ff in *; do echo $ff; sdiff -s <(cut -f1 $ff) <(cut -f1 ../vert-export/${ff%.vrt}-??); done
  • Fix invalid and X@.
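
The first-column comparison above can also be wrapped in a helper that returns a status, which is convenient in loops. This is a sketch only: it uses diff and temporary files instead of sdiff -s and process substitution so that it runs in plain sh, and the name same_first_column is hypothetical:

```shell
# same_first_column A B: succeed if the first tab-separated column of A and B
# is identical; otherwise print the differing lines and fail.
same_first_column() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    cut -f1 "$1" > "$tmp_a"
    cut -f1 "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```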

Problematic corrections by Horký: questions for MK and DL

  • Tokenization:
    • dvěstě -> dvě stě
    • v o -> vo
    • od tamaď -> odtamaď
    • třinácet -> třináct set
    • takovýty -> takový ty
    • tyjo -> ty jo
    • napohodu -> na pohodu
    • ježíš maria -> ježíšmarja
    • v spára -> V - spára (shouldn't the transcription be vé?)
    • ježíši maria -> ježíšimarja
    • devatenácet -> devatenáct set
    • osmnáctset -> osmnáct set

Preparing data for automatic annotation (hybrid vs. MorphoDiTa)

  • The data are on grimm in the directory /store/corp/Ortofon.
  • We work with the vertical from the files in /store/corp/Ortofon/ortofon-etalon/Verze/2/1.
  • Preparing the shared data from ortofon-etalon/davka-?:
    cd ortofon-hybrid/davka-1
    mkdir vert
    cd ../../ortofon-etalon/davka-1/Verze/2/1
    for ff in *; do echo $ff; cut -f1 $ff | perl -pe 's/^<.*>$//' \
    | cat -s > ../../../../../ortofon-hybrid/davka-1/vert/$ff; done
    cd ../../../../../ortofon-hybrid/davka-1
    make-corp.sh -s vert -t csts -v -p45
    make-corp.sh -A1 -B1 -Eucs2 -M -p45 -s csts -t csts-morf -v
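
The filter inside the loop above keeps only the word forms: it takes the first column, blanks out lines that consist of a single tag, and squeezes the resulting runs of empty lines into one. A minimal sketch of the same filter as a function, with sed in place of the perl one-liner (the name words_only is hypothetical):

```shell
# words_only: read a vertical on stdin, keep the first column, blank out
# lines that consist of a single tag, and squeeze runs of blank lines.
words_only() {
    cut -f1 | sed 's/^<.*>$//' | cat -s
}
```

Note that a file starting with a tag comes out with a leading empty line.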

ortofon-hybrid

  • The data are run through our entire hybrid and finally adjusted to the needs of Ortofon.
  • Data preparation:
    cd .../ortofon-hybrid/davka-?
    rsync -avz ../../ortofon-manual/davka-?/csts-morf .
  • Honza's script processing_hybrid.pl (operates on the vertical):
    make-corp.sh -s csts-morf -t vert-morf -p45 -v
    cd /usr/local/corp/Perl/Ortofon
    ./processing_hybrid.pl /store/corp/Ortofon/ortofon-hybrid/davka-2/vert-morf
    cd -
    cp -pr vert-morf vert-morf.ori
    cd vert-morf
    for ff in *.out; do mv $ff ${ff%.out}; done
    cd -
    mv csts-morf csts-morf.ori
    make-corp.sh -s vert-morf -t csts-morf -p45 -v
  • Rules through to the end:
    make-whole-corp-csts.sh -C1 -Eucs2 -f -M -p45 -trules -v
  • Plus some minor fixes ( –> my etc.) with Tomáš's script EtalonizaceVertikaly.pl:
    make-corp.sh -s csts-rules-frazrl-rulh1-tag-vid-corr -t vert-rules-frazrl-rulh1-tag-vid-corr -p45 -v
    parallel-filter.sh -C "/usr/local/corp/Perl/EtalonizaceVertikaly.pl" \
    -s vert-rules-frazrl-rulh1-tag-vid-corr -t vert-hybrid -p45 -v
  • Honza's script postprocessing16.pl (operates on the vertical):
    cd /usr/local/corp/Perl/Ortofon
    ./postprocessing16.pl /store/corp/Ortofon/ortofon-hybrid/davka-2/vert-hybrid
    cd /store/corp/Ortofon/ortofon-hybrid/davka-2/vert-hybrid
    mkdir ../vert-hybrid-out
    for ff in *.post; do mv $ff ../vert-hybrid-out/${ff%.post}; done
    cd ../vert-hybrid-out
    for ff in *; do echo $ff; sed '1{/^$/d}' $ff > ../../../ortofon-automat/davka-2/vert-hybrid/$ff; done
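
Two idioms recur in the steps above: ${ff%.out} and ${ff%.post} strip a suffix from a file name, and sed '1{/^$/d}' deletes the first line only if it is empty. A minimal sketch that combines both in one pass and leaves the source files in place (unlike the mv loops above); the function name strip_suffix_into is hypothetical:

```shell
# strip_suffix_into SRC_DIR SUFFIX DEST_DIR: for every SRC_DIR/*SUFFIX file,
# write a copy without a leading empty line to DEST_DIR under the name
# with SUFFIX removed. The source files are not modified.
strip_suffix_into() {
    src=$1 suffix=$2 dest=$3
    mkdir -p "$dest" || return 1
    for ff in "$src"/*"$suffix"; do
        [ -e "$ff" ] || continue          # no match: the glob stays literal
        base=${ff##*/}
        # ';' before '}' keeps the command valid for both GNU and BSD sed
        sed '1{/^$/d;}' "$ff" > "$dest/${base%"$suffix"}"
    done
}
```

A call such as strip_suffix_into vert-hybrid .post vert-hybrid-out would then cover the renaming and the blank-line removal in one step.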

ortofon-morphodita

  • A morphology is prepared for MorphoDiTa; it must be consistent with the Etalon and with David's scripts.
  • Data preparation:
    cd .../ortofon-morphodita
    mkdir davka-?
    cd davka-?
    rsync -avz ../../ortofon-manual/davka-?/csts-morf .
  • Adding verbal aspect (vid):
    make-corp.sh -s csts-morf -t csts-morf-vid -v -p45
  • Aspect corrections, expansion of variables, tag simplification and duplicate removal:
    parallel-filter.sh \
    -C "corr-asp.pl | JH-wide-csts.sh | simplify-tags-csts-utf.pl | remove-dupl-csts-mark.pl X" \
    -p45 -s csts-morf-vid -t csts-morf-vid-corr -v
  • Tomáš's script:
    make-corp.sh -s csts-morf-vid-corr -t vert-morf-vid-corr -p45 -v
    parallel-filter.sh -C EtalonizaceVertikaly.pl -s vert-morf-vid-corr -t vert-morf-vid-corr-etln -p45 -v
  • Honza's script processing_mdita.pl (operates on the vertical). It produces .out files:
    cd /usr/local/corp/Perl/Ortofon
    ./processing_mdita.pl /store/corp/Ortofon/ortofon-morphodita/davka-?/vert-morf-vid-corr-etln
  • Convert Honza's output into input for MorphoDiTa:
    cd /store/corp/Ortofon/ortofon-morphodita/davka-?
    mkdir vert-morphodita-in
    cd vert-morf-vid-corr-etln
    for ff in *.out; do echo $ff; sed '1{/^$/d}' $ff > ../vert-morphodita-in/${ff%.out}; done
    rm *.out
  • Run MorphoDiTa and store the result in /store/corp/Ortofon/ortofon-morphodita/vert-morphodita-out.
  • Honza's script postprocessing16.pl (operates on the vertical):
    parallel-filter.sh -C "cut -f1-3 | perl -pe 's/(\t.*)\t/\$1 /'" -s vert-morphodita-out \
    -t vert-morphodita-result -p45 -v
    cd /usr/local/corp/Perl/Ortofon
    ./postprocessing16.pl /store/corp/Ortofon/ortofon-morphodita/davka-2/vert-morphodita-result
    cd -
  • Move the result into the directory where MorphoDiTa is merged with the hybrid for manual annotation:
    cd ../../ortofon-automat
    mkdir -p davka-2/vert-morphodita
    cd ../ortofon-morphodita/davka-2/vert-morphodita-result
    for ff in *.post; do mv $ff ../../../ortofon-automat/davka-2/vert-morphodita/${ff%.post}; done
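
The filter applied to vert-morphodita-out above keeps the first three tab-separated columns and then turns the last remaining tab into a space, so columns two and three end up in one field. A minimal sketch of the same transformation in awk (the function name join_cols_2_3 is hypothetical; what the columns contain is not stated here):

```shell
# join_cols_2_3: keep the first three tab-separated columns and join the
# second and third with a space; lines with fewer columns pass through.
join_cols_2_3() {
    awk -F'\t' 'NF >= 3 { print $1 "\t" $2 " " $3; next } { print }'
}
```

Lines without tabs, such as structural tags, are left unchanged, as they are by the cut and perl pair.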

Merging the results and preparing the import (ortofon-automat)

  • Everything is in the directories ortofon-automat/davka-?.
  • The vert-hybrid directory holds the hybrid results (see above).
  • The vert-morphodita directory holds the MorphoDiTa results (see above).
  • Merge MorphoDiTa and the hybrid into the vert-paste directory:
    cd .../ortofon-automat/davka-?
    mkdir vert-paste
    cd vert-morphodita
    for ff in *; do paste $ff <(cut -f2- ../vert-hybrid/$ff) | perl -pe 's/^[\t\ ]+$//' > ../vert-paste/$ff; done
  • Convert to csts:
    make-corp.sh -s vert-paste -t csts-paste -p45 -v
  • Remove duplicates:
    parallel-filter.sh -C remove-dupl-csts.pl -p45 -s csts-paste -t csts-import -v
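
The paste step above silently misaligns the two analyses if the files differ in length. A minimal sketch of the same merge with a line-count guard added; the guard is not part of the original procedure, and the name paste_columns is hypothetical:

```shell
# paste_columns A B: append columns 2.. of B to the lines of A.
# Refuses to merge files with different line counts, since paste would
# silently misalign them. Lines that end up as whitespace only are emptied.
paste_columns() {
    lines_a=$(wc -l < "$1")
    lines_b=$(wc -l < "$2")
    if [ $((lines_a)) -ne $((lines_b)) ]; then
        echo "line count mismatch: $1 vs $2" >&2
        return 1
    fi
    tmp=$(mktemp) || return 2
    cut -f2- "$2" > "$tmp"
    paste "$1" "$tmp" | sed 's/^[[:space:]]*$//'
    rm -f "$tmp"
}
```

Called as paste_columns vert-morphodita/FILE vert-hybrid/FILE, it writes the merged lines to standard output, as the pipeline in the loop does.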

Annotators and assigned files (davka-1)

Pavel Kopřiva

  • prepaid 9,800, i.e. 14,000 words; batch 1: 14,197 words, done
    -rw-rw-r-- 1 skoumal users 34407 May 23 14:41 01-12A005N_1-PK
    -rw-rw-r-- 1 skoumal users 46102 May 23 14:41 01-13A014N_1-PK
    -rw-rw-r-- 1 skoumal users 40307 May 23 14:41 01-13A028N_1-PK
    -rw-rw-r-- 1 skoumal users 41092 May 23 14:41 01-13B031N_1-PK
    -rw-rw-r-- 1 skoumal users 41144 May 23 14:41 01-13H005N_1-PK
    -rw-rw-r-- 1 skoumal users 44019 May 23 14:41 01-13O009N_3-PK
    -rw-rw-r-- 1 skoumal users 46809 May 23 14:41 01-13P004N_1-PK
    -rw-rw-r-- 1 skoumal users 43321 May 23 14:41 01-13P009N_2-PK
    -rw-rw-r-- 1 skoumal users 39320 May 23 14:41 01-14T003N_6-PK
    -rw-rw-r-- 1 skoumal users 44702 May 23 14:41 01-14T007N_1-PK
    -rw-rw-r-- 1 skoumal users 41621 May 23 14:41 01-14T010N_0-PK
    -rw-rw-r-- 1 skoumal users 44150 May 23 14:41 01-14X016N_6-PK
    -rw-rw-r-- 1 skoumal users 41916 May 23 14:41 01-14X019N_0-PK
    -rw-rw-r-- 1 skoumal users 44807 May 23 14:41 01-15O001N_0-PK
    -rw-rw-r-- 1 skoumal users 41851 May 23 14:41 01-15O004N_0-PK
    -rw-rw-r-- 1 skoumal users 44557 May 23 14:41 01-15O007N_1-PK
    -rw-rw-r-- 1 skoumal users 45038 May 23 14:41 01-15P001N_1-PK
    -rw-rw-r-- 1 skoumal users 50573 May 23 14:41 01-15T005N_0-PK
    -rw-rw-r-- 1 skoumal users 42191 May 23 14:41 01-15X045N_2-PK
    -rw-rw-r-- 1 skoumal users 42876 May 23 14:41 01-16A005N_2-PK
    -rw-rw-r-- 1 skoumal users 41182 May 23 14:41 01-16P002N_2-PK
    -rw-rw-r-- 1 skoumal users 37522 May 23 14:41 01-16X033N_5-PK
    -rw-rw-r-- 1 skoumal users 37307 May 23 14:41 02-12A011N_1-PK
    -rw-rw-r-- 1 skoumal users 43296 May 23 14:41 02-13A011N_5-PK
    -rw-rw-r-- 1 skoumal users 43113 May 23 14:41 02-13A036N_3-PK
    -rw-rw-r-- 1 skoumal users 42816 May 23 14:41 02-13B031N_0-PK
    -rw-rw-r-- 1 skoumal users 39465 May 23 14:41 02-13O009N_1-PK
    -rw-rw-r-- 1 skoumal users 45071 May 23 14:41 02-13O010N_2-PK
    -rw-rw-r-- 1 skoumal users 45410 May 23 14:41 02-13O013N_1-PK
    -rw-rw-r-- 1 skoumal users 41771 May 23 14:41 02-13P004N_3-PK
    -rw-rw-r-- 1 skoumal users 36768 May 23 14:41 02-13T029N_6-PK
    -rw-rw-r-- 1 skoumal users 39927 May 23 14:41 02-14A016N_2-PK
    -rw-rw-r-- 1 skoumal users 42561 May 23 14:41 02-14E003N_0-PK
    -rw-rw-r-- 1 skoumal users 44630 May 23 14:41 02-14T003N_5-PK
    -rw-rw-r-- 1 skoumal users 41947 May 23 14:41 02-14T013N_2-PK
    -rw-rw-r-- 1 skoumal users 51442 May 23 14:41 02-14T020N_3-PK
    -rw-rw-r-- 1 skoumal users 49831 May 23 14:41 02-15O002N_0-PK
    -rw-rw-r-- 1 skoumal users 40571 May 23 14:41 02-15O004N_1-PK
    -rw-rw-r-- 1 skoumal users 43099 May 23 14:41 02-15O009N_1-PK
    -rw-rw-r-- 1 skoumal users 41565 May 23 14:41 02-15P001N_3-PK
    -rw-rw-r-- 1 skoumal users 41993 May 23 14:41 02-15X041N_1-PK
    -rw-rw-r-- 1 skoumal users 41233 May 23 14:41 02-16A001N_0-PK
    -rw-rw-r-- 1 skoumal users 42439 May 23 14:41 02-16A005N_5-PK
    -rw-rw-r-- 1 skoumal users 46223 May 23 14:41 02-16P002N_0-PK
    -rw-rw-r-- 1 skoumal users 38555 May 23 14:41 02-16X003N_5-PK
    -rw-rw-r-- 1 skoumal users 42220 May 23 14:41 03-12A035N_3-PK
    -rw-rw-r-- 1 skoumal users 44055 May 23 14:41 03-13A014N_4-PK
    -rw-rw-r-- 1 skoumal users 40827 May 23 14:41 03-13O009N_2-PK
    -rw-rw-r-- 1 skoumal users 40803 May 23 14:41 03-13P010N_1-PK
    -rw-rw-r-- 1 skoumal users 46769 May 23 14:41 03-14A011N_3-PK
    -rw-rw-r-- 1 skoumal users 41318 May 23 14:41 03-14A016N_4-PK
    -rw-rw-r-- 1 skoumal users 44503 May 23 14:41 03-14P007N_2-PK
    -rw-rw-r-- 1 skoumal users 47542 May 23 14:41 03-14T010N_2-PK
    -rw-rw-r-- 1 skoumal users 47136 May 23 14:41 03-14T013N_0-PK
    -rw-rw-r-- 1 skoumal users 47617 May 23 14:41 03-14T020N_0-PK
    -rw-rw-r-- 1 skoumal users 43448 May 23 14:41 03-14X019N_4-PK
    -rw-rw-r-- 1 skoumal users 45235 May 23 14:41 03-14X021N_2-PK
    -rw-rw-r-- 1 skoumal users 43916 May 23 14:41 03-15E003N_0-PK
    -rw-rw-r-- 1 skoumal users 43217 May 23 14:41 03-15E015N_1-PK
    -rw-rw-r-- 1 skoumal users 42814 May 23 14:41 03-15O002N_1-PK
    -rw-rw-r-- 1 skoumal users 47330 May 23 14:41 03-15O007N_0-PK
    -rw-rw-r-- 1 skoumal users 47496 May 23 14:41 03-15X020N_1-PK
    -rw-rw-r-- 1 skoumal users 44998 May 23 14:41 03-15X041N_2-PK
    -rw-rw-r-- 1 skoumal users 47227 May 23 14:41 03-16A005N_4-PK
    -rw-rw-r-- 1 skoumal users 43259 May 23 14:41 03-16P004N_0-PK
    -rw-rw-r-- 1 skoumal users 47252 May 23 14:41 03-16X001N_1-PK
    -rw-rw-r-- 1 skoumal users 40593 May 23 14:41 03-16X031N_4-PK
    -rw-rw-r-- 1 skoumal users 40675 May 23 14:41 04-12A025N_0-PK
    -rw-rw-r-- 1 skoumal users 46531 May 23 14:41 04-12P004N_4-PK
    -rw-rw-r-- 1 skoumal users 45177 May 23 14:41 04-13B009N_0-PK
    -rw-rw-r-- 1 skoumal users 43618 May 23 14:41 04-13B019N_0-PK
    -rw-rw-r-- 1 skoumal users 44753 May 23 14:41 04-13B025N_0-PK
    -rw-rw-r-- 1 skoumal users 43003 May 23 14:41 04-13O007N_1-PK
    -rw-rw-r-- 1 skoumal users 44914 May 23 14:41 04-13O010N_0-PK
    -rw-rw-r-- 1 skoumal users 43164 May 23 14:41 04-13O013N_0-PK
    -rw-rw-r-- 1 skoumal users 44184 May 23 14:41 04-13P004N_2-PK
    -rw-rw-r-- 1 skoumal users 42275 May 23 14:41 04-14P007N_0-PK
    -rw-rw-r-- 1 skoumal users 49243 May 23 14:41 04-14T003N_0-PK
    -rw-rw-r-- 1 skoumal users 40458 May 23 14:41 04-14T010N_1-PK
    -rw-rw-r-- 1 skoumal users 42344 May 23 14:41 04-14T013N_1-PK
    -rw-rw-r-- 1 skoumal users 40829 May 23 14:41 04-14X016N_3-PK
    -rw-rw-r-- 1 skoumal users 40664 May 23 14:41 04-15O010N_0-PK
    -rw-rw-r-- 1 skoumal users 43180 May 23 14:41 04-15P001N_2-PK
    -rw-rw-r-- 1 skoumal users 40232 May 23 14:41 04-15P006N_0-PK
    -rw-rw-r-- 1 skoumal users 48556 May 23 14:41 05-14T010N_3-PK
    -rw-rw-r-- 1 skoumal users 44430 May 23 14:41 05-14T019N_0-PK
    -rw-rw-r-- 1 skoumal users 39720 May 23 14:41 05-14X012N_2-PK
    -rw-rw-r-- 1 skoumal users 49071 May 23 14:41 05-14X019N_2-PK
    -rw-rw-r-- 1 skoumal users 49252 May 23 14:41 05-14X019N_3-PK
    -rw-rw-r-- 1 skoumal users 44280 May 23 14:41 05-15O001N_1-PK
    -rw-rw-r-- 1 skoumal users 48997 May 23 14:41 05-15P001N_0-PK
    -rw-rw-r-- 1 skoumal users 49363 May 23 14:41 05-15X009N_1-PK
    -rw-rw-r-- 1 skoumal users 48507 May 23 14:41 05-15X020N_3-PK
    -rw-rw-r-- 1 skoumal users 38867 May 23 14:41 05-15X043N_5-PK
    -rw-rw-r-- 1 skoumal users 43559 May 23 14:41 05-16A005N_1-PK
    -rw-rw-r-- 1 skoumal users 44935 May 23 14:41 05-16E007N_0-PK
    -rw-rw-r-- 1 skoumal users 41154 May 23 14:41 05-16P002N_1-PK
    -rw-rw-r-- 1 skoumal users 44495 May 23 14:41 05-16P007N_1-PK
    -rw-rw-r-- 1 skoumal users 46831 May 23 14:41 05-16X001N_2-PK

Michal Havrda

  • available the whole time; batch 1: 1,938 words, done, not yet reported
    -rw-rw-r-- 1 skoumal users 42289 May 23 14:41 04-15X030N_3-MH
    -rw-rw-r-- 1 skoumal users 46559 May 23 14:41 04-15X043N_2-MH
    -rw-rw-r-- 1 skoumal users 45223 May 23 14:41 04-16A009N_2-MH
    -rw-rw-r-- 1 skoumal users 43922 May 23 14:41 04-16E007N_2-MH
    -rw-rw-r-- 1 skoumal users 48672 May 23 14:41 04-16P004N_1-MH
    -rw-rw-r-- 1 skoumal users 42605 May 23 14:41 04-16X003N_2-MH
    -rw-rw-r-- 1 skoumal users 41810 May 23 14:41 05-12P004N_2-MH
    -rw-rw-r-- 1 skoumal users 42430 May 23 14:41 05-13A011N_0-MH
    -rw-rw-r-- 1 skoumal users 46764 May 23 14:41 05-13A014N_3-MH
    -rw-rw-r-- 1 skoumal users 40185 May 23 14:41 05-13A023N_3-MH
    -rw-rw-r-- 1 skoumal users 41116 May 23 14:41 05-13B005N_1-MH
    -rw-rw-r-- 1 skoumal users 48129 May 23 14:41 05-13D015N_0-MH
    -rw-rw-r-- 1 skoumal users 44079 May 23 14:41 05-13O007N_0-MH
    -rw-rw-r-- 1 skoumal users 45164 May 23 14:41 05-14A011N_2-MH

Anna Nováková

  • available only from July

Šárka Kadavá

  • only as a last resort

Václav Horký

  • as a student research assistant

Annotators and assigned files (davka-2)

Pavel Kopřiva

  • batch 2: 15,313 words
    -rw-rw-r-- 1 skoumal users 42720 May 30 17:32 06-12A011N_0-PK
    -rw-rw-r-- 1 skoumal users 45138 May 30 17:32 06-12P004N_3-PK
    -rw-rw-r-- 1 skoumal users 45138 May 30 17:32 06-13A003N_1-PK
    -rw-rw-r-- 1 skoumal users 44326 May 30 17:32 06-13A014N_2-PK
    -rw-rw-r-- 1 skoumal users 45282 May 30 17:32 06-13A028N_2-PK
    -rw-rw-r-- 1 skoumal users 46555 May 30 17:32 06-13A074N_1-PK
    -rw-rw-r-- 1 skoumal users 43331 May 30 17:32 06-13B005N_0-PK
    -rw-rw-r-- 1 skoumal users 48089 May 30 17:32 06-13B028N_1-PK
    -rw-rw-r-- 1 skoumal users 40901 May 30 17:32 06-13O007N_2-PK
    -rw-rw-r-- 1 skoumal users 48638 May 30 17:32 06-14A006N_0-PK
    -rw-rw-r-- 1 skoumal users 40347 May 30 17:32 06-14A008N_3-PK
    -rw-rw-r-- 1 skoumal users 47879 May 30 17:32 06-14E001N_0-PK
    -rw-rw-r-- 1 skoumal users 44009 May 30 17:32 06-14P007N_3-PK
    -rw-rw-r-- 1 skoumal users 44757 May 30 17:32 06-14T007N_0-PK
    -rw-rw-r-- 1 skoumal users 45276 May 30 17:32 06-14T020N_1-PK
    -rw-rw-r-- 1 skoumal users 46545 May 30 17:32 06-14X016N_1-PK
    -rw-rw-r-- 1 skoumal users 42302 May 30 17:32 06-15E017N_5-PK
    -rw-rw-r-- 1 skoumal users 51075 May 30 17:32 06-15O010N_2-PK
    -rw-rw-r-- 1 skoumal users 41040 May 30 17:32 06-16A001N_3-PK
    -rw-rw-r-- 1 skoumal users 43143 May 30 17:32 06-16P002N_3-PK
    -rw-rw-r-- 1 skoumal users 35558 May 30 17:32 06-16P007N_5-PK
    -rw-rw-r-- 1 skoumal users 46698 May 30 17:32 06-16X003N_1-PK
    -rw-rw-r-- 1 skoumal users 45687 May 30 17:32 06-16X030N_1-PK
    -rw-rw-r-- 1 skoumal users 39012 May 30 17:32 07-12A037N_4-PK
    -rw-rw-r-- 1 skoumal users 44139 May 30 17:32 07-12O002N_0-PK
    -rw-rw-r-- 1 skoumal users 46794 May 30 17:32 07-13A003N_2-PK
    -rw-rw-r-- 1 skoumal users 46770 May 30 17:32 07-13A014N_0-PK
    -rw-rw-r-- 1 skoumal users 39791 May 30 17:32 07-13A028N_4-PK
    -rw-rw-r-- 1 skoumal users 43325 May 30 17:32 07-13A036N_5-PK
    -rw-rw-r-- 1 skoumal users 48700 May 30 17:32 07-13A050N_0-PK
    -rw-rw-r-- 1 skoumal users 42543 May 30 17:32 07-13E004N_6-PK
    -rw-rw-r-- 1 skoumal users 43506 May 30 17:32 07-13O004N_0-PK
    -rw-rw-r-- 1 skoumal users 44126 May 30 17:32 07-14T003N_4-PK
    -rw-rw-r-- 1 skoumal users 41497 May 30 17:32 07-14T007N_2-PK
    -rw-rw-r-- 1 skoumal users 45303 May 30 17:32 07-14T013N_3-PK
    -rw-rw-r-- 1 skoumal users 46002 May 30 17:32 07-14X016N_4-PK
    -rw-rw-r-- 1 skoumal users 42059 May 30 17:32 07-15C004N_0-PK
    -rw-rw-r-- 1 skoumal users 46510 May 30 17:32 07-15O009N_0-PK
    -rw-rw-r-- 1 skoumal users 41922 May 30 17:32 07-15P002N_0-PK
    -rw-rw-r-- 1 skoumal users 47748 May 30 17:32 07-16A005N_0-PK
    -rw-rw-r-- 1 skoumal users 40438 May 30 17:32 07-16A009N_3-PK
    -rw-rw-r-- 1 skoumal users 40615 May 30 17:32 07-16P007N_2-PK
    -rw-rw-r-- 1 skoumal users 45292 May 30 17:32 07-16X003N_3-PK
    -rw-rw-r-- 1 skoumal users 41981 May 30 17:32 07-16X031N_0-PK
    -rw-rw-r-- 1 skoumal users 44339 May 30 17:32 07-16X033N_3-PK
    -rw-rw-r-- 1 skoumal users 40608 May 30 17:32 08-12A009N_0-PK
    -rw-rw-r-- 1 skoumal users 45404 May 30 17:32 08-12A031N_0-PK
    -rw-rw-r-- 1 skoumal users 50613 May 30 17:32 08-13A018N_0-PK
    -rw-rw-r-- 1 skoumal users 43737 May 30 17:32 08-13A036N_0-PK
    -rw-rw-r-- 1 skoumal users 41190 May 30 17:32 08-13A090N_4-PK
    -rw-rw-r-- 1 skoumal users 43500 May 30 17:32 08-13B019N_1-PK
    -rw-rw-r-- 1 skoumal users 42045 May 30 17:32 08-13B028N_0-PK
    -rw-rw-r-- 1 skoumal users 39830 May 30 17:32 08-13O009N_0-PK
    -rw-rw-r-- 1 skoumal users 45677 May 30 17:32 08-13P004N_0-PK
    -rw-rw-r-- 1 skoumal users 43908 May 30 17:32 08-14C006N_0-PK
    -rw-rw-r-- 1 skoumal users 46106 May 30 17:32 08-14T003N_1-PK
    -rw-rw-r-- 1 skoumal users 42032 May 30 17:32 08-14T014N_4-PK
    -rw-rw-r-- 1 skoumal users 48395 May 30 17:32 08-14X016N_5-PK
    -rw-rw-r-- 1 skoumal users 41217 May 30 17:32 08-15E010N_5-PK
    -rw-rw-r-- 1 skoumal users 42667 May 30 17:32 08-15O010N_1-PK
    -rw-rw-r-- 1 skoumal users 46423 May 30 17:32 08-15X020N_2-PK
    -rw-rw-r-- 1 skoumal users 49698 May 30 17:32 08-15X041N_3-PK
    -rw-rw-r-- 1 skoumal users 50216 May 30 17:32 08-16A005N_3-PK
    -rw-rw-r-- 1 skoumal users 47850 May 30 17:32 08-16E005N_4-PK
    -rw-rw-r-- 1 skoumal users 43570 May 30 17:32 08-16E007N_4-PK
    -rw-rw-r-- 1 skoumal users 45400 May 30 17:32 08-16X003N_4-PK
    -rw-rw-r-- 1 skoumal users 44771 May 30 17:32 08-16X026N_1-PK
    -rw-rw-r-- 1 skoumal users 44471 May 30 17:32 08-16X031N_2-PK
    -rw-rw-r-- 1 skoumal users 40769 May 30 17:32 09-12A004N_1-PK
    -rw-rw-r-- 1 skoumal users 44435 May 30 17:32 09-12A034N_3-PK
    -rw-rw-r-- 1 skoumal users 45093 May 30 17:32 09-12H004N_1-PK
    -rw-rw-r-- 1 skoumal users 40828 May 30 17:32 09-13A003N_0-PK
    -rw-rw-r-- 1 skoumal users 48448 May 30 17:32 09-13A074N_4-PK
    -rw-rw-r-- 1 skoumal users 42901 May 30 17:32 09-13A090N_2-PK
    -rw-rw-r-- 1 skoumal users 43918 May 30 17:32 09-13B011N_0-PK
    -rw-rw-r-- 1 skoumal users 46136 May 30 17:32 09-13B027N_0-PK
    -rw-rw-r-- 1 skoumal users 43410 May 30 17:32 09-13O007N_3-PK
    -rw-rw-r-- 1 skoumal users 44953 May 30 17:32 09-13P008N_1-PK
    -rw-rw-r-- 1 skoumal users 44550 May 30 17:32 09-13T029N_3-PK
    -rw-rw-r-- 1 skoumal users 50559 May 30 17:32 09-13X003N_0-PK
    -rw-rw-r-- 1 skoumal users 42449 May 30 17:32 09-14A016N_0-PK
    -rw-rw-r-- 1 skoumal users 39140 May 30 17:32 09-14C006N_3-PK
    -rw-rw-r-- 1 skoumal users 36023 May 30 17:32 09-14T024N_4-PK
    -rw-rw-r-- 1 skoumal users 47500 May 30 17:32 09-14X016N_2-PK
    -rw-rw-r-- 1 skoumal users 46232 May 30 17:32 09-15O004N_2-PK
    -rw-rw-r-- 1 skoumal users 49245 May 30 17:32 09-15X041N_0-PK
    -rw-rw-r-- 1 skoumal users 41205 May 30 17:32 09-15X044N_1-PK
    -rw-rw-r-- 1 skoumal users 45831 May 30 17:32 09-16A002N_1-PK
    -rw-rw-r-- 1 skoumal users 42869 May 30 17:32 09-16E007N_1-PK
    -rw-rw-r-- 1 skoumal users 43627 May 30 17:32 09-16X030N_0-PK
    -rw-rw-r-- 1 skoumal users 44994 May 30 17:32 10-13A005N_4-PK
    -rw-rw-r-- 1 skoumal users 41142 May 30 17:32 10-13A011N_3-PK
    -rw-rw-r-- 1 skoumal users 41662 May 30 17:32 10-13A018N_2-PK
    -rw-rw-r-- 1 skoumal users 47470 May 30 17:32 10-13A074N_5-PK
    -rw-rw-r-- 1 skoumal users 39450 May 30 17:32 10-13B016N_1-PK
    -rw-rw-r-- 1 skoumal users 44065 May 30 17:32 10-13O003N_0-PK
    -rw-rw-r-- 1 skoumal users 43813 May 30 17:32 10-13P009N_0-PK
    -rw-rw-r-- 1 skoumal users 44046 May 30 17:32 10-14A011N_1-PK
    -rw-rw-r-- 1 skoumal users 46636 May 30 17:32 10-14C009N_2-PK
    -rw-rw-r-- 1 skoumal users 48973 May 30 17:32 10-14O007N_0-PK
    -rw-rw-r-- 1 skoumal users 40730 May 30 17:32 10-14P006N_1-PK
    -rw-rw-r-- 1 skoumal users 49089 May 30 17:32 10-15O011N_0-PK
    -rw-rw-r-- 1 skoumal users 43590 May 30 17:32 10-15O012N_0-PK
    -rw-rw-r-- 1 skoumal users 41094 May 30 17:32 10-15P004N_0-PK
    -rw-rw-r-- 1 skoumal users 39004 May 30 17:32 10-15T002N_2-PK
    -rw-rw-r-- 1 skoumal users 45379 May 30 17:32 10-15T003N_1-PK
    -rw-rw-r-- 1 skoumal users 46787 May 30 17:32 10-15T011N_4-PK
    -rw-rw-r-- 1 skoumal users 49274 May 30 17:32 10-15X020N_0-PK
    -rw-rw-r-- 1 skoumal users 43482 May 30 17:32 10-16A002N_0-PK
    -rw-rw-r-- 1 skoumal users 43649 May 30 17:32 10-16E007N_5-PK
    -rw-rw-r-- 1 skoumal users 46847 May 30 17:32 10-16P004N_3-PK
    -rw-rw-r-- 1 skoumal users 44270 May 30 17:32 10-16X003N_0-PK

Merging manual and automatic annotation

Data preparation

  • Under [/net/grimm]/store/corp/Ortofon, create the subdirectory ortofon-merge and, inside it, davka-?/csts-import and davka-?/csts-merge.
  • In each csts-merge directory, prepare the files for merging.
    • Copy the files from …/ortofon-automat/davka-?/csts-export and convert the suffix to lowercase. While copying, keep only unique tags:
      parallel-filter.sh -C /net/grimm/usr/local/corp/bin/unique-tag.pl -p6 \
      -s ../../ortofon-automat/davka-?/csts-export -t csts-merge -v
      cd csts-merge
      for ff in *-PK; do gg=${ff%-PK}-pk; echo "$ff $gg"; mv $ff $gg; done

      Repeat this for each suffix.

    • Copy the corresponding manually annotated files from ../../ortofon-manual/davka-?/csts-export:
      parallel-filter.sh -C /net/grimm/usr/local/corp/bin/unique-tag.pl -p6 \
      -s ../../ortofon-manual/davka-?/csts-export -t csts-merge -v
    • Files with an uppercase suffix have <chunk> and attribute-laden <s> in their mark-up; both sets of mark-up must contain the same tags:
      for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 's/<p>/<chunk>/' $ff; done
      for ff in *.bak; do echo ${ff%.bak}; sdiff -s ${ff%.bak} $ff; done
      for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 's:</c>:</chunk>\n</c>:' $ff; done
      for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 's:<s>:</s>\n<s>:' $ff; done
      for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 'undef $/; s:(<chunk>)\n</s>:$1:' $ff; done
      for ff in *-[a-z][a-z]; do echo $ff; perl -i.bak -pe 'undef $/; s:<s>\n(</chunk>):$1:' $ff; done
      for ff in *-[A-Z][A-Z]; do echo $ff; perl -i.bak -pe 'undef $/; s:<p>\n<s>\n::' $ff; done
      for ff in *-[A-Z][A-Z]; do echo $ff; perl -i.bak -pe 'undef $/; s:<p>\n::' $ff; done
    • Check that the files are aligned:
      for ff in *-[A-Z][A-Z]; do echo $ff; paste $ff ${ff%-??}-[a-z][a-z] | grep "</s>"; done |\
      grep -vP "</s>\t</s>" | l

      and fix any misalignment according to the original data in …/Ortofon/ortofon-data/0?.

    • The automatic files have no verbal aspect. After alignment, create the directory csts-tag and copy the *-[A-Z][A-Z] files into it.
      mkdir ../csts-tag
      cp -p *-[A-Z][A-Z] ../csts-tag

      Run aspect tagging and aspect corrections on them:

      [frozen]
      make-asp.sh -Eucs2 -fcsts -p6 -s csts-tag -t csts-tag-vid -v
      cd /usr/local/corp/frozen-states/201910/corp/DisambiguacniSkripty/PostDisambVid-utf-csts/povinne
      parallel-filter.sh -C "./11_OpravitVid-1 | ./11_OpravitVid-2 | ./20_asp_stat.pl" \
      -s /net/grimm/store/corp/Ortofon/ortofon-merge/davka-?/csts-tag-vid \
      -t /net/grimm/store/corp/Ortofon/ortofon-merge/davka-?/csts-tag-vid-corr -v
      cd -
      for ff in *; do echo $ff; perl -i -pe 's/invalid-/invalid/' $ff; done
    • Copy the files from csts-tag-vid-corr back into csts-merge.
    • Unify the mark-up:
      for ff in *; do echo $ff; perl -i.bak -pe 's/<f[^>]+>/<f>/' $ff; done
      for ff in *; do echo $ff; perl -i.bak -pe 's/(<MM[lt])[^>]+>/$1>/g' $ff; done
    • Check and fix biaspectual verbs:
      grep "B$" * | cut -f3 -d'<' | sort -u > ../vidy.txt

      For later batches, compare with the previous ones:

      for ff in $(cat vidy.txt); do echo $ff; grep -h "<l>$ff<" ../davka-1/csts-import/* | sort -u; done | l

      and fix them:

      for ff in $(grep -l "<MMl>bydlet<MMt>...............B" *); do echo $ff; \
      perl -i.bak -pe 's/(<MMl>bydlet<MMt>...............)B/$1I/' $ff; done
  • Prepare the data for import:
    for ff in *-[A-Z][A-Z]; do gg=${ff%-??}-[a-z][a-z]; suff=$(echo $gg| cut -f3 -d'-'); \
    echo $ff-$suff; paste $ff ${ff%-??}-[a-z][a-z] | perl -pe 's/<MM/</g' | merge-csts \
    | perl -pe 's/<l>[^<]+<t>X@--------------(<l>[^<]+<t>[FM])/$1/' > ../csts-import/$ff-$suff; done

    and adjust the lines with the tags F, H and M:

    for ff in *-??-??; do echo $ff; perl -i.bak -pe 's/<d>@<l>@<t>Z:--------------/<f>@/' $ff; done
    for ff in *-??-??; do echo $ff; perl -i.bak -pe 's/<l>@@<t>Z:--------------//' $ff; done
    for ff in *-??-??; do echo $ff; perl -i.bak -pe 's/<l>emm<t>X@--------------//' $ff; done
    for ff in *-??-??; do echo $ff; perl -i.bak -pe 's/<l>hmm<t>II--------------//' $ff; done
  • Run the import (on jakobson):
    cd ../csts-import
    for ff in *; do echo $ff; /usr/local/annotate/bin/csts-import-utkl.pl --force $ff; done
  • Update /usr/local/annotate/users
  • Determining aspect:
    • dát P – dát se I
    • (dokázat P – dokázat (in the sense of umět, "be able to") B)
    • (dovést P – dovést (in the sense of umět, "be able to") B)
    • hodit P – hodit se I
    • jmenovat B – jmenovat se I
    • (napovídat P – napovídat I)
    • (orientovat B – orientovat se I)
    • stát I – stát se P
    • věnovat P – věnovat se I

Annotators and assigned files (davka-1)

Jan Henyš

  • (6696):
    -rw-r--r-- 1 skoumal users  74626 Nov  4 14:57 01-12A005N_1-VH-pk
    -rw-r--r-- 1 skoumal users  81870 Nov  4 14:57 01-13A014N_1-VH-pk
    -rw-r--r-- 1 skoumal users  93895 Nov  4 14:57 01-13A028N_1-VH-pk
    -rw-r--r-- 1 skoumal users 127391 Nov  4 14:57 01-13B031N_1-VH-pk
    -rw-r--r-- 1 skoumal users 108674 Nov  4 14:57 01-13H005N_1-VH-pk
    -rw-r--r-- 1 skoumal users 139978 Nov  4 14:57 01-13O009N_3-VH-pk
    -rw-r--r-- 1 skoumal users  81235 Nov  4 14:57 01-13P004N_1-VH-pk
    -rw-r--r-- 1 skoumal users 106283 Nov  4 14:57 01-13P009N_2-VH-pk
    -rw-r--r-- 1 skoumal users 104777 Nov  4 14:57 01-14T003N_6-VH-pk
    -rw-r--r-- 1 skoumal users 114342 Nov  4 14:57 01-14T007N_1-VH-pk
    -rw-r--r-- 1 skoumal users 121376 Nov  4 14:57 01-14T010N_0-VH-pk
    -rw-r--r-- 1 skoumal users 122450 Nov  4 14:57 01-14X016N_6-VH-pk
    -rw-r--r-- 1 skoumal users 126982 Nov  4 14:57 01-14X019N_0-VH-pk
    -rw-r--r-- 1 skoumal users 130419 Nov  4 14:57 01-15O001N_0-VH-pk
    -rw-r--r-- 1 skoumal users 105776 Nov  4 14:57 01-15O004N_0-TM-pk
    -rw-r--r-- 1 skoumal users 111398 Nov  4 14:57 01-15O007N_1-TM-pk
    -rw-r--r-- 1 skoumal users 115392 Nov  4 14:57 01-15P001N_1-TM-pk
    -rw-r--r-- 1 skoumal users 154103 Nov  4 14:57 01-15T005N_0-TM-pk
    -rw-r--r-- 1 skoumal users 107637 Nov  4 14:57 01-15X045N_2-TM-pk
    -rw-r--r-- 1 skoumal users 125936 Nov  4 14:57 01-16A005N_2-TM-pk
    -rw-r--r-- 1 skoumal users  98450 Nov  4 14:57 01-16P002N_2-TM-pk
    -rw-r--r-- 1 skoumal users 104393 Nov  4 14:57 01-16X033N_5-TM-pk
    -rw-r--r-- 1 skoumal users 101802 Nov  4 14:57 02-12A011N_1-TM-pk
    -rw-r--r-- 1 skoumal users 122296 Nov  4 14:57 02-13A011N_5-TM-pk
    -rw-r--r-- 1 skoumal users 102029 Nov  4 14:57 02-13A036N_3-TM-pk
    -rw-r--r-- 1 skoumal users 149647 Nov  4 14:57 02-13B031N_0-TM-pk
    -rw-r--r-- 1 skoumal users  88170 Nov  4 14:57 02-13O009N_1-TM-pk
    -rw-r--r-- 1 skoumal users  97377 Nov  4 14:57 02-13O010N_2-TM-pk
    -rw-r--r-- 1 skoumal users 126451 Nov  4 14:57 02-13O013N_1-MZ-pk
    -rw-r--r-- 1 skoumal users  79652 Nov  4 14:57 02-13P004N_3-MZ-pk
    -rw-r--r-- 1 skoumal users  98536 Nov  4 14:57 02-13T029N_6-MZ-pk
    -rw-r--r-- 1 skoumal users  88034 Nov  4 14:57 02-14A016N_2-MZ-pk
    -rw-r--r-- 1 skoumal users 118994 Nov  4 14:57 02-14E003N_0-MZ-pk
    -rw-r--r-- 1 skoumal users 123841 Nov  4 14:57 02-14T003N_5-MZ-pk
    -rw-r--r-- 1 skoumal users 115630 Nov  4 14:57 02-14T013N_2-MZ-pk
    -rw-r--r-- 1 skoumal users 134818 Nov  4 14:57 02-14T020N_3-MZ-pk
    -rw-r--r-- 1 skoumal users 141686 Nov  4 14:57 02-15O002N_0-MZ-pk
    -rw-r--r-- 1 skoumal users  95958 Nov  4 14:57 02-15O004N_1-MZ-pk
    -rw-r--r-- 1 skoumal users  84948 Nov  4 14:57 02-15O009N_1-MZ-pk
    -rw-r--r-- 1 skoumal users 101862 Nov  4 14:57 02-15P001N_3-MZ-pk
    -rw-r--r-- 1 skoumal users 100946 Nov  4 14:57 02-15X041N_1-MZ-pk
    -rw-r--r-- 1 skoumal users 115018 Nov  4 14:57 02-16A001N_0-MZ-pk
    -rw-r--r-- 1 skoumal users 132001 Nov  4 14:57 02-16A005N_5-SK-pk
    -rw-r--r-- 1 skoumal users  99541 Nov  4 14:57 02-16P002N_0-SK-pk
    -rw-r--r-- 1 skoumal users  79580 Nov  4 14:57 02-16X003N_5-SK-pk
    -rw-r--r-- 1 skoumal users 109874 Nov  4 14:57 03-12A035N_3-SK-pk
    -rw-r--r-- 1 skoumal users  92330 Nov  4 14:57 03-13A014N_4-SK-pk
    -rw-r--r-- 1 skoumal users 101868 Nov  4 14:57 03-13O009N_2-SK-pk
    -rw-r--r-- 1 skoumal users 102143 Nov  4 14:57 03-13P010N_1-SK-pk
    -rw-r--r-- 1 skoumal users 119923 Nov  4 14:57 03-14A011N_3-SK-pk
    -rw-r--r-- 1 skoumal users  89549 Nov  4 14:57 03-14A016N_4-SK-pk
    -rw-r--r-- 1 skoumal users 102824 Nov  4 14:57 03-14P007N_2-SK-pk
    -rw-r--r-- 1 skoumal users 152089 Nov  4 14:57 03-14T010N_2-SK-pk
    -rw-r--r-- 1 skoumal users 127088 Nov  4 14:57 03-14T013N_0-SK-pk
    -rw-r--r-- 1 skoumal users 131223 Nov  4 14:57 03-14T020N_0-SK-pk
    -rw-r--r-- 1 skoumal users 125088 Nov  4 14:57 03-14X019N_4-SK-pk
    -rw-r--r-- 1 skoumal users 106008 Nov  4 14:57 03-14X021N_2-LK-pk

Václav Horký

  • (7020) done:
    -rw-r--r-- 1 skoumal users 156567 Nov  4 14:57 03-15E003N_0-LK-pk
    -rw-r--r-- 1 skoumal users 112165 Nov  4 14:57 03-15E015N_1-LK-pk
    -rw-r--r-- 1 skoumal users 100938 Nov  4 14:57 03-15O002N_1-LK-pk
    -rw-r--r-- 1 skoumal users 122762 Nov  4 14:57 03-15O007N_0-LK-pk
    -rw-r--r-- 1 skoumal users 143197 Nov  4 14:57 03-15X020N_1-LK-pk
    -rw-r--r-- 1 skoumal users 115593 Nov  4 14:57 03-15X041N_2-LK-pk
    -rw-r--r-- 1 skoumal users 135237 Nov  4 14:57 03-16A005N_4-LK-pk
    -rw-r--r-- 1 skoumal users  97000 Nov  4 14:57 03-16P004N_0-LK-pk
    -rw-r--r-- 1 skoumal users 145176 Nov  4 14:57 03-16X001N_1-LK-pk
    -rw-r--r-- 1 skoumal users 118990 Nov  4 14:57 03-16X031N_4-LK-pk
    -rw-r--r-- 1 skoumal users  83711 Nov  4 14:57 04-12A025N_0-LK-pk
    -rw-r--r-- 1 skoumal users 140071 Nov  4 14:57 04-12P004N_4-LK-pk
    -rw-r--r-- 1 skoumal users 152345 Nov  4 14:57 04-13B009N_0-LK-pk
    -rw-r--r-- 1 skoumal users 126964 Nov  4 14:57 04-13B019N_0-MH-pk
    -rw-r--r-- 1 skoumal users 116116 Nov  4 14:57 04-13B025N_0-MH-pk
    -rw-r--r-- 1 skoumal users 102177 Nov  4 14:57 04-13O007N_1-MH-pk
    -rw-r--r-- 1 skoumal users 101580 Nov  4 14:57 04-13O010N_0-MH-pk
    -rw-r--r-- 1 skoumal users 139029 Nov  4 14:57 04-13O013N_0-MH-pk
    -rw-r--r-- 1 skoumal users  83185 Nov  4 14:57 04-13P004N_2-MH-pk
    -rw-r--r-- 1 skoumal users 101357 Nov  4 14:57 04-14P007N_0-MH-pk
    -rw-r--r-- 1 skoumal users 136504 Nov  4 14:57 04-14T003N_0-MH-pk
    -rw-r--r-- 1 skoumal users 125338 Nov  4 14:57 04-14T010N_1-MH-pk
    -rw-r--r-- 1 skoumal users 105675 Nov  4 14:57 04-14T013N_1-MH-pk
    -rw-r--r-- 1 skoumal users 106963 Nov  4 14:57 04-14X016N_3-MH-pk
    -rw-r--r-- 1 skoumal users  92255 Nov  4 14:57 04-15O010N_0-MH-pk
    -rw-r--r-- 1 skoumal users 111061 Nov  4 14:57 04-15P001N_2-MH-pk
    -rw-r--r-- 1 skoumal users  90023 Nov  4 14:57 04-15P006N_0-MH-pk
    -rw-r--r-- 1 skoumal users 102256 Nov  4 14:57 04-15X030N_3-PK-mh
    -rw-r--r-- 1 skoumal users  96979 Nov  4 14:57 04-15X043N_2-PK-mh
    -rw-r--r-- 1 skoumal users 125828 Nov  4 14:57 04-16A009N_2-PK-mh
    -rw-r--r-- 1 skoumal users  85615 Nov  4 14:57 04-16E007N_2-PK-mh
    -rw-r--r-- 1 skoumal users 113460 Nov  4 14:57 04-16P004N_1-PK-mh
    -rw-r--r-- 1 skoumal users 103271 Nov  4 14:57 04-16X003N_2-PK-mh
    -rw-r--r-- 1 skoumal users 114602 Nov  4 14:57 05-12P004N_2-PK-mh
    -rw-r--r-- 1 skoumal users 122736 Nov  4 14:57 05-13A011N_0-PK-mh
    -rw-r--r-- 1 skoumal users  96925 Nov  4 14:57 05-13A014N_3-PK-mh
    -rw-r--r-- 1 skoumal users  80363 Nov  4 14:57 05-13A023N_3-PK-mh
    -rw-r--r-- 1 skoumal users 108453 Nov  4 14:57 05-13B005N_1-PK-mh
    -rw-r--r-- 1 skoumal users 139140 Nov  4 14:57 05-13D015N_0-PK-mh
    -rw-r--r-- 1 skoumal users 104929 Nov  4 14:57 05-13O007N_0-PK-mh
    -rw-r--r-- 1 skoumal users 125205 Nov  4 14:57 05-14A011N_2-PK-mh
    -rw-r--r-- 1 skoumal users 147679 Nov  4 14:57 05-14T010N_3-AN-pk
    -rw-r--r-- 1 skoumal users  90556 Nov  4 14:57 05-14T019N_0-AN-pk
    -rw-r--r-- 1 skoumal users 117725 Nov  4 14:57 05-14X012N_2-AN-pk
    -rw-r--r-- 1 skoumal users 146139 Nov  4 14:57 05-14X019N_2-AN-pk
    -rw-r--r-- 1 skoumal users 148811 Nov  4 14:57 05-14X019N_3-AN-pk
    -rw-r--r-- 1 skoumal users 121542 Nov  4 14:57 05-15O001N_1-AN-pk
    -rw-r--r-- 1 skoumal users 117876 Nov  4 14:57 05-15P001N_0-AN-pk
    -rw-r--r-- 1 skoumal users 149222 Nov  4 14:57 05-15X009N_1-AN-pk
    -rw-r--r-- 1 skoumal users 141576 Nov  4 14:57 05-15X020N_3-AN-pk
    -rw-r--r-- 1 skoumal users  69482 Nov  4 14:57 05-15X043N_5-AN-pk
    -rw-r--r-- 1 skoumal users 149294 Nov  4 14:57 05-16A005N_1-AN-pk
    -rw-r--r-- 1 skoumal users  85369 Nov  4 14:57 05-16E007N_0-AN-pk
    -rw-r--r-- 1 skoumal users  95205 Nov  4 14:57 05-16P002N_1-AN-pk
    -rw-r--r-- 1 skoumal users  95081 Nov  4 14:57 05-16P007N_1-AN-pk
    -rw-r--r-- 1 skoumal users 133102 Nov  4 14:57 05-16X001N_2-MH-pk

Annotators and assigned files (davka-2)

Jan Henyš

  • ():
    
    

Václav Horký

  • (4675) done:
    -rw-r--r-- 1 skoumal staff 151492 Dec  5 15:12 08-16A005N_3-AN-pk
    -rw-r--r-- 1 skoumal staff 100204 Dec  5 15:12 08-16E005N_4-AN-pk
    -rw-r--r-- 1 skoumal staff  93187 Dec  5 15:12 08-16E007N_4-AN-pk
    -rw-r--r-- 1 skoumal staff 109275 Dec  5 15:12 08-16X003N_4-AN-pk
    -rw-r--r-- 1 skoumal staff 131994 Dec  5 15:12 08-16X026N_1-AN-pk
    -rw-r--r-- 1 skoumal staff 111495 Dec  5 15:12 08-16X031N_2-AN-pk
    -rw-r--r-- 1 skoumal staff  92992 Dec  5 15:12 09-12A004N_1-PK-pk
    -rw-r--r-- 1 skoumal staff 115075 Dec  5 15:12 09-12A034N_3-PK-pk
    -rw-r--r-- 1 skoumal staff 109999 Dec  5 15:12 09-12H004N_1-PK-pk
    -rw-r--r-- 1 skoumal staff  91469 Dec  5 15:12 09-13A003N_0-PK-pk
    -rw-r--r-- 1 skoumal staff 112768 Dec  5 15:12 09-13A074N_4-PK-pk
    -rw-r--r-- 1 skoumal staff 112259 Dec  5 15:12 09-13A090N_2-PK-pk
    -rw-r--r-- 1 skoumal staff  87195 Dec  5 15:12 09-13B011N_0-PK-pk
    -rw-r--r-- 1 skoumal staff 129299 Dec  5 15:12 09-13B027N_0-PK-pk
    -rw-r--r-- 1 skoumal staff 122598 Dec  5 15:12 09-13O007N_3-PK-pk
    -rw-r--r-- 1 skoumal staff 110178 Dec  5 15:12 09-13P008N_1-PK-pk
    -rw-r--r-- 1 skoumal staff 120714 Dec  5 15:12 09-13T029N_3-PK-pk
    -rw-r--r-- 1 skoumal staff 185624 Dec  5 15:12 09-13X003N_0-PK-pk
    -rw-r--r-- 1 skoumal staff  90067 Dec  5 15:12 09-14A016N_0-PK-pk
    -rw-r--r-- 1 skoumal staff 103615 Dec  5 15:12 09-14C006N_3-PK-pk
    -rw-r--r-- 1 skoumal staff  73062 Dec  5 15:12 09-14T024N_4-PK-pk
    -rw-r--r-- 1 skoumal staff 128943 Dec  5 15:12 09-14X016N_2-PK-pk
    -rw-r--r-- 1 skoumal staff 108095 Dec  5 15:12 09-15O004N_2-PK-pk
    -rw-r--r-- 1 skoumal staff 126265 Dec  5 15:12 09-15X041N_0-PK-pk
    -rw-r--r-- 1 skoumal staff 119361 Dec  5 15:12 09-15X044N_1-PK-pk
    -rw-r--r-- 1 skoumal staff 142155 Dec  5 15:12 09-16A002N_1-PK-pk
    -rw-r--r-- 1 skoumal staff  75463 Dec  5 15:12 09-16E007N_1-PK-pk
    -rw-r--r-- 1 skoumal staff 120311 Dec  5 15:12 09-16X030N_0-PK-pk
    -rw-r--r-- 1 skoumal staff 135473 Dec  5 15:12 10-13A005N_4-MH-pk
    -rw-r--r-- 1 skoumal staff 113915 Dec  5 15:12 10-13A011N_3-MH-pk
    -rw-r--r-- 1 skoumal staff  87897 Dec  5 15:12 10-13A018N_2-MH-pk
    -rw-r--r-- 1 skoumal staff 121284 Dec  5 15:12 10-13A074N_5-MH-pk
    -rw-r--r-- 1 skoumal staff 103486 Dec  5 15:12 10-13B016N_1-MH-pk
    -rw-r--r-- 1 skoumal staff 127402 Dec  5 15:12 10-13O003N_0-MH-pk
    -rw-r--r-- 1 skoumal staff  99716 Dec  5 15:12 10-13P009N_0-MH-pk
    -rw-r--r-- 1 skoumal staff 126373 Dec  5 15:12 10-14A011N_1-MH-pk
    -rw-r--r-- 1 skoumal staff 132420 Dec  5 15:12 10-14C009N_2-MH-pk
    -rw-r--r-- 1 skoumal staff 154287 Dec  5 15:12 10-14O007N_0-MH-pk
    -rw-r--r-- 1 skoumal staff 108656 Dec  5 15:12 10-14P006N_1-MH-pk
    -rw-r--r-- 1 skoumal staff 111987 Dec  5 15:12 10-15O011N_0-MH-pk
    -rw-r--r-- 1 skoumal staff  98207 Dec  5 15:12 10-15O012N_0-MH-pk
    -rw-r--r-- 1 skoumal staff 139761 Dec  5 15:12 10-15P004N_0-MH-pk
    -rw-r--r-- 1 skoumal staff  94261 Dec  5 15:12 10-15T002N_2-MH-pk
    -rw-r--r-- 1 skoumal staff 179656 Dec  5 15:12 10-15T003N_1-MH-pk
    -rw-r--r-- 1 skoumal staff 158985 Dec  5 15:12 10-15T011N_4-MH-pk
    -rw-r--r-- 1 skoumal staff 147464 Dec  5 15:12 10-15X020N_0-MH-pk
    -rw-r--r-- 1 skoumal staff 133827 Dec  5 15:12 10-16A002N_0-MH-pk
    -rw-r--r-- 1 skoumal staff  87534 Dec  5 15:12 10-16E007N_5-MH-pk
    -rw-r--r-- 1 skoumal staff  96142 Dec  5 15:12 10-16P004N_3-MH-pk
    -rw-r--r-- 1 skoumal staff 103483 Dec  5 15:12 10-16X003N_0-MH-pk

Producing the vertical with mark-up

  • The manually annotated files are in the csts-export directory.
  • Convert them to vertical format with the ortofon-csts-vert.pl script:
    parallel-filter.sh -C "ortofon-csts-vert.pl" -p45 -s csts-export -t vert-export -v
  • To be safe, copy everything into vert-opravy and make the corrections there.

Checking and manually correcting the vertical

Automatic corrections

  • Form von.* vs. lemma on.*:
    grep -P "von.*\ton" *
  • Variant 6 for von.* lemmas:
    grep -P "von[^\t]*\tPP.*6" *
  • Aspect for the enclitic s (form #s)
  • Unify the treatment of každý
  • Abbreviations
  • Hesitation sounds (hmm, @, @@)
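
The first check in this list can be made column-aware, so that it does not match across unrelated fields. A minimal sketch, assuming the vertical has tab-separated columns form, lemma, tag (the tag in column 3 is consistent with the cut -f3 calls on this page) and that structural lines start with <; find_von_on is a hypothetical name.

```shell
# find_von_on (hypothetical helper): reads the vertical on stdin and prints
# token lines whose form starts with "von" while the lemma starts with "on".
# Lines starting with "<" are treated as mark-up and skipped.
find_von_on() {
    awk -F'\t' '!/^</ && $1 ~ /^von/ && $2 ~ /^on/'
}

# Example on made-up data (TAG1..TAG3 are placeholders, not real tags):
printf 'von\ton\tTAG1\nvona\tvona\tTAG2\n<s>\nten\tten\tTAG3\n' | find_von_on
```

To scan a whole directory, the function could be fed with cat vert-opravy/* or called per file so that the file name can be reported alongside each hit.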

Manual corrections

  • invalid
  • X@
  • Visual check of the tags:
    grep -h -v "^<" * | cut -f3 | sort -u | l
  • Check that the tags are valid:
    grep -h -v "^<" * | cut -f3 | sort -u | check-tag.pl -l16 > /dev/null
  • Check asterisks and similar marks.
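
Before running check-tag.pl, a quick pre-check could flag tags of unexpected length. This is a minimal sketch only: it assumes that tags are 16 characters long (as in patterns such as X@-------------- on this page); whether that is what the -l16 option verifies is an assumption, and odd_length_tags is a hypothetical name.

```shell
# odd_length_tags (hypothetical helper): reads the vertical on stdin, takes
# the third tab-separated column of every non-mark-up line and prints each
# distinct value whose length is not 16 characters.
odd_length_tags() {
    grep -v '^<' | cut -f3 | sort -u | awk 'length($0) != 16'
}

# Example on made-up data: the second tag is deliberately too short.
printf 'a\ta\tX@--------------\nb\tb\tNNFS1\n<s>\n' | odd_length_tags
```

This only catches length problems; the validity of individual tag positions is still left to check-tag.pl.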

Comparing our lemmas and POS with MorphoDiTa (for the student Dominika)

  • Create the directory merge-csts, where the texts for annotation will be prepared.

Converting chunks to csts

  • The vertical has to be converted into <csts> with <s> tags:
    cd chunks
    for ff in *; do echo $ff; oral-vert-csts.pl < $ff > ../merge-csts/${ff%.vrt}.chunk.csts; done

Comparing our rules with the chunks

  • Done with a diff:
    sdiff 05-16X001N_2.chunk.csts <(grep -v '<D>' ../csts-import/05-16X001N_2.vrt | perl -pe 's/(<MMt>.)[^<\n]+/$1/g' \
    | remove-dupl-csts-mark.pl | perl -pe 's/<f[^>]*>/<f>/') | l 

Converting csts-rules-frazrl to the common format

  • Convert as follows:
    cd csts-import
    for ff in *.vrt; do echo $ff; grep -v '<D>' $ff | perl -pe 's/(<MMt>.)[^<\n]+/$1/g' \
    | perl -pe 's/&dhellip;/../g' | perl -pe 's/&thellip;/.../g' | perl -pe 's/(\*<MMt>)X/$1F/' \
    | perl -pe 's/\@+(<MMt>)Z/\@$1H/' | perl -pe 's/([eh]mm<MMt>)[IX]/$1H/' | perl -pe 's/<f[^>]+>([eh]mm<|\@+)/<d>$1/' \
    | perl -pe 's/(\)<MMt>)X/$1M/' | perl -pe 's/(\&amp;<MMt>)Z/$1H/' | remove-dupl-csts-mark.pl | perl -pe 's/<f[^>]*>/<f>/' \
    > ../merge-csts/${ff%.vrt}.import.csts; done

Merging chunk and import into merge-import

  • Produce the data for annotation:
    mkdir -p merge-import
    cd merge-csts
    for ff in *.chunk.csts; do echo $ff; sdiff -w 2500 ${ff%.chunk.csts}.import.csts $ff \
    | perl -pe 's/[\ \t]+\|[\ \t]+<f>[^<]+//' | perl -pe 's/[\ \t]+<.*//' | remove-dupl-csts-mark.pl Q \
    > ../merge-import/${ff%.chunk.csts}.csts; done
  • check the tabs
  • and then import into the annotation program (on jakobson):
    cd ../merge-import
    for ff in *-Dom; do echo $ff; /usr/local/annotate/bin/csts-import-utkl.pl --force $ff; done
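
The tab check in the list above is not spelled out on this page. One possible reading, sketched here under that assumption, is to list lines that still contain a tab character after the sdiff clean-up and before the import; lines_with_tabs is a hypothetical helper, and whether every remaining tab is actually an error in these files is an assumption.

```shell
# lines_with_tabs (hypothetical helper): prints, with line numbers, every
# line containing a tab. Reads the named files, or stdin if none are given.
lines_with_tabs() {
    tab=$(printf '\t')
    grep -n "$tab" "$@"
}

# Example on made-up data: only the second line contains a tab.
printf '<f>a\n<f>b\t<f>c\n' | lines_with_tabs
```

When called with several files (for example lines_with_tabs ../merge-import/*), grep prefixes each hit with the file name as well.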
