Differences
| wiki:user:skoumal:tacr23 [2025/07/09 19:31] – created - external edit 127.0.0.1 | wiki:user:skoumal:tacr23 [2025/09/12 13:24] (current) – [Anotace Lemurem] skoumal | ||
|---|---|---|---|
| Line 24: | Line 24: | ||
| lemurtag -b -i vert-6 -o vert-mwe -m / | lemurtag -b -i vert-6 -o vert-mwe -m / | ||
| + | === Description of the annotation program === | ||
| + | |||
| + | * The program for annotation with Lemur is called '' | ||
| + | * To work, it needs a compiled Lemur dictionary, which is compiled with the program '' | ||
| + | $ lemur_compiler -h | ||
| + | Compile Lemur data into representation suitable for data annotation | ||
| + | |||
| + | Usage: lemur_compiler [OPTIONS] --model-file < | ||
| + | |||
| + | Options: | ||
| + | -m, --model-file < | ||
| + | -u, --url < | ||
| + | -p, --name-prefix < | ||
| + | -i, --input-directory < | ||
| + | -t, --transform | ||
| + | --gen-json | ||
| + | -v, --verbose... | ||
| + | -h, --help | ||
| + | -V, --version | ||
| + | </ | ||
| + | lemur_compiler -t -m model-250820.msgpack > model-250820.log</ | ||
| + | * The annotation itself is then performed by the program '' | ||
| + | $ mwe_tagger -h | ||
| + | Annotate data with multi-word expressions from the Lemur database | ||
| + | |||
| + | Usage: mwe_tagger [OPTIONS] --input < | ||
| + | |||
| + | Options: | ||
| + | -i, --input < | ||
| + | Input file or directory | ||
| + | -o, --output < | ||
| + | Output file or directory | ||
| + | -m, --model-file < | ||
| + | Model file | ||
| + | -n, --number-of-threads < | ||
| + | Number of threads (batch mode only) [default: 1] | ||
| + | -f, --format < | ||
| + | Format of input data [default: long] [possible values: long, short] | ||
| + | -b, --batch-mode | ||
| + | Batch mode (input and output interpreted as directories) | ||
| + | -t, --include-tags | ||
| + | Include MWE tags | ||
| + | -c, --compress | ||
| + | Avoid matches that are part of another matches (partially overlapping matches still allowed) | ||
| + | -h, --help | ||
| + | Print help | ||
| + | -V, --version | ||
| + | Print version | ||
| + | </ | ||
| + | mwe_tagger -b -i vert-rules0-frazrl-rules-mdita-sublm-agr -o mwe_out -t -n 100 [-c]</ | ||
| + | * Then we also save the MWE annotation on its own:< | ||
| + | for ff in *; do cut -f7- $ff > ../ | ||
| ==== Creating a corpus for comparison ==== | ==== Creating a corpus for comparison ==== | ||
| Line 181: | Line 233: | ||
| vert2verttab.pl mwe-out | perl -pe 'undef $/; s/< | vert2verttab.pl mwe-out | perl -pe 'undef $/; s/< | ||
| </ | </ | ||
| + | |||
| + | ===== syn2020lemur - corpus with links ===== | ||
| + | |||
| + | * The original corpus from Pavel is in the directory ''/ | ||
| + | * We will use '' | ||
| + | * We work in the directory ''/ | ||
| + | * We create the directory '' | ||
| + | * We create the directory '' | ||
| + | parallel-filter.sh -C 'cut -f1 | perl -C -pe "undef $/; s/ | ||
| + | | perl -C -pe "s/ </ | ||
| + | | perl -C -pe "undef $/; s: | ||
| + | | perl -C -pe "undef $/; s: | ||
| + | * In the directory '' | ||
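
The steps described above (compile the Lemur data, annotate in batch mode, then keep only the MWE columns) can be wrapped in a few shell functions. This is a minimal sketch under stated assumptions, not the author's script. The function names `compile_model`, `annotate_dir` and `extract_mwe`, their argument order, and the destination directory of the extraction step are hypothetical, since the original path is truncated. The only flags passed to `lemur_compiler` and `mwe_tagger` are those shown in their help texts and usage examples.

```shell
#!/usr/bin/env bash
# Minimal sketch of the workflow above: compile, annotate, extract.
# Function names, argument order and the extraction target directory are
# hypothetical; only flags shown in the help texts above reach the tools.

# Run lemur_compiler as in the example above (-t, -m MODEL) and keep its
# standard output in a log named after the model (model-X.msgpack -> model-X.log).
compile_model() {
  local model=$1
  local log="${model%.msgpack}.log"
  lemur_compiler -t -m "$model" > "$log"
}

# Annotate all files of in_dir in batch mode (-b), writing to out_dir.
# -t includes MWE tags, -n sets the thread count (batch mode only).
# Any further arguments (e.g. -c) are passed to mwe_tagger unchanged.
# Requires at least four arguments: MODEL IN_DIR OUT_DIR THREADS.
annotate_dir() {
  local model=$1 in_dir=$2 out_dir=$3 threads=$4
  shift 4 || return 1
  mwe_tagger -b -i "$in_dir" -o "$out_dir" -m "$model" -t -n "$threads" "$@"
}

# Keep only columns 7 onward (the MWE annotation) of every file in src_dir,
# like the `cut -f7-` loop above. Lines without a tab (e.g. structural tags)
# are passed through unchanged by cut.
extract_mwe() {
  local src_dir=$1 dst_dir=$2 f
  mkdir -p "$dst_dir" || return 1
  for f in "$src_dir"/*; do
    [ -f "$f" ] || continue            # skip subdirectories and an unmatched glob
    cut -f7- "$f" > "$dst_dir/${f##*/}"
  done
}
```

For example, `compile_model model-250820.msgpack && annotate_dir model-250820.msgpack vert-rules0-frazrl-rules-mdita-sublm-agr mwe_out 100 -c` repeats the commands shown above, except that the model is passed explicitly via `-m`. Running `extract_mwe mwe_out mwe_only` then writes the MWE columns into a hypothetical `mwe_only` directory.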