Wiki launched 24 July 2025

Differences

This shows you the differences between two versions of the page.

wiki:user:skoumal:tacr23 [2025/07/09 19:31] – created - external edit 127.0.0.1
wiki:user:skoumal:tacr23 [2025/08/20 15:52] (current) – [Anotace Lemurem] skoumal
Line 24:
 lemurtag -b -i vert-6 -o vert-mwe -m /cnk/local/ssd/vitovec/model_20240915.json -n100</code>
  
 +=== Description of the annotation program ===
 +
 +  * The program for annotating with Lemur is called ''mwe_tagger''; its latest version is on ''lovelace'' in the directory ''/cnk/local/ssd/vitovec/mwe_suite/target/release''.
 +  * To do its work it needs a compiled Lemur dictionary, which is built with the program ''lemur_compiler'':<code>
 +$ lemur_compiler -h
 +Compile Lemur data into representation suitable for data annotation
 +
 +Usage: lemur_compiler [OPTIONS] --model-file <MODEL_FILE>
 +
 +Options:
 +  -m, --model-file <MODEL_FILE>      Model file [e.g. model.msgpack]
 +  -u, --url <URL>                    Lemur URL [default: https://yggdrasil.korpus.cz/ratatosk/api/cunits/_search]
 +  -p, --name-prefix <NAME_PREFIX>    Name prefix: only dump MWEs whose name starts with the given prefix
 +  -i, --input-directory <INPUT_DIR>  Local directory containing Lemur data
 +  -t, --transform                    Whether to derive additional data from the original data
 +      --gen-json                     Whether to generate JSON output along with the MSGPACK output
 +  -v, --verbose...                   Increase verbosity level (-v, -vv)
 +  -h, --help                         Print help
 +  -V, --version                      Print version
 +</code>The usual compilation procedure is:<code>cd ~/cnk-work/LEMUR
 +lemur_compiler -t -m model-250820.msgpack > model-250820.log</code>
 +  * Annotation is then performed with the program ''mwe_tagger'':<code>
 +$ mwe_tagger -h
 +Annotate data with multi-word expressions from the Lemur database
 +
 +Usage: mwe_tagger [OPTIONS] --input <INPUT> --output <OUTPUT> --model-file <MODEL_FILE>
 +
 +Options:
 +  -i, --input <INPUT>
 +          Input file or directory
 +  -o, --output <OUTPUT>
 +          Output file or directory
 +  -m, --model-file <MODEL_FILE>
 +          Model file
 +  -n, --number-of-threads <THREADS_NUMBER>
 +          Number of threads (batch mode only) [default: 1]
 +  -f, --format <FORMAT>
 +          Format of input data [default: long] [possible values: long, short]
 +  -b, --batch-mode
 +          Batch mode (input and output interpreted as directories)
 +  -t, --include-tags
 +          Include MWE tags
 +  -c, --compress
 +          Avoid matches that are part of another matches (partially overlapping matches still allowed)
 +  -h, --help
 +          Print help
 +  -V, --version
 +          Print version
 +</code>The usual annotation procedure is:<code>
 +mwe_tagger -b -i vert-rules0-frazrl-rules-mdita-sublm-agr -o mwe_out -t -n 100 [-c]</code>The output directory must already exist.
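The two steps above can be chained in a small wrapper. This is only a sketch under stated assumptions: the file and directory names (''model.msgpack'', ''vert-in'', ''mwe_out''), the thread count and the function names are hypothetical placeholders, and only flags listed in the help output above are used. Because the usage line lists ''--model-file'' as required, the sketch passes ''-m'' explicitly to ''mwe_tagger''.

```shell
#!/bin/sh
# Sketch only: chains the compile and annotate steps described above.
set -eu

# Hypothetical placeholders; override them through the environment.
MODEL_FILE="${MODEL_FILE:-model.msgpack}"
INPUT_DIR="${INPUT_DIR:-vert-in}"
OUTPUT_DIR="${OUTPUT_DIR:-mwe_out}"
THREADS="${THREADS:-4}"

compile_model() {
  # -t derives additional data; the compiler's stdout is kept as a log file.
  lemur_compiler -t -m "$MODEL_FILE" > "${MODEL_FILE%.msgpack}.log"
}

annotate() {
  # The page states that the output directory must exist, so create it first.
  mkdir -p "$OUTPUT_DIR"
  # -b: batch mode (directories), -t: include MWE tags, -n: number of threads.
  mwe_tagger -b -i "$INPUT_DIR" -o "$OUTPUT_DIR" -m "$MODEL_FILE" -t -n "$THREADS"
}

# Run only when both tools are installed, so the sketch is harmless elsewhere.
if command -v lemur_compiler >/dev/null 2>&1 && command -v mwe_tagger >/dev/null 2>&1; then
  compile_model
  annotate
else
  echo "lemur_compiler or mwe_tagger not on PATH; nothing done" >&2
fi
```

Creating the directory with ''mkdir -p'' makes the wrapper safe to rerun; add ''-c'' to the ''mwe_tagger'' call if matches contained in other matches should be dropped.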
 ==== Creating a corpus for comparison ====
  
