© Karolinum Press 2023

TITLE DETAILS:

The World of Tokens, Tags and Trees

ÚFAL MFF UK 2018

paperback, 157 pp.
ISBN 9788088132097

222,-
200,-
1-2 pcs

This monograph presents a comparative study of approaches to annotating the morphology and syntax of natural languages, with emphasis on applicability in a multilingual environment. Annotation is understood as adding linguistic categories and relations to digitally encoded natural language text; the result is an annotated corpus. Because syntactic relations are often represented as dependency trees, the annotated corpora covered by the monograph are dependency treebanks. Many treebanks exist and their annotation styles vary significantly, which hampers their usefulness for linguists and language engineers. We survey several harmonization efforts that tried to come up with cross-linguistically applicable annotation guidelines, including the most recent and broadest effort to date, Universal Dependencies. We examine language description on three levels: (1) tokenization and word segmentation, (2) morphology, and (3) surface dependency syntax. For each language phenomenon we compare its analysis and annotation in various existing treebanks (or, for tokenization and morphology, other corpora), pointing out advantages and disadvantages of the competing approaches. On the morphological layer, we go beyond the currently available corpora and provide a typological survey of features that will be needed when an annotation project covers less-resourced languages. We conclude that no single approach is suitable for all purposes, but a good approach must not lose information, so that the annotation can be converted to another style when necessary.