GLiCom Spanish Wordform list – Regular word-forms + verb-clitic combinations
ID: ELRA-L0095_01
GLiCom Spanish Wordform List v.1 is a computational lexicon of inflected Spanish wordforms. Each entry records (i) the lemma, (ii) a morphosyntactic tag, and (iii) the word type. The lexicon can be used in any Spanish text-analysis application, in particular those that need a lemmatizer, POS tagger, or named-entity recogniser.
The lexicon is distributed in two sublexicons:
1- word forms
2- verb-clitic combinations
The list of wordforms contains 1,152,242 entries, including (i) regular words (1,144,086), (ii) toponyms and anthroponyms (8,032), (iii) abbreviations and acronyms (775), and (iv) computational terms (124). Each entry consists of the form, its lemma, a morphosyntactic tag, and the word type.
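As an illustration of how such entries could back a simple lemmatizer, the following Python sketch indexes entries by form. The file layout is an assumption: this description does not state the delimiter or field order on disk, so the sketch assumes one tab-separated entry per line in the order form, lemma, tag, word type. The names `Analysis`, `load_wordforms` and `lemmas` are hypothetical, and the tag and type strings in the sample are placeholders, not the actual GLiCom tagset.

```python
from collections import defaultdict
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str
    tag: str
    word_type: str


def load_wordforms(lines, sep="\t"):
    """Build a form -> [Analysis, ...] index from an iterable of text lines.

    Assumes four fields per line (form, lemma, tag, word type); the real
    distribution format may differ.
    """
    index = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        form, lemma, tag, word_type = line.split(sep)
        index[form].append(Analysis(lemma, tag, word_type))
    return dict(index)


def lemmas(index, form):
    """Return the distinct lemmas recorded for a form, in file order."""
    seen = []
    for analysis in index.get(form, []):
        if analysis.lemma not in seen:
            seen.append(analysis.lemma)
    return seen


# Hypothetical sample lines; tags and type labels are placeholders.
SAMPLE = [
    "casas\tcasa\tTAG_NOUN_PL\tregular",
    "casas\tcasar\tTAG_VERB_2SG\tregular",
    "Madrid\tMadrid\tTAG_PROPER\ttoponym",
]

index = load_wordforms(SAMPLE)
print(lemmas(index, "casas"))  # ['casa', 'casar']
```

A form can map to several analyses, so the index stores a list per form instead of a single lemma; a POS tagger would be needed to choose among them in context.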
The list of verb-clitic combinations contains 4,283,637 entries, exhaustively covering all formally possible combinations (including those with infinitives, gerunds and imperatives). Note that some clitic combinations are formally possible but semantically implausible. Each entry consists of the form, the lemma of the verb, and the combined morphosyntactic tags of the verb and the pronoun(s).
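A verb-clitic entry could be split into its verb and pronoun parts along the following lines. This is only a sketch under stated assumptions: the description does not say how the combined tags are encoded, so the code assumes tab-separated fields and a "+" between the verb tag and each pronoun tag. `CliticEntry` and `parse_clitic_entry` are hypothetical names, and the tag strings are placeholders.

```python
from typing import NamedTuple, Tuple


class CliticEntry(NamedTuple):
    form: str
    verb_lemma: str
    verb_tag: str
    pronoun_tags: Tuple[str, ...]


def parse_clitic_entry(line, sep="\t", tag_sep="+"):
    """Split one entry into form, verb lemma, verb tag and pronoun tags.

    Assumes three fields per line and that the first tag in the combined
    field belongs to the verb; the real encoding may differ.
    """
    form, lemma, combined = line.rstrip("\n").split(sep)
    verb_tag, *pronoun_tags = combined.split(tag_sep)
    return CliticEntry(form, lemma, verb_tag, tuple(pronoun_tags))


# Hypothetical entry for "dámelo" (dar + me + lo); tags are placeholders.
entry = parse_clitic_entry("dámelo\tdar\tVERB_TAG+PRON_TAG_1+PRON_TAG_2")
print(entry.verb_lemma, len(entry.pronoun_tags))  # dar 2
```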