Monolingual text corpus
Languages: Polish
Linguality
Linguality type: Monolingual
Size
Annotation
Segmentation
Tagset: NKJP tagset
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic
Start date: 04/01/2011
End date: 04/30/2011
Lemmatization
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic
Start date: 04/01/2011
End date: 04/30/2011
Segmentation
StandOff: True
Segmentation level: Sentence
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic
Start date: 04/01/2011
End date: 04/30/2011
Semantic Annotation - Word Senses
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Manual (manually disambiguated using AnotEk)
Start date: 04/01/2011
End date: 11/19/2011
Morphosyntactic Annotation - B Pos Tagging
Tagset: NKJP tagset
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic
Start date: 04/01/2011
End date: 04/30/2011
Morphosyntactic Annotation - Pos Tagging
Tagset: NKJP tagset
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic
Start date: 04/01/2011
End date: 04/30/2011
Segmentation
StandOff: True
Segmentation level: Paragraph
Format: text/xml
Standard practices conformance: TEI
Start date: 04/01/2011
End date: 04/30/2011
Creation
Creation mode: Mixed
Creation mode details: Economy-related categories of the Polish Wikipedia, including their economy-related subcategories; Wikipedia markup was stripped, and the text was tagged with TaKIPI 1.8 and converted to TEI format.
Original Sources
Creation Tools: Java code, AnotEk 1.0, TaKIPI 1.8
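
Every annotation layer in this record is marked StandOff: True and stored as TEI XML, which means the annotations sit in separate files that point back into the text rather than being embedded in it. The sketch below illustrates that idea only. The element names, the `corresp` pointer syntax, the file name `text.xml`, and the sample sentence are all hypothetical and simplified; the actual files of this corpus may be laid out differently.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified documents (NOT the corpus's real schema).
# The text layer holds the raw paragraph; the segmentation layer only
# holds pointers of the assumed form string-range(id, offset, length).
TEXT_XML = '<text><p xml:id="p1">Bank centralny ustala stopy.</p></text>'
SEG_XML = (
    "<segmentation>"
    '<seg corresp="text.xml#string-range(p1,0,4)"/>'
    '<seg corresp="text.xml#string-range(p1,5,9)"/>'
    "</segmentation>"
)

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
RANGE = re.compile(r"string-range\(([^,]+),(\d+),(\d+)\)")


def resolve_segments(text_xml, seg_xml):
    """Return the word strings that the stand-off segments point at."""
    # Index paragraph text by its xml:id so pointers can be resolved.
    paragraphs = {
        p.get(XML_ID): p.text or ""
        for p in ET.fromstring(text_xml).iter("p")
    }
    words = []
    for seg in ET.fromstring(seg_xml).iter("seg"):
        match = RANGE.search(seg.get("corresp", ""))
        if match is None:
            continue  # skip segments without a recognisable pointer
        target = match.group(1)
        offset = int(match.group(2))
        length = int(match.group(3))
        words.append(paragraphs[target][offset:offset + length])
    return words


print(resolve_segments(TEXT_XML, SEG_XML))  # ['Bank', 'centralny']
```

Keeping the text layer untouched lets independent layers (segmentation, lemmatization, tagging, word senses) be added or regenerated without rewriting the source text, which is the usual motivation for stand-off annotation.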