N-grams from Hungarian National Corpus

HNCNgrams

130

ID: NGRAM-HNC

The Hungarian National Corpus (HNC) is divided into five subcorpora by regional language variant and, independently, into five subcorpora by text genre. The subcorpus to be studied can be chosen as any combination of these, which makes the HNC a suitable tool for studying differences not only between text genres but also between language variants. The Hungarian Gigaword Corpus (HGC) aims to be a representative general-purpose corpus of present-day standard Hungarian.
HGC builds on the Hungarian National Corpus but offers higher quality and a finer level of analysis and annotation: detailed morphosyntactic analysis and disambiguation with an updated processing toolchain, NP chunking, named entity recognition, distributional analysis, and built-in post-processing (multilevel frequency lists and subsequent searches on previous results). HGC has been extended to the 1-gigaword threshold, with extended metadata and cleared IPR (intellectual property rights).
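Selecting a subcorpus by combining the regional and genre dimensions, and then counting n-grams in it, can be sketched as follows. This is a minimal illustration only, not the HNC/HGC processing toolchain and not the file format of this resource: the `Document` record, the function names, the region and genre labels, and the toy sentences are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Set


class Document(NamedTuple):
    """A tokenized text carrying two metadata dimensions (hypothetical schema)."""
    region: str         # regional language variant (illustrative label)
    genre: str          # text genre (illustrative label)
    tokens: List[str]


def select_subcorpus(docs: Iterable[Document],
                     regions: Optional[Set[str]] = None,
                     genres: Optional[Set[str]] = None) -> List[Document]:
    """Keep documents whose region is in `regions` AND whose genre is in `genres`.
    None means no restriction on that dimension, so any combination can be chosen."""
    return [d for d in docs
            if (regions is None or d.region in regions)
            and (genres is None or d.genre in genres)]


def ngram_counts(docs: Iterable[Document], n: int) -> Counter:
    """Count contiguous token n-grams; n-grams never cross document boundaries."""
    if n < 1:
        raise ValueError("n must be at least 1")
    counts: Counter = Counter()
    for d in docs:
        for i in range(len(d.tokens) - n + 1):
            counts[tuple(d.tokens[i:i + n])] += 1
    return counts


# Usage with toy data (hypothetical sentences and labels).
docs = [
    Document("Hungary", "press", "a kormány szerint a".split()),
    Document("Transylvania", "press", "a kormány szerint".split()),
    Document("Hungary", "literature", "a kormány".split()),
]
press = select_subcorpus(docs, genres={"press"})
bigrams = ngram_counts(press, 2)
print(bigrams[("a", "kormány")])  # 2
```

Restricting only one dimension (here genre) pools all regional variants; passing both `regions` and `genres` narrows the selection to their intersection.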
