English-Punjabi Code-Mixed Social Media Content
Contenu de médias sociaux en anglais-panjabi
ID: ELRA-W0319
The English-Punjabi Code-Mixed Social Media Content corpus is composed of 893,615 parallel English-Punjabi sentences, distributed over the following domains:
- 82,341 parallel sentences of English-Punjabi code-mixed Agriculture Domain Data,
- 59,158 parallel sentences of English-Punjabi code-mixed Culture Domain Data,
- 101,732 parallel sentences of English-Punjabi code-mixed Entertainment Domain Data,
- 53,622 parallel sentences of English-Punjabi code-mixed Health Domain Data,
- 193,844 parallel sentences of English-Punjabi code-mixed Religion Domain Data,
- 106,256 parallel sentences of English-Punjabi code-mixed Sports Domain Data,
- 37,713 parallel sentences of English-Punjabi code-mixed Technology Domain Data,
- 77,183 parallel sentences of English-Punjabi code-mixed Tourism Domain Data,
- 63,103 parallel sentences of English-Punjabi code-mixed Education Domain Data,
- 119,663 parallel sentences of English-Punjabi code-mixed Entertainment Domain Data.
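
As a quick consistency check, the per-domain counts listed above can be tallied against the stated total. The sketch below is illustrative only: the variable names are assumptions, and the figures are copied verbatim from the list, including the Entertainment domain, which appears twice.

```python
# Per-domain sentence counts, copied verbatim from the list above.
# A list of pairs is used rather than a dict because "Entertainment"
# appears twice in the listing and both entries must be kept.
DOMAIN_COUNTS = [
    ("Agriculture", 82_341),
    ("Culture", 59_158),
    ("Entertainment", 101_732),
    ("Health", 53_622),
    ("Religion", 193_844),
    ("Sports", 106_256),
    ("Technology", 37_713),
    ("Tourism", 77_183),
    ("Education", 63_103),
    ("Entertainment", 119_663),
]

# Total given in the corpus description.
STATED_TOTAL = 893_615

listed_total = sum(count for _, count in DOMAIN_COUNTS)
difference = listed_total - STATED_TOTAL

print(f"Sum of listed counts: {listed_total:,}")
print(f"Stated total:         {STATED_TOTAL:,}")
print(f"Difference:           {difference:,}")
```

The listed counts sum to 894,615, which is 1,000 more than the stated total of 893,615. The listing does not indicate which figure is off.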