| coolengineer/sejong-corpus | 103 | | 0 | 0 | almost 7 years ago | 0 | | 1 | other | Shell |
| Korean Sejong corpus download and simple analysis |
| amir-zeldes/gum | 76 | | 0 | 0 | over 2 years ago | 0 | | 6 | other | Python |
| Repository for the Georgetown University Multilayer Corpus (GUM) |
| proycon/folia | 60 | | 2 | 2 | over 2 years ago | 93 | October 08, 2021 | 21 | gpl-3.0 | Python |
| FoLiA: Format for Linguistic Annotation. FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions. |
| PLOS/allofplos | 53 | | 0 | 0 | over 2 years ago | 21 | December 06, 2022 | 35 | mit | Python |
| Repository for the allofplos project. |
| kmike/opencorpora-tools | 42 | | 6 | 0 | over 5 years ago | 9 | October 11, 2020 | 2 | mit | Python |
| Python interface to http://opencorpora.org/ |
| CopticScriptorium/corpora | 26 | | 0 | 0 | over 2 years ago | 0 | | 10 | | CSS |
| Public repository for Coptic SCRIPTORIUM Corpora Releases |
| joaoventura/WikiCorpusExtractor | 19 | | 0 | 0 | over 11 years ago | 0 | | 0 | | Python |
| Extracts text from Wikimedia XML dump files |
| martinreynaert/TICCL | 19 | | 0 | 0 | almost 3 years ago | 0 | | 2 | gpl-3.0 | Python |
| Text-Induced Corpus Clean-up |
| cligs/textbox | 18 | | 0 | 0 | about 4 years ago | 0 | | 4 | | |
| Text collections made available by the CLiGS group. |
| mappingtreaties/tota | 16 | | 0 | 0 | over 8 years ago | 0 | | 1 | other | |
| Texts of Trade Agreements Corpus |