Building a Morphologically Comparable and Multiparallel Corpus for Underrepresented Peruvian Amazonian Languages

Lines of research:

Computational linguistics, data science and language technologies

Description

The Chana-PUCP Research Group promotes the development of language technologies for Peruvian indigenous languages as a strategy to support their revitalization. Chana members are creating corpora for natural language processing research and for applications such as spell checkers for educational purposes. Collecting computational data for these languages is challenging because their digital resources are scarce, so it is essential to develop and annotate as many curated corpora as possible. An additional challenge is how much these native languages differ from Spanish: Peruvian indigenous languages are highly agglutinative and polysynthetic. We propose to build a multiparallel corpus across Peruvian Amazonian languages for morphological and syntactic comparison. We have started the project with the following Amazonian languages: Shipibo-Konibo, Kakataibo, Shiwilu, Amawaka, Ashaninka, Yanesha, Yine, and Matses. The project is funded by the Max Planck Institute for Evolutionary Anthropology. Research Areas: Computational Linguistics, Databases, and Language Technologies.
