South American Languages in Typological Databases
Computational linguistics, data science and language technologies
A State of the Art: South American languages are crucial for linguistic typology, yet their coverage varies significantly across typological databases, and some languages, language families, and typological features remain underrepresented in typological research. Grambank currently provides data sheets for 253 South American languages. On average, 160.98 features are coded per language, and each Grambank feature has information for an average of 183.49 ± 38.35 South American languages. Comparable estimates are not available for other databases such as WALS (https://wals.info/) and SAILS (https://sails.clld.org/). In this project, we aim to study systematically and experimentally the coverage of South American languages in the most important typological databases, as a first step toward determining which questions can be asked of the available data and which approaches and methods are best suited to answering them.
Funded by the Max Planck Institute for Evolutionary Anthropology.
Research Areas: Computational Linguistics, Databases, and Language Technologies.
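The two kinds of coverage figure quoted for Grambank (mean features coded per language, and mean ± standard deviation of languages coded per feature) could be computed along the following lines. This is only a minimal sketch and not the project's actual method: the function name `coverage_summary`, the (language, feature, value) record layout, the toy data, the convention that "?" marks an uncoded value, and the use of the sample standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def coverage_summary(records, missing=("?", None, "")):
    """Summarise coverage from (language, feature, value) records.

    Hypothetical helper: values listed in `missing` are treated as uncoded.
    """
    features_per_lang = defaultdict(set)
    langs_per_feature = defaultdict(set)
    for language, feature, value in records:
        # Touch both keys so that a language or feature with no coded
        # values still enters the averages with a count of zero.
        features_per_lang[language]
        langs_per_feature[feature]
        if value in missing:
            continue
        features_per_lang[language].add(feature)
        langs_per_feature[feature].add(language)

    per_lang = [len(feats) for feats in features_per_lang.values()]
    per_feat = [len(langs) for langs in langs_per_feature.values()]
    return {
        "mean_features_per_language": mean(per_lang),
        "mean_languages_per_feature": mean(per_feat),
        # Sample standard deviation (an assumption; the source does not
        # say which estimator underlies its "±" figure).
        "sd_languages_per_feature": stdev(per_feat) if len(per_feat) > 1 else 0.0,
    }


# Toy data, invented for illustration only (not taken from any database).
records = [
    ("lang_a", "F1", "1"), ("lang_a", "F2", "0"), ("lang_a", "F3", "?"),
    ("lang_b", "F1", "1"), ("lang_b", "F2", "?"), ("lang_b", "F3", "?"),
    ("lang_c", "F1", "0"), ("lang_c", "F2", "1"), ("lang_c", "F3", "1"),
]

summary = coverage_summary(records)
print(summary)
```

Running the same summary over each database, restricted to South American languages, would yield directly comparable figures, provided the databases are first brought into a common record layout.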