How AI and Technology Are Reviving Indigenous Languages Around the World
Admin November 06, 2024
Imagine putting on a virtual reality
headset and entering a world where you can explore communities, like Missoula,
except your character, and everyone you interact with, speaks Salish, Cheyenne
or Blackfoot.
Alexa, can you speak Cheyenne?
Running Wolf worked
for Amazon’s Alexa project for four years and became deeply familiar with the
product.
As he worked on the
software, he wondered, “Could something like Alexa speak Cheyenne?”
But time and again,
Running Wolf said the initiative would die, because “it takes a lot.”
Google Assistant,
for example, employs tens of thousands of contractors and the technology
requires lots of data, including millions of hours of annotated audio spoken in
different languages.
“It’s a huge,
monumental effort,” Running Wolf said. “And if you’re looking at Montana
tribes, like the Northern Cheyenne, no one has that (data). We’re looking at
millions of hours of audio, and at best, we have maybe around 100 or so, and
some tribes don’t have anything.”
Nigeria As a Language Paradise
Nigeria currently has one of the
largest concentrations of language diversity in the world, with over 500
languages spoken in the country. Africa, in general, is home to around 1,000 to
2,000 languages, or about one-third of the world’s languages. The diversity of
languages in Nigeria reflects the diversity of the country's cultures, art,
histories, and traditions. However, as is the case in many other countries, a
lot of these languages are at risk of extinction.
Like many indigenous languages,
most languages in Nigeria have no written form, and as the number of speakers
decreases, a language becomes more likely to disappear. The issues
affecting indigenous languages all over the world, including globalization, migration,
and the dominance of major convergent languages, are also increasing the rate
of language death in Nigeria. Between the 1940s and today, ten Nigerian
languages have gone extinct, and about one hundred more are either in trouble
or dying. According to Dr. Kola Adekola from the University of Ibadan’s Department of Anthropology and Archeology,
this is due to a combination of globalization and a general inclination among the
younger, more technology-oriented generation toward more dominant cultures.
“It isn’t just languages, entire
cultures are being lost because they are considered archaic by new generations
who communicate entirely in English. To solve this problem, we would need to
use a mix of local and technological solutions,” he says.
AI in Language Preservation
As language technologies
advance, there has been a lot of success in using artificial intelligence to
preserve endangered indigenous languages worldwide. In 2018, Te Hiku Media, a
Māori-owned non-profit radio station, built language technology,
including automatic speech recognition (ASR) and speech-to-text, in an effort
to keep the Māori language from declining further, becoming the first to build
ASR tools for an indigenous language. Since then, attempts have been made to
preserve other endangered languages with AI. AI Pirinka is being used to
preserve Ainu, the language isolate spoken by the Ainu people, the indigenous
inhabitants of Hokkaido in northeastern Japan. Woolaroo, a project by Google,
is also using machine learning to teach and preserve languages like Yiddish and
Louisiana Creole.
This doesn’t come without its
own challenges, however. Many indigenous languages are under-resourced and not
NLP-supported, especially since most NLP work is Indo-Eurocentric in terms of
preprocessing, training, and evaluation algorithms. African languages, in
particular, are at risk of being left behind because of a lack of resources.
This includes datasets that can be used for training ML models. Many datasets
involving Nigerian languages are either incorrect or mislabeled, which will, in
turn, result in inaccurate models.
Ethics of AI in Language Preservation
Perhaps the biggest obstacle to
preserving endangered languages is the potential for
exploitation of indigenous people. Many endangered languages are at risk of
extinction due to cultural replacement and expansionism, so the people who
speak them are understandably wary of outside interventions. In the case of
Te Hiku, it was important that
the only people who profit from their language are Māori people themselves. For
them, protecting their data means protecting thousands of years of traditional
knowledge. For Dr. Adekola, the rewards outweigh the risks.
“There is a crisis, and the
truth is, if nothing is done, we will lose so much history and knowledge. If AI
is a way to prevent that, we need to embrace it while making sure that our
cultures are being respected,” said Adekola.
Language research, especially
with endangered languages, can be exploitative if ethical standards are not
firmly established and upheld. When working to preserve languages, it is
imperative that the agency of the people who speak them is respected and
extractive practices are discouraged. This means adopting a more conscious
approach to language preservation and working hand-in-hand with collaborators
from the community.
There are also some reservations
about the capacity of AI to fully understand the depth of indigenous languages.
This is part of a larger conversation on the ability of NLP to actually
comprehend language as used by humans. Many indigenous languages specifically
rely on tone, tone marking, vowel harmony, and context, which are missing in
most dominant languages. Capturing these features is
especially difficult since most of these languages are purely oral without any
written form, making it challenging to preserve them without sacrificing the
non-written context many of them have. Some communities, like the Shoshone
community in the U.S. Southwest, are rejecting efforts to standardize their
language in written form.
For African languages, there has
been an increase in resources created and curated by people who speak the
language. Masakhane, which means “we build together” in isiZulu, is a
grassroots organization whose mission is strengthening NLP research in Africa.
By offering tools to train baseline models for a wide range of African
languages, they have helped build models for more than 35 African languages.
Other organizations like Deep Learning Indaba and Black in AI are attempting to
build a sustainable community of AI experts both in Africa and the diaspora.
The African Language Dataset Challenge
was created to incentivize the
creation of datasets for African languages and address this shortage of training data.
That way, African people's rich cultures and languages are represented and
protected.