The world's first pidgin-to-English machine translation model has been developed by two Nigerian AI engineers, Kelechi Ogueji and Orevaoghene Ahia, who are colleagues at InstaDeep, an AI research firm based in Lagos, Nigeria.
As perhaps the only successfully developed model for translating pidgin to English, the project was accepted at NeurIPS 2019, one of the world’s largest gatherings of artificial intelligence researchers and enthusiasts. The field of machine translation (MT), which is quite different from machine-aided human translation (MAHT), is the part of computational linguistics concerned with translating text or speech from one language to another.
At its most basic level, MT simply substitutes words in one language for words in another. This usually cannot produce a good translation in context, because it does not recognize phrases and their closest counterparts in the target language. The solution lies in neural techniques, a rapidly growing field that has led to improved translations capable of handling differences in linguistic typology, translating idioms, and isolating anomalies.
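The weakness of word-for-word substitution can be seen in a minimal sketch. The three-entry glossary and the function name `substitute_words` below are hypothetical and chosen only for illustration; they are not taken from the engineers' project.

```python
# Hypothetical three-word glossary, for illustration only.
GLOSSARY = {"wetin": "what", "dey": "is", "happen": "happen"}


def substitute_words(sentence, glossary):
    """Replace each word independently, ignoring the words around it."""
    words = sentence.lower().split()
    # Words missing from the glossary are passed through unchanged.
    return " ".join(glossary.get(word, word) for word in words)


literal = substitute_words("Wetin dey happen", GLOSSARY)
print(literal)  # what is happen
```

Each word is mapped in isolation, so the result is "what is happen" rather than the natural "what is happening". Getting the verb form right requires looking at the phrase as a whole, which is what neural models are trained to do.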
Neural models, however, need a lot of data. To build an English-to-French translator, for instance, you need tons of sentence pairs covering both languages. But with little pidgin English written online, finding decent translation data for the project proved to be a major hindrance.
How Did the Duo Brave the Odds?
The difficulty of finding pidgin data led them to train an Unsupervised Neural Machine Translation (UNMT) model, which learns to translate without relying on sentence pairs. In practice, this meant building a pidgin-English catalog of word pairs from scratch and scraping 56,695 pidgin sentences, containing 32,925 unique words, from different websites.
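Corpus figures such as the sentence and unique-word counts above can be computed with a few lines of code. The sketch below is an assumption about how such counts might be produced, not the engineers' actual pipeline; the function name `corpus_stats`, the tokenization rule and the sample sentences are all hypothetical.

```python
import re


def corpus_stats(sentences):
    """Return (number of non-empty sentences, number of unique lowercase words)."""
    # Drop blank lines, which scraped text often contains.
    cleaned = [s.strip() for s in sentences if s.strip()]
    vocabulary = set()
    for sentence in cleaned:
        # Assumed tokenization: runs of letters and apostrophes.
        vocabulary.update(re.findall(r"[a-z']+", sentence.lower()))
    return len(cleaned), len(vocabulary)


sample = ["How you dey?", "I dey fine.", "  ", "Wetin dey happen?"]
num_sentences, num_unique_words = corpus_stats(sample)
print(num_sentences, num_unique_words)  # 3 7
```

The blank entry is skipped, and "dey" is counted once although it appears in all three sentences, which is why the unique-word count is always much smaller than the total word count.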
The project was spurred by Ogueji's interest in language translation, which dates back to his first encounter with Google Translate in 2014. It is billed as the first Natural Language Processing (NLP) project in the world carried out on pidgin English by any developer or group. This is despite the fact that AI development is still at a rather embryonic stage in Nigeria, where research data is scarce and, where available, often too raw to make sense of.
But despite the seemingly insurmountable hindrances, they were able to lean on tools like Word2vec and Google’s Transformer architecture, which gave their ambition a solid foundation.
What Are the Applications of a Pidgin-to-English Translation Model?
Natural language processing (NLP) is becoming more practical thanks to modern technologies such as machine learning, which is deployed in voice-to-text platforms like Google Assistant, Siri and Alexa. Other fields that depend on communication can also be improved with NLP. Branding agencies, for instance, can apply it to find out exactly how people feel about a campaign.
The best tools used in digital marketing combine advanced NLP techniques with age-old linguistic concepts. But because most analysis of digital audiences is based on models built for high-resource languages, nuance can be lost when analyzing online chatter in other languages. Nigerian Twitter, for example, contains a lot of ‘Nigerian Pidgin English’ with its very distinct vocabulary, so the potential application in digital marketing is enormous.
Owing to visa delays, however, Ogueji and Ahia were not able to present their work at the NeurIPS 2019 workshops, which would have given them an opportunity to network with fellow researchers from the world's top universities and firms.
The disappointment, however, did not kill their drive: they have already made the code available on GitHub for anyone who wants to contribute to NLP work on pidgin English.