Saturday, February 08, 2025

BusinessDay

How Azeez, a Unilag student, built a Nigerian-accent AI text-to-speech model

Saheed Azeez, a University of Lagos student who made a name for himself and the country by creating a 230-million-token GPT-2 dataset, has built an artificial intelligence (AI) text-to-speech model with a Nigerian accent.

According to a Techpoint report, Azeez had earlier in 2024 created Naijaweb, a dataset of 230 million GPT-2 tokens based on Nairaland. Now, with his new passion project, YarnGPT, he has pushed his skills further: a text-to-speech AI model that can read text aloud in a Nigerian accent.

In a world where AI can generate lifelike voices in seconds, a text-to-speech model with a Nigerian accent might not seem revolutionary at first.

However, considering that Azeez is a university student with limited resources, and that building a model that accurately captures the nuances of a Nigerian accent is technically challenging, it is a remarkable feat.

Azeez, speaking about the project after the success of Naijaweb, said: “The amount of conversations and interest people had in Naijaweb was a great motivation. Imagine getting featured on Techpoint Africa; it motivated me to do this.”

He was also motivated by failure: before starting YarnGPT, he had applied for a job at a Nigerian AI company but did not perform as well in the interview as he had expected.

YarnGPT became the project that would help him improve his skills and increase his chances of securing such roles in the future.

Building an AI model that sounds Nigerian required gathering a vast amount of Nigerian voices.

“I used some movies that were available online. I extracted their audio and subtitles. The problem with building in Nigeria is data. Replicating what has been built overseas isn’t that hard, but data always gets in the way,” he explained.

For instance, Nollywood produces over 2,500 movies a year, and with many filmmakers uploading their work to YouTube, one would expect him to have plenty of data to work with. The opposite turned out to be the case.

While there were thousands of movies to choose from, the audio wasn’t up to the standard he wanted, and the subtitles were inaccurate. To compensate, Azeez turned to Hugging Face, an open-source platform for machine learning and data science.

He combined the audio from Nigerian movies with high-quality datasets from Hugging Face to train his model.

The next step was training the AI model, but without access to his own GPU, he had to rely on cloud computing services like Google Colab. This cost him $50 (₦80,000), a significant amount for a university student. Unfortunately, it was a waste.

“The model I built wasn’t working well, and the $50 cloud credit was burnt just like that. It was painful for me.”

Determined to find another way, he discovered Oute AI, a platform that had developed a text-to-speech model in an autoregressive manner.

“The way the model works is, you give it a piece of text, and it predicts one word at a time. It takes that word, adds it back to the text, then predicts the next one — kind of like how ChatGPT completes sentences. That’s what makes it autoregressive.”
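The loop Azeez describes can be sketched in a few lines. This is an illustrative toy, not YarnGPT’s actual code: `predict_next` is a made-up stand-in for the trained model, which in reality predicts tokens from learned probabilities.

```python
# A minimal sketch of autoregressive generation as described above.
# `predict_next` is a hypothetical stand-in for a real trained model.

def predict_next(tokens):
    """Toy 'model': always continues with a fixed sentence."""
    sentence = ["how", "far", "my", "people", "<end>"]
    return sentence[min(len(tokens) - 1, len(sentence) - 1)]

def generate(prompt, max_steps=10):
    tokens = list(prompt)
    for _ in range(max_steps):
        nxt = predict_next(tokens)   # predict one token from everything so far
        if nxt == "<end>":
            break
        tokens.append(nxt)           # feed it back in, then predict again
    return tokens

print(generate(["yarn:"]))  # ['yarn:', 'how', 'far', 'my', 'people']
```

Each new token is appended to the running sequence and fed back to the model, which is exactly what makes the process autoregressive.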

While the autoregressive framework can be difficult to grasp, Azeez pointed out that it simply gave him better results.

Oute AI provided a structure, but Azeez still had to build his own model. He took a language model called SmolLM2-360M from Hugging Face and added speech functionality to it, a process that involved major algorithmic changes.

After this, the final-year Mechanical Engineering student at the University of Lagos had to spend another $50 to train the model. The training took three days.

Interestingly, as he pointed out when he created Naijaweb, AI models need data to be tokenised. Large language models (LLMs) understand numbers, not words, so tokenisation converts words into numerical representations.

“If we were to tokenise the word CALCULATED, for example, we could split it into four tokens: CAL-CU-LA-TED. A number is assigned to each token.”
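The CALCULATED example can be shown directly in code. The splits and ID numbers below are invented for illustration; a real tokeniser (such as a byte-pair-encoding tokeniser) learns both from data.

```python
# Illustrating the CALCULATED example: split a word into sub-word
# tokens, then assign each token a number. Vocabulary is made up.

vocab = {"CAL": 0, "CU": 1, "LA": 2, "TED": 3}

def tokenise(word, pieces):
    assert "".join(pieces) == word      # the pieces must rebuild the word
    return [vocab[p] for p in pieces]   # words become numbers

ids = tokenise("CALCULATED", ["CAL", "CU", "LA", "TED"])
print(ids)  # [0, 1, 2, 3]
```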

Tokenising audio, meanwhile, means breaking down continuous sound waves into smaller, manageable pieces that a model can understand and process.

Unlike text, which has clear breaks between words, audio is continuous; there are no natural pauses in a raw waveform.

“So, the model needs to convert the sound into a sequence of discrete values, kind of like turning a long speech into tiny puzzle pieces. These smaller audio tokens can then be used to train the AI, and later, the model can reassemble them to generate speech that sounds natural.”
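The idea of turning continuous sound into discrete tokens can be illustrated with simple quantisation: map each sample to one of a handful of integer bins, then reassemble an approximate waveform from the bin centres. Real audio tokenisers (neural codecs like the one Azeez used) are far more sophisticated, so treat this purely as a toy analogy.

```python
# Toy illustration: discretise a waveform into integer tokens
# by quantising sample values in [-1.0, 1.0] into a few bins,
# then reconstruct an approximate waveform from those tokens.

def quantise(waveform, n_bins=4):
    """Map each sample in [-1.0, 1.0] to an integer token 0..n_bins-1."""
    tokens = []
    for s in waveform:
        bin_idx = int((s + 1.0) / 2.0 * n_bins)   # scale into [0, n_bins]
        tokens.append(min(bin_idx, n_bins - 1))   # clamp the s == 1.0 edge
    return tokens

def dequantise(tokens, n_bins=4):
    """Reassemble an approximate waveform from tokens (bin centres)."""
    return [(t + 0.5) / n_bins * 2.0 - 1.0 for t in tokens]

wave = [-1.0, -0.3, 0.0, 0.4, 1.0]
toks = quantise(wave)
print(toks)  # [0, 1, 2, 2, 3]
```

The reconstruction is lossy, which is why real systems learn their token codebooks rather than using fixed bins.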

This entire process was made possible by a wave tokenizer. Using resources from Hugging Face, Oute AI, and other Nigerian repositories, Azeez was able to create YarnGPT.

Charles Ogwo, Head of the Education Desk at BusinessDay Media, is a seasoned, proactive journalist with over a decade of reporting experience.
