For his final university project, Dilli set out to build a large language model (LLM), the technology that powers AI chatbots, in his native Hausa language, but he struggled to scrape the datasets he needed from the internet.
“I needed texts in English and the corresponding translation in Hausa, but I couldn’t get anything online; (there was) no clean data,” Dilli said.