In August, I had the opportunity to attend an AI journalism lab alongside many like-minded journalists. During the two-day training, we met an official from the National University of Sciences and Technology (NUST) who explained how the university is working with the government and a telecom company to build Pakistan’s indigenous large language model (LLM).
One reporter asked: why do we need our own LLM when we already have great models in the market? My understanding so far is that we are building our localised, fine-tuned LLM on open-source foundations such as Llama, the model family from Meta, Facebook's parent company. And before I get back to the reporter's question, it is time for a flashback.
Earlier this year, during my Deep Learning course, we had an assignment to build a model that could recognise an individual's gender, age and race from a photo. It is a classic exercise, and YouTube is full of walkthroughs and code for it (for those who want to try). Anyway, when I fed my public photo to the model, it came up with a hilarious result: Male, 10 and White. Beyond the initial laughter that the response elicited, we identified several factors behind the weak result. One, I had pigtails, which may have confused the model. Two, the dataset we used had only a handful of photos of people from South Asia, so the model never properly learned my features. When a model is not shown what a particular group of people looks like, it is bound to make mistakes; if it has seen little or no data from a group, its responses cannot be accurate. In our race-recognition model, there were over 10,000 photos of white people, while for South Asians the number was closer to 2,000 (if I recall correctly). Naturally, the model had not seen enough South Asian faces.
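The imbalance described above is a textbook problem, and one common mitigation is to weight the under-represented class more heavily during training. Here is a minimal sketch; the class names are illustrative and the counts simply mirror the rough figures mentioned above, not the actual dataset:

```python
from collections import Counter

# Illustrative label counts, mirroring the rough figures above
labels = ["white"] * 10_000 + ["south_asian"] * 2_000
counts = Counter(labels)

# Standard "balanced" class-weight formula: total / (n_classes * class_count).
# Training frameworks such as scikit-learn and Keras accept weights like
# these so that errors on rare classes cost more during training.
total = sum(counts.values())
n_classes = len(counts)
class_weight = {c: total / (n_classes * n) for c, n in counts.items()}

print(class_weight)
# The under-represented class gets the larger weight:
# {'white': 0.6, 'south_asian': 3.0}
```

Re-weighting only softens the problem, though; the more durable fix is exactly what the column argues for, which is collecting data that actually represents the people the model will serve.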
Now I can go back to the reporter's question. Millions of people are using AI, companies are pouring billions of dollars into it, and Amazon founder Jeff Bezos has even recommended building data centres in space – all signs that AI is not going anywhere. We can snigger at it all we want, but these models are here to stay, and will be very much integrated into our appliances and our apps. Models require data, and sourcing accurate data to train them is not always easy.
A few months back, while preparing a presentation for a training session for media students, I deliberately used AI to generate a few photos to show the students where a model's responses fall short. I asked the model to generate a photo of a Pakistani woman working in a newsroom. To my surprise, in most photos the Pakistani woman was wearing a sari and heavy jewellery. Then I asked the model to generate a photo of Karachi in 2050 with skyscrapers, a beautiful tram and flying cars. I also asked it to show the city as pedestrian-friendly. When it showed the image of people walking around, I noticed one thing: almost all of them were wearing skullcaps; some had a waistcoat on; and some were wearing thobes (a dress style more popular in the Middle East) – hardly how people in Karachi dress. When I asked it to generate a photo of a rickshaw, it came up with images of pulled rickshaws, which are more common in Bangladesh and India or at tourist spots in Malaysia.
When we advocate for our own LLM, we aim to build models that reflect our culture and know what is popular and common in our country. Recently, Meta announced Meta AI in Urdu for users in Pakistan. As of now, many AI tools automatically transcribe spoken Urdu in the Hindi script. (Disclaimer: I am not against the Hindi language, but since most users here cannot read Hindi, the automatic transcription is of little use to them.) Some journalist-specific apps do not offer Urdu as a language option, so when an audio file is transcribed, the results are in Hindi. And while that text can be translated into English through chatbots, it is difficult to verify whether the model heard the speakers' words correctly. So when Meta announces that it has an Urdu version too, it signals the company's understanding of the needs of Pakistani users.
During the training process, it is also essential for experts to have an active role. There is a growing trend on social media where creators joke about the depiction of Urdu speakers in Bollywood. The vocabulary those movies still put in characters' mouths reflects the filmmakers' poor understanding of Pakistan's culture. It is only when Indians interact with Pakistanis on a shared platform that they realise that what is depicted in the movies differs from reality. The same applies to AI models. As the reporter mentioned during our training, an AI model can write a couplet in Urdu, perhaps better than some Urdu speakers could, but if we look closely at that verse, we find that it is either mimicry or heavily influenced by the poets of the past. Models require constant fine-tuning, and for that they need data that reflects the evolution of the language. A fine-tuned model could also weave cultural references into its responses more naturally.
I agree with the reporter that LLMs' responses are improving. They are also quite good at understanding Pakistan's meme culture, but much of the country's discourse happens offline, and a localised LLM could capture the nuances that online discourse often misses. Language use is also user-specific: when a user asks an LLM something and instructs it to remember that and override its learned information, the LLM will behave accordingly. But if users cannot hold accurate conversations with LLMs in their own language, they risk being shut out. Since AI is now ubiquitous, it is important for it to absorb as many cultural flavours as possible. An LLM of our own would open up opportunities for our people in this fast-growing age of AI.
The writer heads the Business Desk at The News. She tweets/posts @manie_sid and can be reached at: aimen_erum@hotmail.com