Science & Tech
AI researchers achieve landmark breakthrough in Sinhala AI with publication in prestigious IEEE Access journal
May 14, 2026 12:35 PM

While global tech giants spend billions training artificial intelligence that still struggles in Sinhala, a Sri Lankan research team has quietly built a model that does it properly — using just two GPUs.

 

In a blind evaluation, the new Sinhala language model scored 4.5 out of 5, compared with just 1 out of 5 for the base Meta Llama 3.1 model on the same Sinhala prompts. The team also cut the model’s perplexity (a standard measure of how well a language model predicts text, where lower is better) by close to 90 percent, according to a statement.
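For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-probability a model assigns to each token of a test text. The sketch below is illustrative only: the function names and the probabilities are assumptions chosen to show what a 90 percent reduction looks like, not figures or code from the paper.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-probability per token)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

def relative_reduction(before, after):
    """Fraction by which a metric dropped, e.g. 0.9 for a 90 percent cut."""
    return (before - after) / before

# Hypothetical models (not the paper's numbers): one assigns each correct
# token a probability of 0.01, the other a probability of 0.1.
base_ppl = perplexity([math.log(0.01)] * 4)
adapted_ppl = perplexity([math.log(0.1)] * 4)

print(round(base_ppl, 2), round(adapted_ppl, 2))
print(round(relative_reduction(base_ppl, adapted_ppl), 2))
```

In this toy case, a model that is ten times more confident in each correct token has one tenth of the perplexity, which is a 90 percent reduction.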

 

In practical terms, the model can hold a natural conversation in Sinhala, answer questions, follow instructions, and stay coherent across long responses.

 

The research has been accepted for publication in IEEE Access, a peer-reviewed open-access journal of the Institute of Electrical and Electronics Engineers (IEEE), with a Journal Impact Factor of 3.6 in the 2024 Clarivate Journal Citation Reports and an h5-index above 200 on Google Scholar Metrics. 

 

IEEE Access operates a binary review policy — reviewers either accept or reject a manuscript in the form submitted, with no revision cycle — a quality bar few low-resource-language AI papers have cleared.

 

Why this matters for Sri Lanka

 

Sinhala is spoken by more than 20 million people, but it is barely represented in the training data of today's most widely used AI systems. Ask ChatGPT, Claude, or Gemini something in Sinhala and the answers can be broken, repetitive, or nonsensical.

 

The deeper issue is sovereignty. Even when foreign AI tools do work in Sinhala, Sri Lanka has no control over them — the model weights, the training data, the safety rules, and ultimately the off-switch all sit with companies in the United States or China, the statement said. 

 

For a country where most government, healthcare, and education conversations happen in Sinhala, depending entirely on AI built and operated abroad poses a structural risk to data privacy, national security, cultural framing, and basic continuity of service whenever foreign policy, pricing, or licensing shifts.

 

A sovereign Sinhala LLM changes that equation. It can be hosted locally, audited locally, fine-tuned for Sri Lankan contexts, and continue to operate regardless of what any foreign tech company decides next — opening the door to Sinhala-speaking AI assistants for government services, educational tools for Sinhala-medium students, healthcare information for elderly and rural users, accessibility tools for citizens who do not speak English, and natural-sounding customer service for local businesses.

 

Built on a tight budget

 

Major AI labs in the United States use thousands of GPUs and spend hundreds of millions of dollars to train comparable systems. This team did it with two GPUs over a few weeks of training, and had to build its datasets from scratch because no large, clean Sinhala corpus existed. 

 

The team scraped Sinhala news sites, books, and online sources, and used Hindi datasets as a starting point — Hindi and Sinhala share Indo-Aryan roots — to build a final dataset of around 3.6 million question-answer pairs and 4 billion tokens, one of the largest public Sinhala AI datasets, now freely available on Hugging Face, the statement said.

 

The team also redesigned how the model reads Sinhala. The original Llama tokenizer needed an average of 91 tokens per Sinhala sentence and failed on 97.5 percent of Sinhala characters at the byte level. After adding around 35,000 Sinhala-specific tokens, that dropped to 23 tokens per sentence and zero byte-level failures.
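To see why vocabulary matters, consider a toy tokenizer that falls back to raw UTF-8 bytes for any text it does not know. Each Sinhala character takes three bytes in UTF-8, so an unknown five-character word costs 15 tokens. The sketch below is a simplified greedy matcher written for illustration; it is not the Llama tokenizer or the team's code, and `count_tokens` is a hypothetical helper.

```python
def count_tokens(text, vocab):
    """Toy greedy longest-match tokenizer with UTF-8 byte fallback.

    Text covered by a vocab entry costs one token; any character with no
    matching entry costs one token per UTF-8 byte.
    """
    tokens = 0
    i = 0
    max_len = max((len(v) for v in vocab), default=0)
    while i < len(text):
        # Try the longest vocabulary entry that fits at position i.
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in vocab:
                tokens += 1
                i += size
                break
        else:
            # No entry matched: fall back to one token per byte.
            tokens += len(text[i].encode("utf-8"))
            i += 1
    return tokens

# The word "Sinhala" written in Sinhala script (five code points).
word = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"

print(count_tokens(word, set()))   # no Sinhala entries in the vocabulary
print(count_tokens(word, {word}))  # vocabulary extended with the whole word

# Figures reported in the statement: 91 tokens per sentence before,
# 23 after the vocabulary was extended.
print(round(91 / 23, 2))
```

By the statement's own figures, the extended vocabulary represents the same sentence in roughly a quarter of the tokens, which means more Sinhala text fits into the model's context window and each training step covers more of the language.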

 

Who built it

 

The project was conducted at the Department of Electrical Engineering, University of Moratuwa, led by Sanjeewa Alwis, CEO of Decryptogen; Dr. Chathura Wanigasekara (Senior Member, IEEE) of the Institute of Maritime Technologies and Propulsion Systems at the German Aerospace Centre (DLR), Geesthacht; and Dr. Logeeshan Velmanickam (Member, IEEE), Senior Lecturer at the Department. Dr. Wanigasekara and Dr. Logeeshan are the corresponding authors.

 

The core engineering work — model training, dataset construction, tokenizer redesign, and evaluation — was carried out by P. K. Udith I. Sandaruwan, Nimesh M. A. Fonseka, and Pamith C. Salwathura (Student Member, IEEE), all University of Moratuwa graduates, working in collaboration with the Decryptogen R&D team. They earlier presented a preliminary version of the work at the IEEE AIIoT Congress in Seattle in 2025; the IEEE Access paper is the full, finalized version.

 

Sanjeewa Alwis has led Decryptogen into an international operation across Europe, the United States, and Australia, with a focus on decentralized large language model training and blockchain-integrated AI. He has long argued that emerging regions need to build their own sovereign AI capacity rather than wait for foreign tech companies to include them, the statement added.

 

What comes next

 

Next steps include longer training runs, larger and more diverse Sinhala datasets, and deployments in assistive technologies and conversational systems for Sinhala speakers. The full paper, “End-to-End Adaptation of LLMs for Low-Resource Languages,” will appear in IEEE Access under DOI 10.1109/ACCESS.2026.3693119. The datasets are publicly available on Hugging Face.

 

 

 
