AI researchers achieve landmark breakthrough in Sinhala AI with publication in prestigious IEEE Access journal
May 14, 2026 12:35 PM

While global tech giants spend billions training artificial intelligence that still struggles in Sinhala, a Sri Lankan research team has quietly built a model that does it properly — using just two GPUs.

 

In a blind evaluation, the new Sinhala language model scored 4.5 out of 5, compared with just 1 out of 5 for the base Meta Llama 3.1 model on the same Sinhala prompts. The team also cut the model’s perplexity, a standard measure of how well a language model predicts text in a language (lower is better), by close to 90 percent, according to a statement.
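For readers unfamiliar with the metric, perplexity is conventionally computed as the exponential of the average negative log-probability a model assigns to each token it is asked to predict. The sketch below is illustrative only: the function name is hypothetical and the probabilities are made-up values, not figures from the paper.

```python
import math


def perplexity(token_probs):
    """Return exp(-mean(log p)) for the probabilities a model gave each observed token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Hypothetical numbers, chosen only to show what a 90 percent drop looks like.
uncertain = perplexity([0.01] * 8)   # model gives each correct token a 1% chance
confident = perplexity([0.10] * 8)   # model gives each correct token a 10% chance
reduction = 1 - confident / uncertain

print(f"before: {uncertain:.1f}, after: {confident:.1f}, reduction: {reduction:.0%}")
```

Under these assumed values, a tenfold rise in the probability assigned to the correct tokens corresponds to a 90 percent fall in perplexity.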

 

In practical terms, the model can hold a natural conversation in Sinhala, answer questions, follow instructions, and stay coherent across long responses.

 

The research has been accepted for publication in IEEE Access, a peer-reviewed open-access journal of the Institute of Electrical and Electronics Engineers (IEEE), with a Journal Impact Factor of 3.6 in the 2024 Clarivate Journal Citation Reports and an h5-index above 200 on Google Scholar Metrics. 

 

IEEE Access operates a binary review policy — reviewers either accept or reject a manuscript in the form submitted, with no revision cycle — a quality bar few low-resource-language AI papers have cleared.

 

Why this matters for Sri Lanka

 

Sinhala is spoken by more than 20 million people, but it is barely represented in the training data of the AI systems everyone is now talking about. Ask ChatGPT, Claude, or Gemini something in Sinhala and the answers can be broken, repetitive, or nonsensical.

 

The deeper issue is sovereignty. Even when foreign AI tools do work in Sinhala, Sri Lanka has no control over them — the model weights, the training data, the safety rules, and ultimately the off-switch all sit with companies in the United States or China, the statement said. 

 

For a country where most government, healthcare, and education conversations happen in Sinhala, depending entirely on AI built and operated abroad creates structural risks to data privacy, national security, cultural framing, and basic continuity of service whenever foreign policy, pricing, or licensing shifts.

 

A sovereign Sinhala LLM changes that equation. It can be hosted locally, audited locally, fine-tuned for Sri Lankan contexts, and kept running regardless of what any foreign tech company decides next. That opens the door to Sinhala-speaking AI assistants for government services, educational tools for Sinhala-medium students, healthcare information for elderly and rural users, accessibility tools for citizens who do not speak English, and natural-sounding customer service for local businesses.

 

Built on a tight budget

 

Major AI labs in the United States use thousands of GPUs and spend hundreds of millions of dollars to train comparable systems. This team did it with two GPUs over a few weeks of training, and had to build its datasets from scratch because no large, clean Sinhala corpus existed. 

 

The team scraped Sinhala news sites, books, and online sources, and used Hindi datasets as a starting point — Hindi and Sinhala share Indo-Aryan roots — to build a final dataset of around 3.6 million question-answer pairs and 4 billion tokens, one of the largest public Sinhala AI datasets, now freely available on Hugging Face, the statement said.

 

The team also redesigned how the model reads Sinhala. The original Llama tokenizer needed an average of 91 tokens per Sinhala sentence and failed on 97.5 percent of Sinhala characters at the byte level. After adding around 35,000 Sinhala-specific tokens, that dropped to 23 tokens per sentence and zero byte-level failures.
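Part of the cost described above comes from how UTF-8 stores Sinhala: every code point in the Sinhala Unicode block occupies three bytes, so a tokenizer with little Sinhala vocabulary that falls back to raw bytes (an assumption about the failure mode, not a detail from the statement) spends several tokens on a single character. The minimal sketch below uses only the Python standard library and does not load the Llama tokenizer; the relative saving is computed purely from the two averages quoted in the statement.

```python
# The word "Sinhala" written in Sinhala script, spelled out as code points
# so the example does not depend on the editor's font or encoding.
word = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"

code_points = len(word)
utf8_bytes = len(word.encode("utf-8"))
bytes_per_code_point = utf8_bytes / code_points

# Average tokens per Sinhala sentence, as quoted in the statement.
tokens_before = 91
tokens_after = 23
relative_saving = (tokens_before - tokens_after) / tokens_before

print(f"{code_points} code points -> {utf8_bytes} UTF-8 bytes "
      f"({bytes_per_code_point:.0f} bytes each)")
print(f"tokens per sentence: {tokens_before} -> {tokens_after} "
      f"({relative_saving:.0%} fewer)")
```

On the quoted averages, the extended vocabulary needs about 75 percent fewer tokens per sentence, which means more Sinhala text fits in the same context window and each training step covers more text.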

 

Who built it

 

The project was conducted at the Department of Electrical Engineering, University of Moratuwa, led by Sanjeewa Alwis, CEO of Decryptogen; Dr. Chathura Wanigasekara (Senior Member, IEEE) of the Institute of Maritime Technologies and Propulsion Systems at the German Aerospace Centre (DLR), Geesthacht; and Dr. Logeeshan Velmanickam (Member, IEEE), Senior Lecturer at the Department. Dr. Wanigasekara and Dr. Logeeshan are the corresponding authors.

 

The core engineering work — model training, dataset construction, tokenizer redesign, and evaluation — was carried out by P. K. Udith I. Sandaruwan, Nimesh M. A. Fonseka, and Pamith C. Salwathura (Student Member, IEEE), all University of Moratuwa graduates, working in collaboration with the Decryptogen R&D team. They earlier presented a preliminary version of the work at the IEEE AIIoT Congress in Seattle in 2025; the IEEE Access paper is the full, finalized version.

 

Sanjeewa Alwis has led Decryptogen into an international operation across Europe, the United States, and Australia, with a focus on decentralized large language model training and blockchain-integrated AI. He has long argued that emerging regions need to build their own sovereign AI capacity rather than wait for foreign tech companies to include them, the statement added.

 

What comes next

 

Next steps include longer training runs, larger and more diverse Sinhala datasets, and deployments in assistive technologies and conversational systems for Sinhala speakers. The full paper, “End-to-End Adaptation of LLMs for Low-Resource Languages,” will appear in IEEE Access under DOI 10.1109/ACCESS.2026.3693119. The datasets are publicly available on Hugging Face.

 

 

 
