AI researchers achieve landmark breakthrough in Sinhala AI with publication in prestigious IEEE Access journal
May 14, 2026 12:35 PM

While global tech giants spend billions training artificial intelligence that still struggles in Sinhala, a Sri Lankan research team has quietly built a model that does it properly — using just two GPUs.

 

In a blind evaluation, the new Sinhala language model scored 4.5 out of 5, compared with just 1 out of 5 for the base Meta Llama 3.1 model on the same Sinhala prompts. The team also cut the model’s perplexity, a standard measure of how well a language model predicts text in a given language (lower is better), by close to 90 percent, according to a statement.
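For readers unfamiliar with the metric, perplexity can be sketched as the exponential of the average negative log-probability a model assigns to each token of held-out text. The snippet below is a minimal illustration, not the team's evaluation code; the function names and the sample numbers are hypothetical and are not figures from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-mean natural-log probability per token)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(before, after):
    """Fractional drop from `before` to `after` (0.9 means a 90 percent cut)."""
    return (before - after) / before


# Hypothetical example: a model that spreads probability evenly over four
# candidate tokens at every step has a perplexity of about 4.
uniform_log_probs = [math.log(0.25)] * 8
print(perplexity(uniform_log_probs))

# Hypothetical before/after values chosen only to illustrate a 90 percent cut.
print(relative_reduction(50.0, 5.0))
```

A lower value means the model is less "surprised" by real Sinhala text, which is why a cut of close to 90 percent is reported as an improvement.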

 

In practical terms, the model can hold a natural conversation in Sinhala, answer questions, follow instructions, and stay coherent across long responses.

 

The research has been accepted for publication in IEEE Access, a peer-reviewed open-access journal of the Institute of Electrical and Electronics Engineers (IEEE), with a Journal Impact Factor of 3.6 in the 2024 Clarivate Journal Citation Reports and an h5-index above 200 on Google Scholar Metrics. 

 

IEEE Access operates a binary review policy — reviewers either accept or reject a manuscript in the form submitted, with no revision cycle — a quality bar few low-resource-language AI papers have cleared.

 

Why this matters for Sri Lanka

 

Sinhala is spoken by more than 20 million people, but it is barely represented in the training data of the AI systems everyone is now talking about. Ask ChatGPT, Claude, or Gemini something in Sinhala and the answers can be broken, repetitive, or nonsensical.

 

The deeper issue is sovereignty. Even when foreign AI tools do work in Sinhala, Sri Lanka has no control over them — the model weights, the training data, the safety rules, and ultimately the off-switch all sit with companies in the United States or China, the statement said. 

 

For a country where most government, healthcare, and education conversations happen in Sinhala, depending entirely on AI built and operated abroad poses a structural risk to data privacy, national security, cultural framing, and basic continuity of service whenever foreign policy, pricing, or licensing shifts.

 

A sovereign Sinhala LLM changes that equation. It can be hosted locally, audited locally, and fine-tuned for Sri Lankan contexts, and it can keep operating regardless of what any foreign tech company decides next. That opens the door to Sinhala-speaking AI assistants for government services, educational tools for Sinhala-medium students, healthcare information for elderly and rural users, accessibility tools for citizens who do not speak English, and natural-sounding customer service for local businesses.

 

Built on a tight budget

 

Major AI labs in the United States use thousands of GPUs and spend hundreds of millions of dollars to train comparable systems. This team did it with two GPUs over a few weeks of training, and had to build its datasets from scratch because no large, clean Sinhala corpus existed. 

 

The team scraped Sinhala news sites, books, and online sources, and used Hindi datasets as a starting point — Hindi and Sinhala share Indo-Aryan roots — to build a final dataset of around 3.6 million question-answer pairs and 4 billion tokens, one of the largest public Sinhala AI datasets, now freely available on Hugging Face, the statement said.

 

The team also redesigned how the model reads Sinhala. The original Llama tokenizer needed an average of 91 tokens per Sinhala sentence and failed on 97.5 percent of Sinhala characters at the byte level. After adding around 35,000 Sinhala-specific tokens, that dropped to 23 tokens per sentence and zero byte-level failures.
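Going from 91 to 23 tokens per sentence is a reduction of roughly 75 percent, or about four times fewer tokens for the same text. The sketch below shows how such figures could be measured. It is a toy under stated assumptions: `char_tokenize` is a hypothetical character-level stand-in (the real tokenizer uses subword units), and unknown characters fall back to one token per UTF-8 byte, which costs three tokens for every Sinhala code point.

```python
from typing import Callable, Iterable, List, Set


def char_tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Toy tokenizer: one token per known character, else one per UTF-8 byte."""
    tokens: List[str] = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            # Byte-level fallback: Sinhala code points take 3 bytes in UTF-8.
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens


def mean_tokens_per_sentence(
    sentences: Iterable[str], tokenize: Callable[[str], List[str]]
) -> float:
    """Average number of tokens the tokenizer emits per sentence."""
    counts = [len(tokenize(s)) for s in sentences]
    return sum(counts) / len(counts)


def byte_fallback_rate(text: str, vocab: Set[str]) -> float:
    """Share of non-space characters that are missing from the vocabulary."""
    chars = [c for c in text if not c.isspace()]
    return sum(1 for c in chars if c not in vocab) / len(chars)


# A five-code-point Sinhala word, written with escapes to keep it unambiguous.
word = "\u0d85\u0db8\u0dca\u0db8\u0dcf"

base_vocab: Set[str] = set()   # assumption: no Sinhala characters known
extended_vocab = set(word)     # assumption: Sinhala characters added

print(len(char_tokenize(word, base_vocab)), byte_fallback_rate(word, base_vocab))
print(len(char_tokenize(word, extended_vocab)), byte_fallback_rate(word, extended_vocab))

# Reduction implied by the reported averages of 91 and 23 tokens per sentence.
print((91 - 23) / 91)
```

Fewer tokens per sentence means more Sinhala text fits into the same context window and each training step covers more actual language, which matters when the whole run has to fit on two GPUs.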

 

Who built it

 

The project was conducted at the Department of Electrical Engineering, University of Moratuwa, led by Sanjeewa Alwis, CEO of Decryptogen; Dr. Chathura Wanigasekara (Senior Member, IEEE) of the Institute of Maritime Technologies and Propulsion Systems at the German Aerospace Centre (DLR), Geesthacht; and Dr. Logeeshan Velmanickam (Member, IEEE), Senior Lecturer at the Department. Dr. Wanigasekara and Dr. Logeeshan are the corresponding authors.

 

The core engineering work — model training, dataset construction, tokenizer redesign, and evaluation — was carried out by P. K. Udith I. Sandaruwan, Nimesh M. A. Fonseka, and Pamith C. Salwathura (Student Member, IEEE), all University of Moratuwa graduates, working in collaboration with the Decryptogen R&D team. They earlier presented a preliminary version of the work at the IEEE AIIoT Congress in Seattle in 2025; the IEEE Access paper is the full, finalized version.

 

Sanjeewa Alwis has led Decryptogen into an international operation across Europe, the United States, and Australia, with a focus on decentralized large language model training and blockchain-integrated AI. He has long argued that emerging regions need to build their own sovereign AI capacity rather than wait for foreign tech companies to include them, the statement added.

 

What comes next

 

Next steps include longer training runs, larger and more diverse Sinhala datasets, and deployments in assistive technologies and conversational systems for Sinhala speakers. The full paper, “End-to-End Adaptation of LLMs for Low-Resource Languages,” will appear in IEEE Access under DOI 10.1109/ACCESS.2026.3693119. The datasets are publicly available on Hugging Face.

 

 

 
