MBZUAI Collaborates with Inception to Launch Kazakh Language AI Model

Inception, a technology firm, has unveiled SHERKALA, a Large Language Model designed specifically for the Kazakh language. This model was developed in partnership with the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and Cerebras, and it has undergone extensive evaluation against human-curated benchmarks that focus on Kazakh culture, geography, and history to ensure high accuracy in understanding and generating content.

Credit: Khaleej Times

Dr. Larry Murray, Vice President of Applied Science at Inception, emphasized that SHERKALA acts as a scalable and localized AI solution. He stated that it demonstrates the potential of AI to support linguistic diversity and expand access to technology. The development of this model is rooted in a commitment to inclusivity in AI, addressing the technological gaps faced by languages that have historically been underrepresented.

Although Kazakh has more than 13 million speakers, it has often been underrepresented in large-scale AI models, which limits access to effective AI-driven applications. SHERKALA aims to bridge this gap by applying state-of-the-art linguistic adaptation techniques and training on a dataset of 45 billion words, primarily in Kazakh, with additional text in English, Russian, and Turkish.

Inception’s broader AI strategy focuses on creating solutions that enhance accessibility and equity across global linguistic landscapes. SHERKALA aligns with this vision and joins other models in the company's portfolio, such as JAIS for Arabic and NANDA for Hindi. The model was built through continued pretraining from Llama 3.1 and features a tokenizer expanded by 25%, which brings Kazakh language processing efficiency in line with that of top-tier English models.
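To illustrate why an expanded tokenizer can make Kazakh processing more efficient, the following is a minimal toy sketch, not SHERKALA's actual tokenizer. It assumes a simple greedy longest-match segmenter, whereas production models typically use byte-pair encoding, and every vocabulary entry shown is hypothetical. It measures "fertility", the average number of tokens per word, before and after Kazakh subwords are added to the vocabulary.

```python
def tokenize(word, vocab):
    """Greedy longest-match segmentation (toy stand-in for a real subword tokenizer).

    Any character not covered by a vocabulary entry becomes its own token.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first and shrink until it is
        # in the vocabulary or is a single character.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


def fertility(text, vocab):
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    return sum(len(tokenize(w, vocab)) for w in words) / len(words)


# Hypothetical English-centric vocabulary with no Kazakh subwords.
base_vocab = {"the", "ing", "tion"}

# Hypothetical Kazakh subwords added when the vocabulary is expanded.
kazakh_pieces = {"мен", "қазақ", "тіл", "інде", "сөйле", "ймін"}
expanded_vocab = base_vocab | kazakh_pieces

text = "мен қазақ тілінде сөйлеймін"  # "I speak Kazakh"

print(fertility(text, base_vocab))      # every letter is its own token
print(fertility(text, expanded_vocab))  # words split into a few subwords
```

In this toy setup the sentence drops from 6.0 tokens per word to 1.5 once the Kazakh subwords are available. Fewer tokens per word means more text fits in the same context window and less computation is spent per sentence, which is the kind of efficiency gain an expanded tokenizer targets.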

SHERKALA has been meticulously trained on a curated dataset, allowing it to perform comparably to larger models in terms of efficiency and accuracy. It has been benchmarked against human-curated tests to ensure superior comprehension and contextual intelligence. The model was trained on Condor Galaxy, one of the most advanced AI supercomputers, achieving high computational efficiency without sacrificing precision in training and inference.

Various sectors in Kazakhstan are expected to benefit from SHERKALA’s capabilities. In education, it can enhance digital learning and provide automated tutoring and translation support. Government and public services may see improvements in multilingual communication and citizen engagement, while the finance and legal sectors can utilize it for document processing and contract analysis.

Looking ahead, Inception aims to expand its AI model portfolio to support additional languages that remain underserved in the AI ecosystem. Opportunities are being explored in regions such as Central Asia, Africa, and Southeast Asia. The goal is to enhance multilingual interoperability, allowing for seamless AI interactions across languages while preserving linguistic integrity.

Dr. Murray highlighted the necessity for a structured and ethical approach to AI adoption in multilingual societies. He urged governments to support open-source AI development, ensuring transparency and fairness, while businesses should adopt inclusive AI solutions. Collaboration between the public and private sectors will be essential in promoting responsible data practices and ensuring that AI facilitates progress without compromising linguistic diversity. SHERKALA exemplifies how AI can be utilized responsibly to empower communities and foster a more inclusive future.
