Intel, Tech Mahindra, and Bud Ecosystem have collaborated on Project Indus, a new open-source large language model (LLM) aimed at enhancing natural language processing for Hindi and its dialects. The project addresses the challenges of natural language generation and processing across India’s diverse linguistic landscape, and a benchmark study of the model highlights its performance, scalability, and potential for practical applications.
India is home to a rich array of languages, with 22 official languages and 1,645 recorded dialects. Hindi, the most widely spoken language in the country, has over 600 million speakers, yet current LLMs often struggle with the variation across Hindi dialects. To fill this gap, Tech Mahindra’s Makers Lab developed Project Indus, a model that specifically supports Hindi and 37 of its dialects.
Nikhil Malhotra, Global Head of Makers Lab at Tech Mahindra, said, “Project Indus is our seminal effort to develop an LLM from the ground up. Through Makers Lab, our R&D arm, we created a roadmap, collected data from the Hindi-speaking population, and built the Indus model.”
“Our collaboration on Project Indus represents a significant step forward in bridging the linguistic divide in India. By leveraging cutting-edge Intel technologies and Bud Runtime’s optimizations, we’ve developed a powerful, scalable solution that not only enhances natural language processing for Hindi and its dialects but also paves the way for more efficient, sustainable AI applications,” said Jithin V.G., CEO of Bud Ecosystem.
The development of Project Indus was informed by a significant data collection effort that produced the world’s largest dataset for Hindi and its dialects, totalling 100 GB. The resulting model has 1.2 billion parameters and was trained on 22 billion tokens, with a budget of $150,000 and a team of 15 people. Training ran on a cluster of 48 A100 GPUs provided by the Centre for Development of Advanced Computing (C-DAC).
For inference, the model runs on 5th Gen Intel® Xeon® processors, with a custom inference stack developed in collaboration with Bud Ecosystem. Bud Runtime, an optimization layer for large-scale applications, was integrated with the inference stack to improve performance. Intel® Advanced Matrix Extensions (Intel® AMX) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) accelerate the model’s inference, and the processors’ Non-Uniform Memory Access (NUMA) architecture was exploited to optimize memory handling and parallel-processing efficiency.
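To illustrate how these pieces fit together, the sketch below shows bf16 CPU inference with PyTorch and Intel Extension for PyTorch, launched pinned to a single NUMA node via numactl. On AMX-capable Xeons, the bf16 matrix multiplications dispatch to AMX tile instructions through oneDNN. This is a minimal sketch, not Indus’s actual serving code: the model ID, prompt, and launch command are placeholders, and Bud Runtime’s internals are not public.

```python
# Launch pinned to one NUMA node, e.g.:
#   numactl --cpunodebind=0 --membind=0 python infer.py
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder stand-in; not the Indus checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# ipex.optimize applies operator fusion and bf16 weight packing; on AMX-capable
# Xeons the resulting bf16 matmuls run on AMX tiles via oneDNN.
model = ipex.optimize(model, dtype=torch.bfloat16)

prompt = "नमस्ते, आप कैसे हैं?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```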
A comprehensive benchmarking study of the Indus LLM was conducted using the LLMPerf library, evaluating key performance metrics such as Time to First Token (TTFT), inter-token delay, throughput, and latency under various conditions. The results show an average generation rate of 33.9984 tokens per second, with response times ranging from 0.249 to 4.27 seconds for processing 22 to 167 tokens. The average end-to-end latency was 3.07 seconds.
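For readers unfamiliar with these metrics, the sketch below shows how TTFT, inter-token delay, throughput, and end-to-end latency can be derived from the timestamps of a streamed response. It is a simplified illustration of what LLMPerf measures, not LLMPerf’s own implementation; the fake_stream generator is a hypothetical stand-in for a real streaming endpoint.

```python
import time
import statistics

def measure_stream(stream):
    """Derive TTFT, inter-token delay, throughput, and latency from one streamed response."""
    start = time.perf_counter()
    timestamps = [time.perf_counter() for _ in stream]  # one timestamp per token
    ttft = timestamps[0] - start                        # time to first token
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    e2e = timestamps[-1] - start                        # end-to-end latency
    return {
        "ttft_s": ttft,
        "inter_token_delay_s": statistics.mean(gaps) if gaps else 0.0,
        "throughput_tok_per_s": len(timestamps) / e2e,
        "e2e_latency_s": e2e,
    }

# Hypothetical stand-in for an LLM endpoint that streams one token at a time.
def fake_stream(n_tokens=50, delay=0.03):
    for _ in range(n_tokens):
        time.sleep(delay)
        yield "tok"

print(measure_stream(fake_stream()))
```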
One of the key performance indicators, TTFT, was evaluated at different concurrency levels. At 100 concurrent requests across 2 NUMA nodes, the TTFT was 8.03 seconds, while the 90th-percentile TTFT at 200 concurrent requests rose to 17.18 seconds. Despite this increase, the model’s performance remains on par with GPU-based inference, demonstrating its efficiency and scalability. The Indus model also maintains robust throughput under high request volumes, and scalability improves as more NUMA nodes are added.
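Percentile figures such as the 90th-percentile TTFT above come from issuing many requests at a fixed concurrency and ranking the per-request measurements. The sketch below simulates that procedure with asyncio; the sleep call is a hypothetical stand-in for waiting on a model’s first token, and the request counts are arbitrary.

```python
import asyncio
import random
import statistics
import time

async def one_request(sem):
    """Simulate one request and record its time-to-first-token (TTFT)."""
    async with sem:
        start = time.perf_counter()
        await asyncio.sleep(random.uniform(0.2, 0.6))  # stand-in for first-token wait
        return time.perf_counter() - start

async def run(concurrency=100, total_requests=400):
    sem = asyncio.Semaphore(concurrency)  # cap in-flight requests at `concurrency`
    ttfts = sorted(await asyncio.gather(*(one_request(sem) for _ in range(total_requests))))
    p90 = ttfts[int(0.9 * (len(ttfts) - 1))]  # 90th-percentile TTFT
    print(f"mean TTFT {statistics.mean(ttfts):.2f}s, p90 TTFT {p90:.2f}s")

asyncio.run(run())
```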
Bud Ecosystem’s collaboration with Intel on Project Indus has yielded several key performance optimizations. Bud Runtime ensures the high throughput and concurrency essential for large-scale applications. Additionally, the solution supports both ‘Across the Box Scale Out’ and ‘In-Box Scale Out’ approaches, further improving scalability and addressing memory-related challenges (see the sketch below).
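These terms are not defined in detail in the benchmark report, but ‘In-Box Scale Out’ plausibly means running one inference worker per NUMA node within a single server, each bound to its local CPU cores and memory. The sketch below shows one hedged way to do that with numactl; serve_indus.py, the port scheme, and the two-node topology are all hypothetical placeholders, not Bud Runtime’s actual mechanism.

```python
import subprocess

# Hypothetical "In-Box Scale Out": one worker per NUMA node, each bound to
# that node's cores and local memory so cross-node memory traffic is avoided.
NUMA_NODES = [0, 1]  # placeholder two-node topology

workers = []
for node in NUMA_NODES:
    cmd = [
        "numactl", f"--cpunodebind={node}", f"--membind={node}",
        "python", "serve_indus.py",          # placeholder serving script
        "--port", str(8000 + node),          # one port per worker for a front-end LB
    ]
    workers.append(subprocess.Popen(cmd))

for w in workers:
    w.wait()
```

A load balancer in front of the per-node ports would then spread requests across workers; the ‘Across the Box’ variant extends the same idea to multiple machines.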
Overall, the benchmarking results show that the Indus LLM is a promising tool for processing Hindi and its dialects with competitive performance, making it suitable for deployment in real-world applications. The project not only advances the capabilities of natural language processing in India but also highlights the potential for scalable, efficient LLM solutions on Intel hardware.