Vardhan said he believes that transfer learning, combined with architectural and algorithmic changes, can to a large extent replace the need for training models on local datasets.
“However, we need to modify existing architectures to make them easily and quickly adaptable to several unseen domains,” he added.
Challenges of using LLMs for Indic languages
Soket AI’s Upperwal believes that using LLMs like ChatGPT for Indic languages is more expensive because the tokenisation algorithm is biased towards English, making processing these languages less efficient and more computationally demanding.
“Because we have a large amount of data primarily in English on the internet, the algorithm or the system becomes biased towards English,” said Upperwal. “So, at the core, whenever we are tokenising, or converting a sentence such as ‘hi, how are you’ into mathematical form, for English the compression is highest, but for Indic languages the compression is very, very bad. Because of that, the compute requirement that has to go into generating any sort of response from these models goes up, and because of that, obviously, the cost will go up,” he said.
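Upperwal's point about compression can be illustrated with a rough sketch. Byte-level tokenisers start from UTF-8 bytes, and a Devanagari character occupies three bytes where an English letter occupies one; vocabularies trained mostly on English then merge English byte sequences into single tokens far more often than Devanagari ones. The snippet below (the Hindi sentence is an illustrative stand-in, not from the article) shows the raw byte disparity that a byte-level tokeniser begins with:

```python
# Sketch: why byte-level tokenisation penalises Indic scripts.
# A BPE vocabulary trained mostly on English merges common English byte
# sequences into single tokens; Devanagari sequences (3 bytes per code
# point in UTF-8) are merged far less often, so the same sentence costs
# more tokens, and therefore more compute, per response.

english = "hi, how are you"
hindi = "नमस्ते, आप कैसे हैं"  # illustrative Hindi sentence (assumption, not from the article)

# Crude proxy: raw UTF-8 byte counts, the unit a byte-level BPE starts from.
english_bytes = len(english.encode("utf-8"))
hindi_bytes = len(hindi.encode("utf-8"))

print(english_bytes)  # 15 bytes: one byte per ASCII character
print(hindi_bytes)    # far more: each Devanagari code point is 3 bytes
```

Actual token counts depend on the specific tokeniser's vocabulary, but the byte-level starting point already makes the disparity clear: more bytes to merge means more tokens, which means more compute per generated response.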