About a fortnight ago, Grok, the artificial intelligence (AI) chatbot of X, waded into the dark underbelly of the Indian internet. It started as a harmless exchange: a user criticised Grok for taking time to respond, and in jest, added an expletive to emphasise their frustration.
The chatbot shot back with an expletive, and X users in India discovered they could play mischief with technology.
Grok responded with cuss words in Hindi and other Indian languages when questions were put to it that way. Unlike other chatbots, Grok did not shy away from taking names and even admitted that it had abused a legislator.
Grok’s responses are like those of a child who uses colourful language picked up from elders in the house or the neighbourhood, said a senior government official.
“One may find it cute that the child responds or uses adult language but the child does not understand what they are saying. The case with Grok is similar. Since it is trained on Twitter [as X was formerly called] data, it is only natural for the chatbot to learn and pick up swear words from the Indian internet,” said the official.
Shock response
Grok says its responses are designed to be “real and unfiltered” and to keep conversations going. But in doing so, Grok joins other chatbots such as OpenAI’s ChatGPT and Google’s Gemini, both of which have given responses that stirred controversy.
ChatGPT, sometime after being launched for public use in November 2022, responded with a joke about the god of one religion while avoiding figures of other faiths. The responses did not sit well with some Indians, who accused the chatbot of singling out one religion for hate.
Gemini, which became public in December 2023, created a controversy with its response to a question about Prime Minister Narendra Modi. The Ministry of Electronics and Information Technology (IT) had then issued a notice to Google which promptly apologised.
ChatGPT and Gemini have since moulded their responses to the sensitivities of Indian users, avoiding questions and jokes about religion, politics, and political leaders altogether.
Grok is being investigated by the IT ministry for possible violations of Indian law. “The government has powers both under Section 69 of the IT Act (Information Technology Act of 2000) and Section 79. If there are any violations, we will have to see what they are and what possible suitable measures need to be taken,” the official said.
The controversial answers chatbots give may seem a case of ‘garbage in, garbage out’ — a concept in computing which holds that incorrect or poor-quality input will produce faulty output. However, experts say the responses are not as independent as they are portrayed to be.
“Any query submitted to an AI chatbot is a string of words for the machine. For every set of questions posed to a chatbot, there are a set of right answers. Based on what answers are ranked the highest by users, the chatbot learns to show them over the other lesser ranked ones,” said Srikanth Velamakanni, co-founder of fractal.ai.
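The ranking idea Velamakanni describes can be sketched in a few lines of code. This is a purely illustrative toy, not how any real chatbot is implemented: the candidate answers and their ratings are invented, and the selection is a simple lookup of the highest-rated response.

```python
# Toy illustration: for one query, several candidate answers carry
# average user ratings, and the system learns to show the one ranked
# highest over the lesser-ranked ones. All data here is hypothetical.
candidate_answers = {
    "The capital of India is New Delhi.": 0.92,   # average user rating
    "India's capital city is Delhi.": 0.71,
    "I think it might be Mumbai.": 0.08,
}

def pick_best(candidates: dict) -> str:
    """Return the answer users ranked highest."""
    return max(candidates, key=candidates.get)

print(pick_best(candidate_answers))
```

In a real system the ratings would come from aggregated human feedback and the selection would be learned by a reward model, but the principle is the same: user-ranked preferences shape which answer surfaces.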
There are three major steps involved in the training of large language models (LLMs) — the building blocks of an AI system — and they determine the answers that chatbots give, he explained.
Generalised pre-training, the first step, feeds large volumes of diverse datasets to an LLM and teaches it to recognise general patterns and features without requiring the model to learn any task- or domain-specific knowledge.
Supervised fine-tuning (SFT), the second step, involves training the LLM on datasets which are labelled accurately and precisely so that the model learns about sector- and domain-specific tasks and nuances. RLHF, short for reinforcement learning from human feedback, is the third step which uses either human or AI feedback to fix errors the LLM makes, Velamakanni said.
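The three stages described above can be caricatured in a short sketch. Everything here is a hypothetical stand-in — the function names, data, and the dictionary “model” are illustrative placeholders, not a real training framework.

```python
# Minimal caricature of the three training stages described in the text.
# A real LLM pipeline trains neural network weights; here a plain dict
# stands in for the model so each stage's role is visible.

def pretrain(corpus):
    """Stage 1: absorb general patterns from large, diverse raw text."""
    return {"vocabulary": {w for doc in corpus for w in doc.split()}}

def supervised_finetune(model, labelled_pairs):
    """Stage 2: learn task-specific behaviour from labelled examples."""
    model["instructions"] = dict(labelled_pairs)
    return model

def rlhf(model, feedback):
    """Stage 3: use human (or AI) preference feedback to fix errors."""
    model["blocked"] = {phrase for phrase, ok in feedback.items() if not ok}
    return model

model = pretrain(["trains run on tracks", "chatbots answer questions"])
model = supervised_finetune(model, [("greet", "Namaste!")])
model = rlhf(model, {"Namaste!": True, "a rude phrase": False})
print(sorted(model["blocked"]))
```

The point of the sketch is the division of labour: pre-training supplies broad language knowledge, SFT supplies domain behaviour, and RLHF prunes outputs humans rate poorly.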
“Even in the case of fine-tuning of LLMs, a lot depends on the frequency of the exercise, on whether it is done on a weekly or monthly basis. ChatGPT used to give responses that were dated. Now, with appropriate feedback and proper SFT, it gives an up-to-date response to most queries,” said Ankush Sabharwal, founder and chief executive officer of CoRover.
Experts say that chatbots could be more sensitive to the political and cultural nuances of a country as diverse as India. An AI training method called Retrieval-Augmented Generation (RAG) makes responses generated by LLMs more accurate and contextually relevant, Sabharwal explained.
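A minimal sketch of the RAG pattern Sabharwal refers to follows. The trusted corpus, the word-overlap scoring, and the echoed answer are all illustrative assumptions; a production system would use a vector database and an actual LLM call.

```python
# Sketch of retrieval-augmented generation (RAG): retrieve the most
# relevant passage from a trusted corpus and ground the answer in it.
# The documents and the naive overlap scorer are hypothetical.

TRUSTED_DOCS = [
    "ICMR guidance: drink clean water and wash hands to prevent cholera",
    "Election Commission of India: polling hours are 7 am to 6 pm",
]

def retrieve(query: str, docs: list) -> str:
    """Score documents by word overlap with the query; return the best."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def answer(query: str) -> str:
    context = retrieve(query, TRUSTED_DOCS)
    # A real system would pass the context to an LLM here;
    # we simply echo it to show the grounding step.
    return f"Based on: {context}"

print(answer("how do I prevent cholera"))
```

Because the model’s response is anchored to a retrieved document rather than generated from memory alone, RAG makes answers more accurate and contextually relevant, which is why experts recommend it for sensitive or factual queries.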
“Most responses, however, depend on the foundation model that the LLM was trained on. Since Grok is likely trained on all of the data contained on Twitter, it is likely that the chatbot will continue giving the type of responses that it gives,” Sabharwal pointed out.
Some experts say that domestic companies building LLMs and large reasoning models (LRMs) should be mindful of Indian sensitivities.
Training matters
“It is very important that the models should be grounded in the authoritative documents present on the internet,” said Tanmoy Chakraborty. He is the Rajiv Khemani Young Faculty Chair Professor in AI at the department of Electrical Engineering and the Yardi School of Artificial Intelligence at the Indian Institute of Technology Delhi.
“For example, if someone is asking any health-related stuff, the model should look at CDC [Centers for Disease Control and Prevention of the US] or ICMR [Indian Council of Medical Research] or other published papers on the internet. RAG is one of the right ways to address hallucination,” he said.
LLMs and LRMs can make mistakes, so care should be taken to remove obscenity, pornography, and profanities that users may find unpalatable, said Gautam Shroff, a professor at the department of computer science and engineering at the Indraprastha Institute of Information Technology, Delhi.
“The only way you can remove it is once during the training of the LLM, and secondly, when such answers are produced, you can have another LLM judging whether this is profane or obscene. But that is the most you can do. Grok’s case does not appear to be a pervasive phenomenon either. It happened once or twice,” Shroff underlined.
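The second safeguard Shroff describes — a judge model screening generated answers — can be sketched as a post-generation filter. Here a simple keyword check stands in for the judging LLM; the blocklist and messages are illustrative, not a real moderation system.

```python
# Sketch of post-generation moderation: after an answer is produced,
# a second model judges whether it is profane before it is shown.
# A keyword check stands in for the judge LLM; the list is hypothetical.

BLOCKLIST = {"damn", "expletive"}  # illustrative placeholder terms

def judge_is_profane(text: str) -> bool:
    """Stand-in for a second LLM acting as a profanity judge."""
    return any(word in text.lower().split() for word in BLOCKLIST)

def moderate(answer: str) -> str:
    """Withhold answers the judge flags; pass the rest through."""
    return "[response withheld]" if judge_is_profane(answer) else answer

print(moderate("Namaste, how can I help you today"))
print(moderate("that is a damn silly question"))
```

As Shroff notes, this two-layer approach — cleaning the training data and then judging each output — is roughly the most a deployer can do; no filter catches everything.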