One challenge facing AI researchers in India is the limited availability of reliable, accurate public data across the diversity of Indian languages for training language models. To facilitate such efforts, Prasar Bharati had some time back made its c

While researching my recent book, Collective Spirit, Concrete Action, which focuses on Prime Minister Narendra Modi’s Mann Ki Baat, I came across a substantial body of papers published in peer-reviewed journals by artificial intelligence (AI) researchers across India who relied on the Mann Ki Baat text corpus to train, test and improve machine learning models. From IIT Kanpur to IIIT Hyderabad, institutions developing natural language processing capabilities for Indian languages have found the Mann Ki Baat corpus a rich and diverse dataset, of substantial value in building ChatGPT-like language models for India.