Generative artificial intelligence (AI) models like ChatGPT and Google’s Bard AI may find themselves at odds with India’s newly enacted data privacy law, which requires every platform to disclose the personal data it holds and to obtain users’ consent before continuing to process it.
The Digital Personal Data Protection Bill, 2023, received the President’s assent on Friday, and the Act has been notified in the official gazette after six years of effort, consultations, and iterations.
Though large language models like ChatGPT and Bard AI largely rely on publicly available data, the information users share in their prompts is also fed back into the system to improve the accuracy of the AI’s output.
This may create compliance issues for such AI models: the new privacy law requires every platform to disclose the personal data it has already acquired, and platforms may need explicit consent from users to continue processing that data.
“If you are an AI platform, unless you say that I am going to take your data and use it for my large language models and the person consents to it, you cannot touch the personal data. The research purposes (under Section 17) are not to be mixed up with AI at all,” Rajeev Chandrasekhar, Union minister of state for electronics and IT, told Business Standard in an interview.
The minister added that the exemptions for research under Section 17 (2) (b) refer to statistical research, which the government may undertake to assess the efficacy of its programmes and for related data analytics.
“We have said that after the Bill comes into force, every platform has to first of all, disclose whatever personal data they have. It is for the citizen to decide whether he wants to keep that data with him. We will give a transition period for sure,” Chandrasekhar said.
The algorithms behind generative AI models are trained on massive amounts of data. OpenAI’s GPT-3, launched in 2020, was trained on about 45 terabytes (TB) of text data. According to experts, separating personal data out of such a complex dataset is likely to be extremely difficult.
“Generally, AI systems process datasets that are fed as it is and unless programmed otherwise, there would be no clear distinction between the processing of personal and non-personal data. Since the Bill also creates no such distinction and has no provisions distinguishing between personal and non-personal data processing for AI, there could be privacy issues associated with this,” said Salman Waris, managing partner at technology law firm TechLegis Advocates & Solicitors.
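To illustrate the distinction Waris describes, a training pipeline would have to be explicitly programmed to treat personal data differently from the rest of a corpus. The sketch below is purely illustrative: the patterns, placeholder, and function names are assumptions, not drawn from any real AI platform’s implementation, and real personal-data detection is far harder than simple regular expressions.

```python
import re

# Hypothetical patterns for two common kinds of personal data.
# Real systems would need far more sophisticated detection.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # e-mail addresses
    re.compile(r"\+?\d[\d\s-]{8,}\d"),       # phone-number-like digit runs
]

def redact_personal_data(text: str) -> str:
    """Replace matches of the simple PII patterns with a placeholder,
    so only the redacted text would reach a training pipeline."""
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

sample = "Contact Asha at asha@example.com or +91 98765 43210."
print(redact_personal_data(sample))
# -> Contact Asha at [REDACTED] or [REDACTED].
```

The point of the sketch is Waris’s observation in reverse: unless a step like this is deliberately added, the pipeline has no notion of which parts of its input are personal data.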
According to the Bill, the central government may notify certain platforms as significant data fiduciaries, depending on the volume and sensitivity of the personal data they process, the risks to users’ rights, and the potential impact on the sovereignty and integrity of India, among other factors.
Such platforms will need to appoint a data protection officer, as well as an independent data auditor to evaluate compliance with the provisions of the Act.
“Large language models and AI-based systems may be classified as significant data fiduciaries. Besides the above measures, they will also have to undertake other steps, including data protection impact assessment and periodical audits, to comply with the legislation,” Waris said.
Enterprise AI solution firms are reviewing the Bill while closely watching discussions on the transition period.
Beerud Sheth, co-founder and chief executive officer (CEO) of Gupshup, said the Bill’s requirements may not severely disrupt most of the firm’s use cases. Gupshup provides cloud messaging and conversational experiences to enterprise customers, including leading banks in India.
“By and large, in a majority of use cases, large language models scrape the world wide web with basic content and they use it for training models. The most important distinction is between data and personal data. So, this doesn’t impact the training of large language models for the most part. Now, there might be some models that predict other things. And, those models may need some consent from users,” Sheth said.
“Our systems already had mechanisms to promptly erase data upon request. Naturally, we will conduct a thorough review of the Act to identify potential areas that require additional attention. This could involve establishing more comprehensive protocols for promptly reporting any instances of personal data breaches to both the Data Protection Board and affected users,” said Pawan Prabhat, co-founder of Shorthills AI, which provides AI development services for business use cases.
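The erase-on-request mechanism Prabhat mentions can be pictured with a minimal sketch. Everything here is an assumption for illustration: the class, its fields, and the audit log are hypothetical and do not describe Shorthills AI’s actual systems.

```python
# A minimal sketch of erase-on-request with an audit trail.
# All names here are hypothetical, not any vendor's real design.
class PersonalDataStore:
    def __init__(self):
        self._records = {}   # user_id -> personal data held
        self.audit_log = []  # erasure events kept for compliance review

    def save(self, user_id, data):
        self._records[user_id] = data

    def erase_on_request(self, user_id):
        """Delete a user's personal data and log the action.
        Returns True if data was actually held and removed."""
        removed = self._records.pop(user_id, None)
        self.audit_log.append(("erased", user_id))
        return removed is not None
```

Under the new law, the audit trail matters as much as the deletion itself, since significant data fiduciaries must demonstrate compliance to an independent data auditor.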