Disclosure norms in data law likely to create barriers for generative AI

Enterprise AI solution firms are reviewing the Bill while closely watching the discussions of the transition period

Sourabh Lele New Delhi
Last Updated: Aug 14, 2023 | 3:34 PM IST
Generative artificial intelligence (AI) models like ChatGPT and Google’s Bard AI may find themselves at odds with India’s newly enacted data privacy law. The law requires every platform to disclose the personal data it holds and to obtain user consent before it continues processing that data.

The Digital Personal Data Protection Bill, 2023, received the President’s nod on Friday. The Act has been notified in the official gazette after six years of efforts, consultations, and iterations.

Though large language models like ChatGPT and Bard AI rely largely on publicly available data, the information users share in their prompts is also fed into the system to improve the accuracy of the AI’s output.

This may create compliance issues for such AI models, as the new privacy law requires every platform to disclose the personal data it has already acquired. On top of that, platforms may need explicit consent from users to continue processing personal data.

“If you are an AI platform, unless you say that I am going to take your data and use it for my large language models and the person consents to it, you cannot touch the personal data. The research purposes (under Section 17) are not to be mixed up with AI at all,” Rajeev Chandrasekhar, Union minister of state for electronics and IT, told Business Standard in an interview.

The minister added that the exemptions provided for research under Section 17 (2) (b) refer to statistical research, which the government may undertake to assess the efficacy of its programmes and for related data analytics.

“We have said that after the Bill comes into force, every platform has to first of all, disclose whatever personal data they have. It is for the citizen to decide whether he wants to keep that data with him. We will give a transition period for sure,” Chandrasekhar said.

Generative AI models are trained on massive amounts of data. OpenAI’s GPT-3, launched in 2020, was trained on 45 terabytes (TB) of text data to produce creative outputs. According to experts, segregating personal data from such a complex database is likely to be a very critical process.

“Generally, AI systems process datasets that are fed as it is and unless programmed otherwise, there would be no clear distinction between the processing of personal and non-personal data. Since the Bill also creates no such distinction and has no provisions distinguishing between personal and non-personal data processing for AI, there could be privacy issues associated with this,” said Salman Waris, managing partner at technology law firm TechLegis Advocates & Solicitors.

According to the Bill, the central government may notify certain platforms as significant data fiduciaries. The designation would depend on factors including the volume and sensitivity of the personal data processed, the risks to the rights of users, and the impact on the sovereignty and integrity of India.

Such platforms will need to appoint a data protection officer, as well as an independent data auditor to evaluate compliance with the provisions of the Act.

“Large language models and AI-based systems may be classified as significant data fiduciaries. Besides the above measures, they will also have to undertake other steps, including data protection impact assessment and periodical audits, to comply with the legislation,” Waris said.

Enterprise AI solution firms are reviewing the Bill while closely watching the discussions of the transition period.

Beerud Sheth, cofounder and chief executive officer (CEO) of Gupshup, said the requirements of the Bill may not severely disrupt most of the firm’s use cases. Gupshup provides cloud messaging and conversational experiences to enterprise customers, including leading banks in India.
 
“By and large, in a majority of use cases, large language models scrape the world wide web with basic content and they use it for training models. The most important distinction is between data and personal data. So, this doesn’t impact the training of large language models for the most part. Now, there might be some models that predict other things. And, those models may need some consent from users,” Sheth said.

“Our systems already had mechanisms to promptly erase data upon request. Naturally, we will conduct a thorough review of the act to identify potential areas that require additional attention. This could involve establishing more comprehensive protocols for promptly reporting any instances of personal data breaches to both the Data Protection Board and affected users,” said Pawan Prabhat, cofounder of Shorthills AI, a provider of AI development services for business use cases.
