The government is likely to propose that companies using data from Indian content creators to train their artificial intelligence (AI) systems and large language models (LLMs) share a portion of their global revenues as royalties once these services are commercialised, a senior government official has said.
AI companies will have to disclose the purposes for which they have collected data and how they intend to use it, the official said.
The proposed licensing framework would be a “blanket licence”, under which AI companies would be able to use all copyrighted data once the royalty has been paid.
“In the disclosure forms, the proportion (of the content from a particular content provider or providers) will be mentioned. For example, if they say that they have used 30 per cent of the textual content for training, then Copyright Royalties Collective for AI Training (CRCAT) will mark those revenues against the content providers that have copyrighted textual content,” the official said.
Content providers will have to register with the CRCAT to receive their share of the royalty payments made by the companies that have used their content, the official said.
The CRCAT is an industry body proposed by the Department for Promotion of Industry and Internal Trade (DPIIT) as a part of its working paper on the use of copyrighted content by AI companies to train their respective LLMs.
“The payment by CRCAT or other registered societies that work with CRCAT can be done either on a pro-rata basis or a general value assessment since it will be very difficult to determine whose work has contributed how much in the training of the AI,” the official said.
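To make the pro-rata option concrete, the arithmetic could look something like the sketch below. It is illustrative only: the working paper as described here does not specify a formula, so the function name, the use of registered-work counts as weights, and all figures are assumptions.

```python
from decimal import Decimal, ROUND_DOWN


def pro_rata_shares(royalty_pool, registered_works):
    """Split a royalty pool in proportion to each provider's registered works.

    Hypothetical helper: weighting by the number of registered works is an
    assumption, not something stated in the DPIIT working paper.
    """
    total_works = sum(registered_works.values())
    if total_works == 0:
        raise ValueError("no registered works to distribute against")
    shares = {}
    for provider, count in registered_works.items():
        share = royalty_pool * count / total_works
        # Round down to two decimal places so payouts never exceed the pool.
        shares[provider] = share.quantize(Decimal("0.01"), rounding=ROUND_DOWN)
    return shares


# Assumed figures: a company pays 1,000 in royalties and discloses that
# 30 per cent of its training data was textual content.
total_royalty = Decimal("1000.00")
textual_pool = total_royalty * Decimal("0.30")

# Hypothetical registered providers of textual content and their work counts.
textual_providers = {"publisher_a": 600, "publisher_b": 300, "author_c": 100}

shares = pro_rata_shares(textual_pool, textual_providers)
```

Under these assumed numbers the textual pool is 300, and it is divided 6:3:1 among the three registered providers. A “general value assessment”, the other option the official mentioned, would replace the work counts with some other measure of value.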
Once the consultations on the working paper are completed, the commerce ministry may look at amending the copyright law in its current form to include AI royalty payments, the official said.
The DPIIT, which functions under the Ministry of Commerce and Industry, has also proposed that AI companies pay royalties at rates set by the government or courts. It has rejected a voluntary licensing framework, arguing that such an approach would create a “significant compliance burden” due to complex negotiations and uncertainty.
A government- or court-fixed rate, combined with a statutory licensing mechanism, would instead cut costs and create a “predictable environment for licensees of works”.
“While this model takes away the power of the copyright owners to refuse licensing or negotiate a fee, it guarantees them fair compensation,” the DPIIT working paper states.
A statutory licensing framework will also ensure that the AI models being trained are free of bias or hallucinations, as the maximum amount of content will be available at reasonable rates to all companies that train AI models and LLMs, the working paper has suggested.
Proposed norms
- Based on public comments, govt may decide to amend copyright law to enable AI royalty payment framework
- On payment of govt-prescribed royalty, AI firms can use any publicly available copyrighted content without seeking consent
- With royalty payments, AI firms to have guaranteed access to data
- No option for copyright holders to opt out of sharing data