Generative AI's Achilles heel

It lies in the contentious battle for control over digital content, with publishers and creators denying AI players access to their work, and countries plugging regulatory gaps

Prosenjit Datta
Last Updated: Oct 02 2023 | 9:29 PM IST
A few days ago, Google gave publishers a switch that lets their websites remain available to the search engine crawler but not for training Generative artificial intelligence (AI) models such as Google’s Bard. By using the Google-Extended control, web publishers can decide whether their content will be available only for search, or for search as well as for training the AI models that learn from such content.
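For readers who want to see what such a switch looks like in practice, here is a minimal sketch. It assumes the control is expressed as a robots.txt rule addressed to the Google-Extended user agent, and it uses Python's standard robots.txt parser to check what a well-behaved crawler would conclude. The rules, the example.com address and the helper name is_allowed are illustrative assumptions, not any publisher's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that wants to stay in search
# results but opt out of AI training. These rules are an assumption
# made for illustration only.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder address
    print(is_allowed("Google-Extended", page))  # False: opted out of training
    print(is_allowed("Googlebot", page))        # True: still open to search
```

A rule like this only works if the crawler chooses to honour it; robots.txt is a request, not an enforcement mechanism.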

Google may have created this tool because it genuinely believes that content creators and publishers should have control over how their data will be used. Or it may have taken that step to avert a prolonged and nasty court battle with human content creators and web publishers. Either way, it has highlighted the issue of data and content ownership, which has become extremely contentious since Generative AI became a household term after the splashy entry of OpenAI’s ChatGPT in November last year.

Within a few months of the launch of ChatGPT, several things became very clear. First, ChatGPT and other models, despite the magic they performed, still had many issues that needed to be addressed. They were prone to hallucinations if users pressed them too hard and for extended periods. Their answers deteriorated over time, and they often gave erroneous or fictitious replies.

More pertinently, to train its Generative AI models, OpenAI had created a web crawler that scraped all the data it could access on the internet, without bothering with niceties such as seeking permission from content creators or copyright owners. Its peers in the Generative AI race did much the same.

A group of prominent authors, including John Grisham, George R R Martin, Jonathan Franzen, David Baldacci, and Michael Connelly, has sued OpenAI. Other prominent authors are likely to join the suit, along with content creators, painters and photographers. The issue to be decided is whether Generative AI models have the right to scrape content created by humans from the internet, train on it, and then generate fresh content, all without obtaining the creators’ permission or paying them anything.

Even before Google came up with the Google-extended tool, a number of publishers, including The New York Times and Medium, had taken steps to protect their websites and add tools to prevent their content from being used for training Generative AI models. In India, reports suggest that The Times of India, The Hindustan Times, The Hindu, and Dainik Bhaskar have safeguarded their websites from OpenAI’s web crawler.

For Generative AI’s leading players, such as OpenAI, Google, Meta, Anthropic, and others, website publishers and authors denying them access to content is not good news. Unless they have a constant stream of good content to feed the Gen AI models they have spent billions to develop, train, and refine, those models will start faltering.

Big AI players probably have the technological smarts to bypass the protections and safeguards put up by websites. However, doing so will only harm their legal positions. Offering to pay prominent authors and other content creators as well as big web publishers is another option. The question is whether they can afford to pay for all the content they need. Generative AI models are hungry beasts, and their hunger for fresh content is insatiable.

AI companies had flirted with the idea of “synthetic” data to train models. Synthetic data is created by machines to mimic data generated by humans. Unfortunately, it did not yield the desired results for them.

In all probability, Generative AI’s big guns will eventually agree to compensate prominent publishers and authors, painters, and photographers in order to gain access to new content. They may not do this willingly. They will probably need to be nudged by a combination of new laws as well as court rulings in existing lawsuits. Small, independent content creators are unlikely to gain much though.

A number of countries have started drafting laws and regulations to deal with the issues that have been thrown up by the progress of Generative AI. Europe has moved quickly but so has, surprisingly, China. Others are in various stages of grappling with the nuances that the laws need to address.

Indian policymakers, on the other hand, have been very tardy. The Digital Personal Data Protection Act, 2023, does not address the issues posed by Generative AI. A Digital India Act is being worked on, but what its contours will be and what it will cover are still not clear. Indian policymakers need to realise that India generates more digital data than any other country except China, and that this is of immense value to builders of Generative AI models. That’s why it’s absolutely imperative to establish clear rules and regulations regarding content and data generated in India that can be used to train Gen AI models.

The writer is former editor of Business Today and Businessworld and founder of Prosaic View, an editorial consultancy


Disclaimer: These are personal views of the writer. They do not necessarily reflect the opinion of www.business-standard.com or the Business Standard newspaper
