For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models.
Now, that data is drying up.
DATA CRISIS
- Decline in consent to use data will have ramifications for researchers, academics and noncommercial entities
- 5% of all data, and 25% of data from the highest-quality sources, restricted in data sets used to train AI
- Generative AI boom has led to tensions with the owners of data
- Publishers have set up paywalls, changed terms of service to limit the use of their data
- Web crawlers used by companies like OpenAI, Anthropic, and Google blocked by some companies
- Smaller AI outfits and academic researchers who rely on public data sets in trouble