Generative AI programs such as OpenAI’s ChatGPT and DALL-E or Google’s Bard need humongous quantities of data before they can do the tricks that impress you. These models are pre-trained on vast quantities of text (or, for image generators such as DALL-E, images) so that they can generate text and images in a human-like fashion. The larger and better the data, the better the program will learn, assuming the researchers have got their algorithm right. That is why high-quality data is critical to the success of the large language models (LLMs) and generative AI programs that are in the news every single day.

Shorn of all technical jargon, most of the AI programs making waves today learn by crunching very large data sets. Think of a person trying to learn a subject. The more content they consume and revise, the better they get. An AI algorithm learns in much the same way, which is why the data or content it learns from can make or break it. (There are other types of AI too, but machine learning, deep learning and LLMs are the ones generating buzz currently.)