Training Data

Fear level: Medium | Audience: General
The collection of examples (text, images, audio, or other data), often enormous, that an AI system learns from in order to perform its tasks.

In Plain English

Training data is the textbook an AI studies before it takes a test. If you want an AI to write like a human, you feed it millions of books, articles, and websites. The AI is only as good as the data it learns from: if the data is flawed or biased, the AI will be too. For example, an AI trained only on 19th-century literature would write like a 19th-century novelist and would know nothing about anything that happened after 1900.
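The Python sketch below makes "only as good as the data" concrete. It is a toy, not how ChatGPT or any real system is built. It uses a hypothetical word-counting ("bigram") model with made-up function names `train` and `predict_next` and two invented mini-corpora. The model simply predicts the word it most often saw next, so the same code gives different answers depending on what it was trained on.

```python
from collections import Counter, defaultdict

def train(texts):
    """Hypothetical toy trainer: count which word follows which in the texts."""
    follows = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def predict_next(model, word):
    """Return the word most often seen after `word`, or None if never seen."""
    counts = model.get(word.lower())  # .get() does not add missing keys
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Two invented, tiny "training sets" with different writing styles.
victorian = train([
    "i shall call upon you tomorrow",
    "i shall endeavour to be punctual",
])
modern = train([
    "i will text you tomorrow",
    "i will try to be on time",
])

print(predict_next(victorian, "i"))   # shall
print(predict_next(modern, "i"))      # will
print(predict_next(modern, "shall"))  # None: never seen in this data
```

The toy model has nothing to say about a word missing from its training data, and it copies the style of whatever it was fed. Real language models use neural networks with billions of parameters rather than simple word counts, but their behavior still depends on their training data.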

Real-World Example

The web pages, books, and other text OpenAI used to train the language models behind ChatGPT, so they could learn how humans communicate.
