Learn
This is a resource repository for legislators, policy makers, journalists, thought leaders, and researchers. Artificial intelligence can be confusing and overwhelming. We aim to provide clarity and understanding. The modules, articles, and guides presented here are intended to explain fundamental concepts in artificial intelligence and AI governance in accurate and non-technical language. New articles will be added as the technology and language of AI evolve—and they’re evolving quickly.
To learn more about the Transparency Coalition’s top remedies for current risks in AI safety and transparency, see our Solutions page.
Topics
Read our Latest AI Report
TCAI Advisor Leigh Wickell unearths the roots of today’s outdated privacy laws and sets a course for an AI-era update.
AI 101
Your startup guide to the most essential concepts for understanding artificial intelligence.
AI Safeguards
Exploring the foundations of AI safeguards and mitigation.
Data Privacy 101
The essentials of Personally Identifiable Information, data privacy, and why it matters.
TCAI Report
Privacy Harms in the AI Age takes an in-depth look at America’s outdated privacy laws and offers solutions for the emerging AI landscape.
Training Data Transparency
Learn about the foundational ingredients of AI models, and why and how they should be disclosed.
Disclosing AI Use
Understand the importance of AI disclosure laws, and how content provenance makes disclosure possible.
Synthetic Data 101
Learn about the difference between organic data and synthetic data, and how each affects AI performance.
Complete Resource Library
Understanding Synthetic Data
In today’s AI ecosystem there are two general types of training data: organic and synthetic.
Organic data describes information generated by actual humans, whether that’s a piece of writing, a numerical dataset, a song, an image, or a video. Synthetic data is created by generative AI models using organic data as a base material.
Synthetic Data and AI ‘Model Collapse’
Just as a photocopy of a photocopy drifts away from the original, a generative AI model trained on its own synthetic data can produce output that drifts away from reality, growing further from the organic data it was intended to imitate.
Transparency and Synthetic Data
The use of synthetic data isn’t inherently good or bad. In medical research, for example, it’s a critically important tool that allows scientists to make new discoveries while protecting the privacy of individual patients.
At the Transparency Coalition, we are not calling for limitations on the creation or use of synthetic data. What’s needed is disclosure: Developers should disclose when they use synthetic data to train an AI model.