Learn
This is a resource repository for legislators, policy makers, journalists, thought leaders, and researchers. Artificial intelligence can be confusing and overwhelming. We aim to provide clarity and understanding. The modules, articles, and guides presented here are intended to explain fundamental concepts in artificial intelligence and AI governance in accurate and non-technical language. New articles will be added as the technology and language of AI evolve—and they’re evolving quickly.
To learn more about the Transparency Coalition’s top remedies for current risks in AI safety and transparency, see our Solutions page.
Topics
Read our Latest AI Report
TCAI Advisor Leigh Wickell unearths the roots of today’s outdated privacy laws, and sets a course for an AI-era update
AI 101
Your startup guide on the most essential concepts for understanding Artificial Intelligence.
AI Safeguards
Exploring the foundations of AI safeguards and mitigation.
Data Privacy 101
The essentials of Personally Identifiable Information, data privacy, and why it matters.
TCAI report
Privacy Harms in the AI Age takes an in-depth look at America’s outdated privacy laws and offers solutions for the emerging AI landscape.
Training Data Transparency
Learn about the foundational ingredients of AI models, and why and how they should be disclosed.
Disclosing AI Use
Understand the importance of AI disclosure laws, and how content provenance makes disclosure possible.
Synthetic Data 101
Learn about the difference between organic data and synthetic data, and how that difference affects AI performance.
Complete Resource Library
Understanding Synthetic Data
In today’s AI ecosystem there are two general types of training data: organic and synthetic.
Organic data describes information generated by actual humans, whether that’s a piece of writing, a numerical dataset, a song, an image, or a video. Synthetic data is created by generative AI models using organic data as a base material.
Synthetic Data and AI ‘Model Collapse’
Just as a photocopy of a photocopy can drift away from the original, when generative AI is trained on its own synthetic data, its output can also drift away from reality, growing further apart from the organic data that it was intended to imitate.
Transparency and Synthetic Data
The use of synthetic data isn’t inherently good or bad. In medical research, for example, it’s a critically important tool that allows scientists to make new discoveries while protecting the privacy of individual patients.
At the Transparency Coalition, we are not calling for limitations on the creation or use of synthetic data. What’s needed is disclosure: Developers should be transparent in their use of synthetic data when using it to train an AI model.
Training Data: What the Machine Learns
Training data is the foundation of artificial intelligence. It’s what AI systems like ChatGPT use to provide answers to the prompts we provide. It’s what generative image systems like Midjourney and DALL-E use to conjure AI-created art.
Why and How to Disclose the Use of AI
With the emergence of generative AI, it now takes just a few button-clicks for anyone to create or manipulate data and convince others that fake content is real.
Why Training Data Is Not a Trade Secret
Training data is the foundation of artificial intelligence. It’s what AI systems like ChatGPT use to provide answers to the prompts we provide. It’s what generative image systems like Midjourney and DALL-E use to conjure AI-created art.
Emerging Standards in Disclosure
Today most media and tech companies are coalescing around the standard created by the Coalition for Content Provenance and Authenticity (C2PA).
How to Format and Require Data Disclosures
Developers of AI systems should be required to provide documentation for all training data used in the development of an AI model.
Legislating the Disclosure of AI Use
Legislative policies requiring the disclosure of AI use are developing side by side with the emerging standards in AI provenance. They’re not perfectly in sync, and that’s okay.
Data Privacy: What It Is, Why It’s Important
With the arrival of artificial intelligence systems like ChatGPT and Copilot, data privacy has emerged as one of the most urgent consumer protection issues of the 2020s.
AI Safeguards: Where to Start
At the Transparency Coalition we believe AI policy discussion and legislative action happen at many levels simultaneously. Our mission is to address known AI safety and privacy risks with practical solutions. We’re focused on bringing transparency to both AI inputs and AI outputs.
Managing Doomsday Scenarios
It’s not difficult to conjure up apocalyptic scenarios set in motion by the advancement of artificial intelligence. Humans have been entertained by techno-catastrophe since Mary Shelley published Frankenstein in 1818.
That’s not to say AI risks should be dismissed as fiction.
Why ‘Opt-in’ Consent Is the Best Option
Most state data privacy laws in the U.S. operate on an opt-out basis, meaning that users are assumed to have consented to the use of cookies unless they actively decline, i.e., opt out.
Disclosure: First Step in Data Transparency
Many of the problems caused by AI today are, at their heart, issues of transparency.
The developers of the most popular chatbots, like ChatGPT, refuse to divulge any information about the datasets on which their AI models were trained. That denies corporate deployers and individual consumers the ability to judge the quality of the AI system.
Defining Artificial Intelligence
AI computer systems are trained to think, learn, and make decisions independently. While common algorithms follow step-by-step instructions to solve specific tasks, AI systems can analyze data, recognize patterns, and improve their performance over time without explicit programming for every scenario.
How AI Systems Are Created
At its heart, an AI system is a highly sophisticated computer program. That program, known as a model, requires enormous amounts of computing power and massive datasets. By ingesting the datasets, the model “learns” about the structure of language, for instance, or patterns derived from millions of images.
Why the AI Boom Is Happening Now
Sophisticated AI systems like ChatGPT and Copilot require two things: enormous amounts of computing power and massive datasets on which to train. Those assets have only become available in recent years, fueling the industry-wide boom.