Transparency Coalition co-founder testifies in favor of California training data bill

Transparency Coalition.AI co-founder Rob Eleveld added his voice to those calling for appropriate AI legislation yesterday, testifying in Sacramento in favor of a bill that would require AI developers to publish documentation of the datasets used to train artificial intelligence models.

AB 2013, a proposal sponsored by California Assemblymember Jacqui Irwin (D-Thousand Oaks), would bring necessary guidance to developers of AI systems, said Eleveld, “because all key stakeholders deserve to understand what is being pulled into these models to derive their outputs.”

Eleveld testified at a hearing of the Assembly’s Privacy and Consumer Protection Committee, which approved the bill by a vote of 8-1.

As the foundation from which generative AI systems are built, training data is a primary factor in determining the health, safety, and reliability of generative AI outputs. 

“Consumer confidence in AI systems has not grown at the same rapid pace as industry adoption,” Assemblymember Irwin has noted. “Many consumers have valid questions about how these AI systems and services are created, and if they truly are better than what they seek to replace.”

“To build consumer confidence,” she added, “we need to start with the foundations, and for AI that is the selection of training data. AB 2013 provides transparency to consumers of AI systems and services by providing important documentation about the data used to train the services and systems they are being offered, including if synthetic data has or is being used to fill gaps in data sources.”

Irwin’s bill would require the developer of an AI model to post, on the developer’s website, documentation regarding the data used to train the model. That information would include:

  • The source or owner of the dataset

  • A description of how the dataset furthers the purpose of the AI system

  • The number of data points included in the dataset

  • Whether the dataset includes information protected by copyright, trademark, or patent

  • Whether the dataset includes personal information

  • Whether the dataset was purchased, licensed, or is in the public domain

  • Whether synthetic data was used in the training of the AI model

Further details and requirements can be found in the bill.

“These are not heavy-handed requirements, nor will they inhibit small business innovation,” Eleveld told the committee. The transparency requirements in Irwin’s bill, he added, will safeguard California’s citizens “while empowering the tech industry to grow and thrive.”

“Now is the time to act,” said Eleveld. “We cannot repeat the missteps and inaction during the nascent stages of social media.”

 
