What I Found in a Database Meta Uses to Train Generative AI

Nobel-winning authors, Dungeons and Dragons, Christian literature, and erotica all serve as datapoints for the machine.

This summer, I reported on a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. “Books3,” as it’s called, was based on a collection of pirated ebooks that includes travel guides, self-published erotic fiction, novels by Stephen King and Margaret Atwood, and a lot more. It is now at the center of several lawsuits brought against Meta by writers who claim that its use amounts to copyright infringement…
