What I Found in a Database Meta Uses to Train Generative AI
Nobel-winning authors, Dungeons and Dragons, Christian literature, and erotica all serve as datapoints for the machine.
This summer, I reported on a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. “Books3,” as it’s called, was based on a collection of pirated ebooks that includes travel guides, self-published erotic fiction, novels by Stephen King and Margaret Atwood, and a lot more. It is now at the center of several lawsuits brought against Meta by writers who claim that its use amounts to copyright infringement…