Data Privacy in the Age of AI
With the rise of artificial intelligence systems like ChatGPT and Copilot, data privacy has emerged as a pressing issue for individuals and society.
Nearly everything we do online is tracked to some extent. Even home appliances, from smart thermostats to streaming TVs, send data about our choices and purchases into the wider digital world. Companies collect that data and sometimes sell it to third parties known as data brokers.
The most powerful AI systems require massive amounts of data in order to function. Often that data is indiscriminately scraped from the internet. It may contain personal photos you uploaded to a social media site. It may even contain your individual genetic code.
The personal data you upload to the digital world has always had a certain value, but that value has been dramatically heightened due to the need for organic (human-created) data as a component of AI training datasets.
Make no mistake: Your personal data has value. OpenAI, the maker of ChatGPT, was founded in 2015. Its market value now sits at $157 billion. The social media site Reddit went public at a valuation of $6.4 billion in early 2024, based largely on the perceived AI training value of its data: millions of organic posts, comments, and quips contributed for free by its users, known as redditors.
Personal data can make our interactions in the digital world faster and more convenient. It can also be used against us. Scammers are now using AI systems, for instance, to mimic the voice of a trusted relative in a bid to extract money from their victims. Feeding known personal information to the victim plays a key role in building false trust.
The prompts, text, images, and documents you feed into AI systems also constitute a kind of personal or private data. Once you upload that material into an AI system, the system may retain it and use it as training data. In other words, if you upload your company’s proprietary information into a public AI system, that information may end up in the data the model learns from, beyond your control. The same goes for any personal data uploaded into an AI chatbot. The AI developer or deployer may have installed safeguards to prevent the regurgitation of that data, but there’s no guarantee that the chatbot won’t blurt out your sensitive data hours, days, or years later.