Data Privacy: What It Is, Why It’s Important

With the emergence of artificial intelligence systems like ChatGPT and Copilot, data privacy has become one of the most urgent consumer protection issues of the 2020s.

Nearly everything we do online is tracked to a certain extent. Even home appliances, from smart thermostats to streaming TVs, are sending data about your choices and purchases into the wider digital world. Corporations collect that data, and sometimes they sell it to third parties known as data brokers. 

The most powerful AI systems require massive amounts of data in order to function. Often that data is indiscriminately scraped from the internet. It may contain personal photos you uploaded to a social media site. It may even contain your individual genetic code.

The personal data you upload to the digital world has always had a certain value, but that value has risen dramatically because AI training datasets depend on organic (human-created) data.

Make no mistake: Your personal data has value. OpenAI, the maker of ChatGPT, was founded in 2015. Its market value now sits at $157 billion. The social media site Reddit went public at a valuation of $6.4 billion in early 2024 based largely on the perceived AI training value of its data: millions of organic posts, comments, and quips contributed for free by its users, known as redditors.

Personal data can also be used against you. Malicious actors can aggregate your data and exploit it for nefarious purposes. Scammers, for instance, now use AI systems to mimic the voice of a trusted relative in a bid to unlock a victim's financial accounts. Feeding known personal information back to the victim plays a key role in building false trust.

The prompts, text, images, and documents you feed into AI systems also constitute a kind of personal or private data. Once you upload it, it can be ingested by that system and become another piece of the machine's training data. In other words, if you upload your company's proprietary information into a public AI system, that information may become part of a public database. The same goes for any personal data uploaded into an AI chatbot. The AI developer or deployer may have installed safeguards to prevent the regurgitation of that data, but there's no guarantee that the chatbot won't blurt out your sensitive data hours, days, or years later.
