April 24, 2024 · Technology & Intellectual Property

ChatGPT and Your Data Privacy

Generative AI scrambles the premise of privacy law, leaving users with strong rights over their own prompts and weak rights over data swept up at training.

Every prompt typed into ChatGPT is, in legal terms, a disclosure of personal information to a private company, and the rules governing what happens next are still being written. A user who pastes a draft contract, a medical question, or a colleague’s name into the chat box is handing data to OpenAI under a privacy policy that reserves broad rights to retain and learn from that input. At the same time, the model itself was trained on a corpus scraped largely from the open web, which means it may already hold fragments of information about people who never used the product at all. Those two facts — data flowing in from users and data swept up at training — generate most of the privacy questions now working their way through regulators and courts on both sides of the Atlantic.

What ChatGPT collects, and from whom

OpenAI’s consumer service gathers two distinct categories of personal information. The first is account and content data: the email and device details tied to a login, and the text of conversations themselves. By default, OpenAI retains conversation history and, for personal accounts, has stated that user inputs may be used to improve its models unless the account holder turns that setting off. The second category is far larger and more contested — the training corpus, assembled by scraping publicly available text from across the internet. Because that corpus was harvested wholesale, it can contain names, biographical details, and other personal data about individuals who have no relationship with OpenAI and never agreed to anything.

This second category is what makes generative AI awkward to fit inside existing privacy frameworks. Most data-protection law is built around a relationship between a person and an entity that knowingly collects their information. A model trained on scraped text scrambles that premise: the data subject is often unaware the collection occurred, and the information may surface only as a probabilistic output rather than as a stored record that can be located and deleted.

The European challenge: a legal basis problem

The most concrete enforcement pressure has come from Europe. In March 2023 Italy’s data-protection authority, the Garante, ordered a temporary halt to ChatGPT in the country, citing the absence of a lawful basis for processing personal data to train the model and the lack of adequate age controls. Access was restored after OpenAI added disclosures and user controls, but the underlying investigation continued, and in early 2024 the Garante notified the company of alleged breaches of the General Data Protection Regulation (Regulation (EU) 2016/679). The European Data Protection Board convened a dedicated ChatGPT task force to coordinate national authorities, a signal that the questions are being treated as structural rather than local.

The recurring sticking point is GDPR’s requirement that every processing activity rest on one of the regulation’s enumerated legal bases. OpenAI has leaned on “legitimate interests” to justify training on scraped data, but that basis demands a balancing test against the rights of data subjects — a difficult argument when those subjects were never notified. Equally unresolved is how the regulation’s accuracy principle and its right to erasure apply to a system that does not store discrete records but generates statements on the fly, sometimes false ones, about identifiable people.

The American picture: scraping suits and the FTC

In the United States there is no single federal privacy statute analogous to the GDPR, so the early disputes have been piecemeal. A proposed class action filed in the Northern District of California in mid-2023 alleged that OpenAI and Microsoft had conducted an “unprecedented” web-scraping operation in violation of a long list of federal and state privacy and anti-hacking laws. The plaintiffs voluntarily dismissed that complaint in September 2023 while reserving the right to refile, so it produced no merits ruling. It nonetheless previewed the theories that later filings have continued to test: invasion of privacy, unjust enrichment, and statutory claims.

The Federal Trade Commission opened its own inquiry in July 2023, sending OpenAI a civil investigative demand that probed the company’s data-handling practices, a security incident it disclosed in 2023, and ChatGPT’s capacity to generate false or disparaging statements about real individuals. That is a consumer-protection framing: it treats reputational harm and inaccuracy as potential “unfair or deceptive” practices under Section 5 of the FTC Act. The investigation reflects the agency’s broader position that its existing consumer-protection authority already reaches AI products, even without new legislation.

A note on the patchwork

The absence of a comprehensive federal privacy law means a California resident, a Texas resident, and an Illinois resident may have meaningfully different rights against the same product. State statutes, including California’s Consumer Privacy Act, Illinois’s Biometric Information Privacy Act, and the comprehensive privacy laws enacted in more than a dozen states, supply much of the practical legal exposure, and they do not align neatly.

State law and the consumer’s own data

For users in states with comprehensive privacy statutes, the clearest leverage is over their own inputs rather than the training corpus. The California Consumer Privacy Act, as amended by the California Privacy Rights Act, gives residents rights to know what personal information a business has collected, to delete it, and to opt out of certain sharing — rights that, at least in principle, reach the contents of a user’s conversations. Whether those rights extend to personal data about a non-user that the model absorbed during training is a harder and largely untested question, because the statute’s deletion machinery assumes a locatable record.

The practical takeaway for an individual user is narrower than the headlines suggest. A person can usually control whether their own chats are retained and used for training, but they have little ability to scrub themselves from a model already trained on the public web. That asymmetry — strong rights over fresh inputs, weak rights over historic scraping — runs through both the European and American disputes. Readers following how courts treat data captured without consent in adjacent contexts may find the analysis in this publication’s coverage of geofence warrants a useful contrast, since both turn on whether a sweep of data about uninvolved people can be squared with existing legal categories.

Practical controls and their limits

OpenAI has rolled out a set of user-facing controls that map onto these legal pressures. Account holders can disable the setting that lets their conversations train future models, delete individual chats or their entire history, and use a temporary-chat mode that keeps a session out of history and out of the training pipeline. Deleted content is generally removed from active systems within a stated window, though some data may persist longer for security and abuse-monitoring purposes. Business and enterprise tiers carry stronger defaults, typically excluding customer inputs from training altogether.

These controls address the inbound flow of user data, and they are meaningful for anyone worried about sensitive prompts. They do nothing, however, about the model’s pre-existing knowledge or about the possibility that an output reproduces personal information drawn from the training set. The gap between what a toggle can fix and what the underlying architecture makes difficult is precisely where the unresolved law lives. Workers entering employer or client information into these tools should also weigh confidentiality obligations that sit outside privacy law altogether — a tension that overlaps with questions raised in this publication’s analysis of employee privacy in remote work.

Where this is heading

The likely trajectory is convergence on a few hard questions rather than a single clarifying ruling. Regulators on both continents are pressing the same issues (lawful basis for training, accuracy of outputs about real people, and the meaningfulness of deletion rights against a generative system), and the answers will shape not just ChatGPT but every model trained the same way. For now, the most defensible posture for an individual is to treat any prompt as a disclosure that may be retained, to use the available controls deliberately, and to assume that the broader questions of consent and erasure remain open. The publication offers commentary and analysis on these developments, not legal advice; anyone facing a concrete data-protection dispute should consult counsel in the relevant jurisdiction.

Questions readers ask

Does OpenAI use my ChatGPT conversations to train its models?

OpenAI has stated that, by default, conversations on personal accounts may be used to improve its models. Account holders can disable this in the data-control settings, and business and enterprise plans generally exclude customer inputs from training by default.

Can I delete my data from ChatGPT?

You can delete individual conversations or your entire history, and OpenAI has said deleted content is removed from active systems within a stated window. Some data may be retained longer for security and abuse monitoring. Deleting your chats does not remove information about you that the model absorbed during its original training on public web data.

Is ChatGPT legal under Europe’s GDPR?

That question is unresolved. Italy’s Garante cited the absence of an adequate legal basis for training on personal data and later notified OpenAI of alleged GDPR breaches, and a European Data Protection Board task force has coordinated national inquiries. No definitive cross-border ruling on lawfulness had been issued as of this writing.

What was the Italian ChatGPT ban about?

In March 2023 the Garante ordered a temporary halt to ChatGPT in Italy, citing the absence of a lawful basis for processing training data and the lack of age controls. Service resumed after OpenAI added disclosures and user controls, but the investigation continued.

Has the FTC taken action against OpenAI?

The Federal Trade Commission opened an investigation in July 2023, issuing a civil investigative demand that examined OpenAI’s data practices, a 2023 security incident, and ChatGPT’s capacity to generate false statements about real people. An investigation is an inquiry, not a finding of wrongdoing.

Did anyone sue OpenAI over web scraping?

A proposed class action filed in the Northern District of California in 2023 alleged unlawful web scraping to build the training data. The plaintiffs voluntarily dismissed it later that year while reserving the right to refile, so it produced no ruling on the merits.

Can ChatGPT reveal personal information about me even if I never used it?

Because the model was trained on text scraped from the public web, it can hold and surface fragments of information about people who never used the product. Whether and how individuals can compel removal of that information is one of the central unsettled questions in the field.

What rights do California residents have over their ChatGPT data?

Under the California Consumer Privacy Act, as amended by the California Privacy Rights Act, residents have rights to know, delete, and opt out of certain sharing of personal information collected by a business. How far those rights reach into a model’s training corpus, as opposed to a user’s own inputs, has not been settled.

Is using temporary chat mode actually private?

Temporary chats are kept out of history and out of the training pipeline and are deleted from systems within a stated window. They reduce, but do not eliminate, data exposure, since some retention for security and abuse monitoring may still apply.

Should I enter confidential work information into ChatGPT?

Separate from privacy law, entering employer or client data may breach confidentiality or contractual obligations. The safest assumption is that any input could be retained, so sensitive material should generally be kept out of consumer AI tools unless an organization’s policy and the relevant plan’s terms clearly permit it.

Diane M. Calloway

Contributing Editor · Constitutional Law

Diane M. Calloway writes on the Fourth Amendment, digital privacy, and appellate procedure. A former appellate clerk, she follows how courts apply older search-and-seizure doctrine to new surveillance technology.