Below is a single-source walk-through of the full "data life-cycle" for a ChatGPT conversation, stitched together only from OpenAI's own public research, product-security notes, and policy text released through April 2025.
1. What exactly is collected at the moment you hit Send
| Layer | Concrete fields captured | Where it is described |
|---|---|---|
| Raw content | Every token of text you type or dictate (speech is auto-transcribed); files, images, and code snippets you attach | Privacy Policy §1 "User Content" (openai.com) |
| Technical & session metadata | IP-derived coarse location, device/browser IDs, timestamps, token counts, model version, latency, detected language, abuse-filter scores | Privacy Policy §1 "Log Data", "Usage Data", "Device Information", "Location Information" (openai.com) |
| Automated classifier outputs | Safety filters (self-harm, sexual, violence, privacy) plus the 25 affect-cue classifiers (loneliness, dependence, etc.) introduced in the EmoClassifiers V1 research pipeline | Affective-Use study §2 (cdn.openai.com) |
| Optional memory | "Saved memories" you explicitly ask for, and implicit "chat history" features that mine earlier sessions for useful facts about you | Memory & Controls blog, April 10, 2025 update (openai.com) |
| User feedback | 👍/👎 ratings, free-text feedback, or survey answers (e.g., the 4,000-person well-being survey in the study) | Affective-Use study §1 (cdn.openai.com) |
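To make the layers concrete, here is a hypothetical sketch of one logged turn as a single Python record. Every field name is an illustrative assumption, not OpenAI's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LoggedTurn:
    """Illustrative shape of one logged conversation turn; field names are guesses."""
    # Layer 1: raw content
    user_text: str                                            # every token typed or dictated
    attachments: list[str] = field(default_factory=list)      # files, images, code snippets
    # Layer 2: technical & session metadata
    coarse_location: str = ""                                 # IP-derived, e.g. city level
    device_id: str = ""
    model_version: str = ""
    token_count: int = 0
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Layer 3: automated classifier outputs
    safety_scores: dict[str, float] = field(default_factory=dict)  # self-harm, sexual, ...
    affect_scores: dict[str, float] = field(default_factory=dict)  # loneliness, dependence, ...
    # Layer 4 (optional): memory
    memory_refs: list[str] = field(default_factory=list)      # saved/implicit memory pointers
    # Layer 5: user feedback
    thumb_rating: int | None = None                           # +1, -1, or no rating
```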
2. How the data is stored and secured
- Encryption in transit and at rest (TLS 1.2+ / AES-256).
- Tiered data stores (sketched in code below):
  - Hot path: recent chats plus 30-day abuse logs, for fast retrieval and safety response.
  - Warm path: account-bound conversation history and memories (no scheduled purge).
  - Research snapshots: de-identified copies used for model tuning and studies.

These structures are implied across the Enterprise Privacy FAQ ("encryption", "authorized employee access only") (openai.com) and the main Privacy Policy ("we may aggregate or de-identify") (openai.com).
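To make the tiering concrete, here is a toy routing sketch in Python. The tier names, record fields, and the reuse of the 30-day abuse-log window are assumptions layered on what the FAQ implies, not a documented design.

```python
from datetime import datetime, timedelta, timezone

ABUSE_LOG_WINDOW = timedelta(days=30)  # assumption: mirrors the 30-day abuse-log figure

def route_record(record: dict, now: datetime) -> list[str]:
    """Toy router: which implied tiers would hold a given conversation turn."""
    tiers = []
    # Hot path: recent chats and abuse logs, aged out after 30 days.
    if now - record["created_at"] <= ABUSE_LOG_WINDOW:
        tiers.append("hot")
    # Warm path: account-bound history and memories, no scheduled purge.
    if record.get("account_bound", True):
        tiers.append("warm")
    # Research snapshots: only de-identified copies reach this tier.
    if record.get("deidentified_copy", False):
        tiers.append("research")
    return tiers

# Example: a fresh, account-bound turn lands in both hot and warm tiers.
now = datetime.now(timezone.utc)
print(route_record({"created_at": now, "account_bound": True}, now))  # ['hot', 'warm']
```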
3. Who can see the data, and under what controls
| Audience | Scope & purpose | Control gates |
|---|---|---|
| Automated pipelines | Real-time safety filters, usage-analytics jobs, and the Emo-classifier batch that ran across 3 million conversations with no human review (cdn.openai.com) | Internal service tokens; no raw text leaves the cluster |
| OpenAI staff | Abuse triage (30-day window); engineering debugging (case-by-case); IRB-approved research teams (de-identified extracts only) | Role-based access; SOC 2 controls; audit logs (openai.com) |
| Enterprise / Team admins | Chat logs and the audit API within the customer workspace | Admin-set retention and SAML SSO (openai.com) |
| No third-party ad networks | Policy states OpenAI does not sell or share Personal Data for behavioural ads (openai.com) | N/A |
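One way to read the control-gate column is as a role-based access policy. The sketch below encodes the rows above as a lookup; the role names and the 30-day triage window come from the table, but the structure itself is an assumption.

```python
# Toy encoding of the control-gate column as a role-based access policy.
# Role names and fields are transcriptions of the table, not OpenAI's real ACLs.
ACCESS_POLICY = {
    "automated_pipeline": {"raw_text": False},                 # scores only, in-cluster
    "abuse_triage":       {"raw_text": True, "max_age_days": 30},
    "engineering_debug":  {"raw_text": True},                  # case-by-case
    "research_team":      {"raw_text": False},                 # de-identified extracts only
    "workspace_admin":    {"raw_text": True},                  # own workspace only
}

def can_read_raw_text(role: str, record_age_days: int) -> bool:
    """True if the role may see raw conversation text for a record of this age."""
    policy = ACCESS_POLICY.get(role, {})
    if not policy.get("raw_text", False):
        return False
    max_age = policy.get("max_age_days")
    return max_age is None or record_age_days <= max_age

print(can_read_raw_text("abuse_triage", 45))   # False: outside the 30-day window
print(can_read_raw_text("research_team", 1))   # False: de-identified extracts only
```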
4. Retention timelines (consumer vs. business vs. API)
| Product tier | Default retention | User / admin override |
|---|---|---|
| ChatGPT (Free/Plus/Pro) | Indefinite for normal chats; 30 days for "Temporary Chats" | Turn off "Improve the model for everyone" or delete specific chats; memories must be deleted separately (help.openai.com, openai.com) |
| ChatGPT Team | End user controls chat retention; deletions purge within 30 days | Workspace admin can shorten the window (openai.com) |
| ChatGPT Enterprise / Edu | Admin-defined period; deletion within 30 days of a request | Enterprise Compliance API and audit logs (openai.com) |
| OpenAI API | Inputs/outputs kept ≤ 30 days (0 days with Zero Data Retention, "ZDR") | Developer can request ZDR for eligible workloads (openai.com) |
| Affective-Use research data | De-identified and stored for 24 months under the MIT/IRB protocol | PII stripped before storage; no re-identification (cdn.openai.com) |
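Expressed as data, the default windows reduce to a small lookup. In the sketch below the tier keys are invented labels, `None` stands for "indefinite", and `timedelta(0)` stands for ZDR.

```python
from datetime import timedelta

# Default retention per product tier, transcribed from the table above.
RETENTION_DEFAULTS: dict[str, timedelta | None] = {
    "chatgpt_consumer":   None,               # indefinite by default
    "temporary_chat":     timedelta(days=30),
    "chatgpt_team":       None,               # end-user controlled; purge within 30 days of deletion
    "chatgpt_enterprise": None,               # admin-defined period
    "api":                timedelta(days=30),
    "api_zdr":            timedelta(0),       # zero data retention
}

def purge_due(tier: str, age: timedelta) -> bool:
    """True if a record in this tier has outlived its default retention window."""
    window = RETENTION_DEFAULTS.get(tier)
    return window is not None and age > window

print(purge_due("temporary_chat", timedelta(days=31)))     # True
print(purge_due("chatgpt_consumer", timedelta(days=365)))  # False: indefinite by default
```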
5. Longitudinal & emotional profiling
- The 2025 study followed 6,000 "power users" for three months, linking recurring account IDs to evolving affect-classifier scores to show how heavy usage correlates with dependence (cdn.openai.com).
- Memory now "references all past conversations" (not just explicit saves), creating a rolling personal knowledge graph (openai.com).
- Even after you delete a chat, its classifier metadata may persist in aggregate analytics, and any model weights updated during training are, by design, non-reversible.
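The longitudinal linkage the study describes amounts to grouping classifier scores by account and time bucket. A minimal sketch, assuming a simple `(account_id, month, scores)` event layout and a "dependence" classifier key; neither is documented as the study's real format.

```python
from collections import defaultdict
from statistics import mean

def monthly_affect_trend(events):
    """events: iterable of (account_id, month, scores_dict) tuples.

    Returns the mean 'dependence' score per (account_id, month) bucket.
    """
    buckets = defaultdict(list)
    for account_id, month, scores in events:
        buckets[(account_id, month)].append(scores.get("dependence", 0.0))
    return {key: mean(vals) for key, vals in buckets.items()}

trend = monthly_affect_trend([
    ("u1", "2025-01", {"dependence": 0.4}),
    ("u1", "2025-01", {"dependence": 0.5}),
    ("u1", "2025-02", {"dependence": 0.7}),
])
print(trend)  # {('u1', '2025-01'): 0.45, ('u1', '2025-02'): 0.7}
```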
6. Practical privacy levers you control today
- Data Controls → "Improve the model for everyone" = Off: stops future chats from joining training sets while keeping your history visible (help.openai.com).
- Temporary Chat: stored ephemerally and auto-purged after 30 days; never used for training (help.openai.com).
- Memory switch: disable both "saved memories" and "chat-history referencing" to prevent profile building (openai.com).
- Privacy portal requests: exercise GDPR/CCPA-style rights to access or erase account-linked data (openai.com).
- Enterprise route: move sensitive workflows to ChatGPT Enterprise or API ZDR if you need contractual guarantees and shorter retention (a minimal API sketch follows below).
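On the API route, the official `openai` Python client exposes a per-request `store` flag on Chat Completions that keeps the exchange out of stored completions. Note that true ZDR is a contractual, account-level arrangement, not something a request parameter can grant; the flag below only controls stored-completion logging.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# `store=False` opts this request out of stored completions.
# ZDR itself must be arranged contractually for eligible workloads.
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize my notes."}],
    store=False,
)
print(resp.choices[0].message.content)
```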
7. Residual risks and open questions
- Emotional traceability: affect classifiers turn qualitative feelings into numerical fingerprints that can be tracked over months. While the published research is aggregated, the pipeline exists inside the product stack.
- Legacy questions: unless you or your estate delete the account, memories and chats persist and may continue informing model behaviour, indirectly shaping future generations of the system.
- Re-identification risk: de-identified text can sometimes be re-identified when combined with rare personal facts; limiting granular personal details in prompts is still the safest practice (see the redaction sketch after this list).
- Irreversibility of training: once training snapshots absorb your words, later deletion requests remove the stored text, but the statistical influence on the weights remains, similar to shredding a letter after its ideas have been memorised.
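As a practical complement to the re-identification point, here is a hedged sketch of client-side redaction before a prompt is sent. The regex patterns are illustrative assumptions; real PII scrubbing is harder, because names, addresses, and rare personal facts resist simple patterns.

```python
import re

# Illustrative patterns only; a production scrubber needs far broader coverage.
PATTERNS = {
    "email":    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone":    re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt: str) -> str:
    """Replace matches of each pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label} redacted]", prompt)
    return prompt

print(redact("Reach me at jane.doe@example.com or +1 (555) 012-3456."))
# -> "Reach me at [email redacted] or [phone redacted]."
```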
Bottom line
OpenAI’s own 2025 research confirms that every conversation creates two parallel artifacts:
- A user-facing transcript + optional memory you can see and delete.
- A metadata shadow (classifier scores, token stats, embeddings) that fuels safety systems, analytics, and long-term studies.
The first is under your direct control; the second is minimised, encrypted, and access-limited — but it is not fully erasable once distilled into aggregate model improvements. Balancing convenience with future privacy therefore means:
- Use memory and chat history deliberately.
- Prefer Temporary Chats or ZDR endpoints for highly sensitive content.
- Schedule periodic exports/reviews of what the system still remembers about you.
That approach keeps the upside of a personalised assistant while constraining the parts of the footprint you cannot later reel back in.