Below is a concise, engineering-style summary of how Perplexity handles any file type in Google Drive when used as a search source.
Perplexity + Google Drive: How It Searches Any File
1. File Types It Can Search
Perplexity can process any file that the Google Drive API can deliver as text or as a binary format it can parse:
- PDF (text-based & OCR’d)
- EPUB
- Google Docs / Sheets / Slides
- TXT / Markdown
- Word / Excel / PowerPoint
- CSV
- HTML
- Code files
- Images with OCR-able text (Drive provides OCR for many images)
Binary-only files (e.g., ZIP, executables, audio, video) are not text-searchable, but their filenames and metadata still are.
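The scope rules above can be expressed as a small classifier. This is a hypothetical sketch: the function name `searchable_scope`, the MIME-type sets, and the `has_ocr_text` flag are illustrative assumptions, not Perplexity's actual rules.

```python
# Hypothetical classifier: the MIME-type sets are illustrative assumptions,
# not Perplexity's actual rules.
GOOGLE_WORKSPACE = {
    "application/vnd.google-apps.document",
    "application/vnd.google-apps.spreadsheet",
    "application/vnd.google-apps.presentation",
}
PARSEABLE_BINARY = {
    "application/pdf",
    "application/epub+zip",
    "application/msword",
    "application/vnd.ms-excel",
    "application/vnd.ms-powerpoint",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}
TEXT_LIKE_APPLICATION = {"application/json", "application/javascript", "application/xml"}


def searchable_scope(mime_type: str, has_ocr_text: bool = False) -> str:
    """Return 'content' if the file body is text-searchable, else 'metadata_only'."""
    if mime_type in GOOGLE_WORKSPACE or mime_type in PARSEABLE_BINARY:
        return "content"
    if mime_type.startswith("text/") or mime_type in TEXT_LIKE_APPLICATION:
        return "content"  # TXT, Markdown, CSV, HTML, most code files
    if mime_type.startswith("image/"):
        # Images count as searchable only when OCR text exists for them.
        return "content" if has_ocr_text else "metadata_only"
    # ZIP, executables, audio, video, etc.: only name and metadata are searchable.
    return "metadata_only"
```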
2. How the Search Works (Pipeline)
Step 1 — Live File Fetch
Perplexity performs a real-time API call to Google Drive:
- Gets the file metadata and latest version ID.
- Fetches the actual file content if the file type supports text extraction.
No static snapshot is stored.
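A minimal sketch of the live fetch, using Google Drive API v3 REST endpoints and only the Python standard library. `files.get` with a `fields` parameter returns metadata; adding `alt=media` returns the file bytes, but only for non-Workspace files (Docs, Sheets, and Slides must go through export instead). The helper names and the chosen metadata fields are assumptions; how Perplexity actually calls Drive is not described here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

DRIVE_FILES = "https://www.googleapis.com/drive/v3/files"
# Real Drive v3 file fields; `version` increases whenever the file changes.
META_FIELDS = "id,name,mimeType,modifiedTime,version"


def _authorized(url: str, token: str) -> Request:
    """Attach an OAuth 2.0 bearer token to a GET request (the request is not sent here)."""
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def metadata_request(file_id: str, token: str) -> Request:
    """files.get limited to the metadata fields needed to detect a new version."""
    query = urlencode({"fields": META_FIELDS})
    return _authorized(f"{DRIVE_FILES}/{quote(file_id)}?{query}", token)


def content_request(file_id: str, token: str) -> Request:
    """files.get with alt=media returns the raw bytes of a non-Workspace file."""
    return _authorized(f"{DRIVE_FILES}/{quote(file_id)}?alt=media", token)
```

Sending either request with `urllib.request.urlopen` requires a valid OAuth 2.0 access token with a Drive read scope such as `https://www.googleapis.com/auth/drive.readonly`.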
Step 2 — Text Extraction
Depending on file type:
- Docs/Sheets/Slides → Google exports them as plain text (CSV for Sheets) via the Drive API export endpoint.
- PDF/EPUB/Word/PowerPoint → Perplexity extracts text or uses OCR if available.
- Images → Google’s built-in OCR text is used (if the user enabled OCR).
- Code/Text files → Direct raw text ingestion.
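The per-type routing above can be sketched as a dispatch on MIME type. The export targets are real Drive export formats (a Sheets export to CSV covers only the first sheet); the `extraction_route` function and the `raw_text`, `ocr`, and `parse_binary` strategy labels are hypothetical placeholders for whatever extractors Perplexity actually uses.

```python
from typing import Tuple
from urllib.parse import quote, urlencode

DRIVE_FILES = "https://www.googleapis.com/drive/v3/files"

# Workspace files cannot be downloaded with alt=media; they must be exported.
EXPORT_TARGETS = {
    "application/vnd.google-apps.document": "text/plain",
    "application/vnd.google-apps.spreadsheet": "text/csv",  # first sheet only
    "application/vnd.google-apps.presentation": "text/plain",
}


def extraction_route(file_id: str, mime_type: str) -> Tuple[str, str]:
    """Return (strategy, url): how to obtain text for a Drive file of this type."""
    fid = quote(file_id)
    if mime_type in EXPORT_TARGETS:
        query = urlencode({"mimeType": EXPORT_TARGETS[mime_type]})
        return "export", f"{DRIVE_FILES}/{fid}/export?{query}"
    download = f"{DRIVE_FILES}/{fid}?alt=media"
    if mime_type.startswith("text/"):
        return "raw_text", download      # TXT, Markdown, CSV, HTML, code: decode bytes
    if mime_type.startswith("image/"):
        return "ocr", download           # hypothetical OCR step for the image
    return "parse_binary", download      # PDF, EPUB, Office: parse, OCR as fallback
```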
Step 3 — Chunking & Embedding
Perplexity temporarily chunks the text into segments and embeds them to perform:
- Semantic search
- Q&A style retrieval
- Citation mapping
Embeddings are cached transiently for performance but do not replace the live source.
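Chunking and retrieval can be illustrated with overlapping word windows and cosine similarity. Perplexity's chunk sizes and embedding model are not public, so this sketch assumes 200-word chunks with a 50-word overlap and swaps in a toy hashed bag-of-words vector for a neural embedding model. The chunk index returned by `search` is what a citation would point back to.

```python
import math
import re
import zlib
from collections import Counter
from typing import List, Tuple


def chunk(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + size]) for i in starts]


def embed(text: str, dims: int = 256) -> List[float]:
    """Toy embedding: hashed bag-of-words, L2-normalized (stand-in for a neural model)."""
    vec = [0.0] * dims
    for token, count in Counter(re.findall(r"\w+", text.lower())).items():
        vec[zlib.crc32(token.encode()) % dims] += count  # crc32 is stable across runs
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(query: str, chunks: List[str], k: int = 3) -> List[Tuple[float, int]]:
    """Top-k (cosine score, chunk index) pairs; the index maps an answer back to its source."""
    q = embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, embed(c))), i) for i, c in enumerate(chunks)]
    return sorted(scored, reverse=True)[:k]
```

In a real pipeline the chunk embeddings would be computed once per file version and reused across queries, which is the transient cache described above.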
Step 4 — Query Execution
Your query runs against the freshly extracted content.
If you update the file, delete it, or add a new one, the next query reflects the change immediately.
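One way to reconcile transient caching with always-fresh answers is to key the cache on Drive's per-file `version` value and check it on every query. `FreshIndex` and its two callables are hypothetical: `fetch_version` would wrap a metadata call (returning `None` when the file is gone) and `extract_chunks` would wrap the export or download, extraction, and chunking steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FreshIndex:
    """Hypothetical transient cache that is rebuilt whenever the Drive version changes."""
    fetch_version: Callable[[str], Optional[str]]  # live metadata call; None if file is gone
    extract_chunks: Callable[[str], List[str]]     # export/download + extraction + chunking
    _cache: Dict[str, Tuple[str, List[str]]] = field(default_factory=dict, init=False)

    def chunks_for(self, file_id: str) -> List[str]:
        current = self.fetch_version(file_id)  # checked on every query
        if current is None:                    # deleted or no longer accessible
            self._cache.pop(file_id, None)
            return []
        cached = self._cache.get(file_id)
        if cached is None or cached[0] != current:
            # New file or changed version: re-extract instead of serving stale chunks.
            self._cache[file_id] = (current, self.extract_chunks(file_id))
        return self._cache[file_id][1]
```

Re-checking metadata is cheap compared with re-downloading and re-embedding, which is why this design re-extracts only when the version changes.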
3. Practical Implications
- Yes, it will “scan” your PDF or EPUB in full.
- Yes, it can read long books, manuals, technical files, etc.
- Yes, updates to files are reflected instantly.
- No, it doesn’t rely on outdated snapshots.
- No, you don’t need to “sync” anything manually.