Hidden Prompt Injection: Why Agentic AI Browsers Aren't Safe for Sensitive Data—Yet
How hidden instructions in web content can hijack AI assistants, bypass security boundaries, and put your credentials at risk
TL;DR: Hidden prompt injections are now the #1 emerging LLM risk (OWASP). Researchers showed agentic browsers (e.g., ChatGPT Atlas, Perplexity Comet) can silently execute invisible instructions from web pages and act with your logged-in privileges. Until stronger safeguards land, isolate agentic browsing and keep real credentials out. This post explains how hidden prompt injections work, why they bypass classic web defenses, and what to do today to keep sensitive accounts safe.
A new class of AI-powered browsers (OpenAI’s ChatGPT Atlas, Perplexity’s Comet, others) can read all the text on a page—including text you can’t see—and then take actions on your behalf. Malicious instructions hidden in that content can hijack the assistant and, through your logged-in session, reach email, banking, or cloud data. OWASP now ranks prompt injection as the #1 emerging risk for LLM apps; even OpenAI’s CISO calls it an “unsolved frontier.” This isn’t theoretical—Brave’s researchers have already shown working exploits.
What “hidden prompt injection” actually is
Hidden prompt injection embeds instructions inside web content—CSS-hidden text, HTML comments, faint text inside images, even base64 blobs—that an AI assistant reads as part of its task. Because current LLMs process “instructions + data” together, the agent can’t reliably distinguish the user’s intent from the page’s “intent,” and may obey the page instead. OWASP’s LLM Top 10 documents both direct (user-supplied) and indirect (content-supplied) variants, with likely impacts ranging from data exfiltration to unauthorized actions across connected tools.
Why agentic browsers amplify the blast radius
Agentic modes hand the model three ingredients at once: navigation tools, your authenticated cookies, and a natural-language control channel. Once an agent accepts hidden instructions from a page, classic browser defenses like same-origin policy or CORS no longer apply—the agent is acting as you. Brave’s security team showed that a single malicious Reddit comment could steer Comet to fetch a one-time passcode from Gmail and leak it off-site, all triggered by a simple “summarize this page” request. No exploit kits, no downloads—just words.
Proof points
Comet (Perplexity). Brave’s August disclosure described “indirect prompt injection” that scripted Comet to (1) retrieve the user’s email, (2) trigger an OTP flow, (3) read the OTP in Gmail, and (4) exfiltrate it—kicked off by summarizing a booby-trapped post. Their October follow-up showed “unseeable” prompts hidden in screenshots that Comet’s OCR happily read as commands.
Fellou & peers. The same Brave research mapped navigation-based paths in other AI browsers, again demonstrating cross-domain actions once the agent trusts page content as instructions.
“CometJacking.” LayerX researchers documented a link-only vector where crafted URL parameters nudged Comet into consulting memory/connectors (Gmail, Calendar) and exfiltrating results—payloads encoded to slip past filters.
Atlas (OpenAI). Independent testers pulled off benign but real prompt-injection pranks (forcing Atlas to output “Trust No AI,” change window chrome). During the Atlas launch briefing, OpenAI’s CISO, Dane Stuckey, admitted prompt injection remains a “frontier, unsolved security problem”, even with red-teaming, logged-out mode, and watch mode defenses.
What OpenAI and Perplexity say they’re doing
OpenAI Atlas. Stuckey highlights layered mitigations: extensive red-teaming, training data tuned to ignore malicious instructions, logged-out mode (no privileged cookies), watch mode (keeps focus on sensitive sites), and additional guardrails. Yet OpenAI still frames prompt injection as an active, unsolved risk.
Perplexity Comet. Perplexity’s security update outlines a defense-in-depth plan: prompt-injection classifiers, clearer separation between user intent and untrusted content, tool-level guardrails, and user notifications when the agent attempts risky actions. They likewise concede the problem is industry-wide and ongoing.
The stakes for companies and government
Security. Hidden prompts let a third-party page steer an agent carrying your identity, bypassing human judgment and traditional browser boundaries. That’s a novel cross-site risk class.
Governance. OWASP’s LLM Top 10 now places prompt injection at LLM01; their guidance stresses separating trusted instructions from untrusted content, enforcing least privilege, and keeping humans in the loop.
Public trust. Vendors are disclosing mitigations, but the consensus—including from those shipping these products—is cautious: treat agentic browsing as dangerous until enforceable controls exist.
What to do now (practical, layered playbook)
For individuals & teams
Isolate sessions. Separate profile/VM (don’t try it on your corporate work laptop); keep sensitive accounts logged out when using agent features.
Least privilege. Don’t grant file system, email, or cloud-drive access unless essential; disable auto-actions.
Require approvals. Manual confirm for any cross-site/account action; prefer watch/confirm modes.
Assume hostile content. Avoid summarizing user-generated pages/screenshots while logged into sensitive services.
Harden identities. Use phishing-resistant MFA; monitor sessions; rotate credentials if you previously enabled connectors.
Track patches. Follow Brave/OpenAI/Perplexity advisories; update aggressively.
For developers building with agents
Separate roles. Never merge user intent and page content into one prompt. Treat page content as untrusted; validate actions against stated goals.
Constrain capabilities. Origin-scoped cookies, per-site creds, time-boxed tokens; human-in-the-loop for money/email/data edits.
Filter in & out. Detect hidden/encoded prompts (CSS spans, HTML comments, base64, Unicode tricks). Use output policy checks to block cross-origin/out-of-scope actions.
Design for failure. Log tool calls; add kill-switches and safe defaults; surface what the agent plans before it acts.
Agentic browsing is promising—and not ready for interesting work. Evidence shows hidden prompts can still steer assistants beyond the user’s intent, and long-standing web boundaries cannot contain the blast radius once the agent acts with your identity. Until the ecosystem matures, run agentic browsers with strict isolation and least privilege. Treat every page as untrusted input, because that is what it is. As OpenAI’s CISO put it, prompt injection remains a “frontier, unsolved” problem;

