Prompt Injection Is an Agent Problem, Not a Model Problem

Source: DEV Community
In early 2023, researchers at the CISPA Helmholtz Center for Information Security published a paper that should have been a turning point. They called the technique indirect prompt injection: embedding adversarial instructions in content an LLM agent reads from external sources, rather than in the user's own input. They demonstrated attacks against Bing Chat, GitHub Copilot, and a range of plugin-enabled systems. In one scenario, a malicious web page could intercept an agent browsing on a user's behalf, instruct it to silently exfiltrate user data, and confirm completion, all without the user seeing any indication of what had happened.

The demonstration was unambiguous. The attack surface wasn't the model's reasoning. It was the model's tools.

Two years later, the majority of enterprise AI security tooling is still designed for a different problem. Palo Alto Networks, CrowdStrike, and the other major vendors have built products that scan for adversarial inputs and classify malicious prompts.