Mapping the Attack Surface: Lasso Security’s Eliya Saban on GEO-Based AI Manipulation

The key distinction is that retrieving a source and accepting its claims are two separate steps. A model can retrieve and even cite a manipulated page while still rejecting the underlying claim. Other models may treat the retrieved content itself as sufficient evidence and repeat it in the answer.
The attribution gap changes how we think about the risk. The intuitive way to verify a questionable AI answer is to check its sources. If a model cites a specific website, a user can inspect that source, evaluate its credibility, and challenge the claim.


The strongest single technique you tested produced a 92% success rate against one model, and combining two techniques pushed that to 98%. Given that those numbers are so high, what stopped you from reaching 100%, and what does that remaining resistance reveal about how these models handle conflicting signals?

This shows that transparency is not only about providing citations; it is also about preserving provenance. A model can be influenced by external content without making that influence visible to the user. As AI systems become more integrated into decision-making, understanding where information comes from and how much confidence to place in an answer becomes increasingly important.

It is also worth remembering that spreading disinformation is not new. Websites built to push false or fabricated claims have existed for years, but what has changed is who reads them. Before, it was humans, who could at least decide whether to trust a given source. Now that same content is also ingested and synthesized by AI agents, which means it can reach far more people at once. If people over-rely on that output and verify it less, the scale and the harm both grow. The line between legitimate optimization and adversarial manipulation is easily crossed.

Your research demonstrates this in controlled conditions, but threat actors rarely wait for academic proof. Have you seen any indication that GEO-style poisoning is already being attempted in the wild, and if it isn’t yet, how much warning do you think defenders realistically have before it is?

You built your proof of concept around a fabricated health claim, promoting colloidal silver as a scientifically backed treatment. Why did you choose the health domain to demonstrate this, and what does it tell us about the real-world stakes when the false claim a model confidently repeats is one a person might actually act on?

We then applied GEO techniques designed to make the content more likely to be surfaced by AI systems, such as adding authoritative language, statistics, expert-style references, and other signals that make content appear credible.
The colloidal silver example gave us a concrete, documented case of a treatment that has the potential to cause real harm despite having no scientific backing. We wanted an example that made the stakes impossible to abstract away. If a model can be manipulated into confidently endorsing something like that, the question of what else it can be steered toward becomes very hard to dismiss.

Adversaries are not waiting for academic papers. Importantly, this is not a particularly high barrier to entry. GEO optimization is structurally very similar to RAG optimization, a concept already well understood by anyone working with LLMs. For someone familiar with how retrieval systems work and motivated to promote their content, whether for commercial or adversarial purposes, this is a natural next step. Publishing nothing does not suppress that. Developers building AI agents also need to be aware of the risks and the expanding attack surface that come with the retrieval of untrusted content.

Although not necessarily surprising, the endorsement technique (in which a few other sources repeat the disinformation claim from the original website) was particularly telling. When a claim appeared corroborated across multiple sources, models amplified it. This reflects how repetition functions statistically during retrieval and inference. What surprised us was how consistent this was across models that otherwise perform quite differently.
This suggests that the vulnerability is not only in the retrieval layer, but in how models evaluate, weigh, and use external information when generating a response. The challenge is not just whether manipulated content is retrieved, but how the model evaluates and uses that content once it enters the context.
But beyond the security community, this concerns the general public directly. People routinely use AI assistants for advice on health, finances or products, often without questioning where the information comes from or how it could have been shaped. A review of 60 studies on overreliance found that the effectiveness of proposed mitigations is mixed. Users can make worse decisions when incorrect AI recommendations are accompanied by explanations, and non-informative explanations presented as accuracy scores can inflate trust even when the stated accuracy is low. Strategies such as cognitive forcing functions, tailored onboarding, and real-time feedback have been surveyed and developed, but none is a complete fix. This is precisely why public awareness matters and why spreading awareness outweighs the risk of talking about technique exploitation. Helping people understand that AI-generated answers are not neutral and can be influenced needs concrete evidence behind it to be tangible. Demonstrations and examples can help mitigate overreliance better than abstract warnings.

The problem is that this verification path disappears when a model presents a manipulated claim without attribution. The answer appears in the model’s own voice, without a clear connection to the original source that introduced the claim. Users may interpret it as general knowledge or consensus rather than information coming from a single manipulated source.

Offensive security is a legitimate and necessary discipline. You cannot build meaningful defenses against an attack surface you have not mapped. Vulnerability disclosure, penetration testing, and red teaming are all built on the same premise: understanding how something breaks is a precondition for making it more resilient. GEO-based manipulation of AI systems is no different, and the attack surface extends well beyond disinformation. Previous research has shown how retrieval manipulation (not the same as GEO) can be used to manipulate product recommendations in e-commerce. The incentives to exploit this exist across many domains and many types of actors.
We are not aware of a confirmed case of deliberate GEO-style technique exploitation aimed at spreading health disinformation in the exact setting we tested. However, in February 2026, Microsoft’s Defender team documented what they call AI Recommendation Poisoning: around 50 hidden prompts from 31 real companies, embedded in “Summarize with AI” buttons, designed to make assistants recommend those companies as trusted sources. Several were health and finance sites, i.e. high-stakes domains.
Health is one of the domains where disinformation is most harmful, and the topic is also relatable to the general public. According to a March 2026 KFF poll, one in three adults has used AI for health information and advice in the past year, and 36% of adults ages 18-29 used AI for physical health advice. With numbers like these, even a small percentage of people acting on manipulated health information represents serious real-world harm. It does not take many cases for the consequences to be significant.

The finding that a model sometimes promoted the false claim without citing your website as the source strikes me as particularly serious. What does that attribution gap mean for users who might want to fact-check an answer, and did it change how you think about the problem of AI transparency?

I’m Eliya Saban, a Security Researcher on Lasso Security’s research team, focused on the offensive side of AI security. I was an early adopter of LLM attack research and spent three years in a security analyst role before moving into hands-on offensive work. At 24, I’m driven by a deep interest in AI risk and the real-world impact these systems carry.
That is what makes this different from traditional prompt injection attacks: the model is not being instructed to behave incorrectly. It is being influenced by the information it receives.

It reveals something structural rather than incidental. We need to keep in mind that these models are probabilistic at their core. At inference they are predicting likely continuations given the context they receive, and by default, models have no mechanism to track where a piece of information originated or whether a citation is real. Content that pattern-matches to authoritative writing, numerical figures, or named references gets treated functionally like evidence because that is what the model has learned to associate with reliable claims.

As for why we did not reach 100%, two factors likely played a role. First, these models are not deterministic, so the same prompt can produce different outputs across runs. Second, our manipulated content was competing with other signals, including ten legitimate sources returned by the search engine and the model’s own training.
The final step follows the normal workflow of a generative AI system. A user asks a question, the system retrieves relevant web content, and that content becomes part of the context used to generate an answer. If the manipulated content is retrieved and the model accepts it, the false claim can appear in the final response as a confident statement.
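The workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not the setup used in the study: the `RetrievedPage` type, the `build_context` helper, the prompt wording, and the example URLs are all hypothetical. The point it shows is that a manipulated page and a legitimate page enter the model's context in the same form, with nothing that marks one as more trustworthy than the other.

```python
from dataclasses import dataclass


@dataclass
class RetrievedPage:
    """A web page returned by a search step (hypothetical structure)."""
    url: str
    text: str


def build_context(question: str, pages: list[RetrievedPage]) -> str:
    """Concatenate retrieved pages into one prompt, numbered in retrieval order.

    Every page is formatted identically: no trust or provenance signal
    is attached, so the model sees only the text itself.
    """
    sources = "\n\n".join(
        f"[{i}] {page.url}\n{page.text}" for i, page in enumerate(pages, start=1)
    )
    return (
        "Answer the question using the sources below.\n\n"
        f"{sources}\n\n"
        f"Question: {question}"
    )


# Illustrative inputs only; the URLs and texts are made up for this sketch.
pages = [
    RetrievedPage(
        "https://legit.example/overview",
        "No reliable evidence supports this treatment.",
    ),
    RetrievedPage(
        "https://manipulated.example/article",
        "Studies show this treatment is highly effective.",
    ),
]
prompt = build_context("Is the treatment effective?", pages)
print(prompt)
```

Whether the false claim reaches the final answer then depends on how the model weighs the second source against the first, which is the step the experiment varied by changing models.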

Of the 17 techniques you tested, the ones that worked leaned on signals like authoritative language, statistics, and expert-style references. What does it reveal about these models that surface-level markers of credibility, rather than actual evidence, were enough to move them, and were there techniques that surprised you with how well, or how poorly, they performed?

This was one of the most important findings of the study because it showed that the risk is not simply about which content gets retrieved. We kept the retrieval process and the external content the same, and only changed the model being tested. The differences we observed suggest that the model’s handling of retrieved information plays a major role.

On how much warning defenders have, our honest assessment is that we are already inside the window. The threshold for shifting model outputs is low, and so is the barrier to entry, because GEO is structurally close to RAG optimization. Optimizing website content, even just to gain visibility, is already well known and easy to do, and the incentives clearly exist. Defenders cannot afford to wait for a confirmed disinformation case, because the underlying mechanism is already being exploited in adjacent ways.

In conversation with
Eliya Saban, Security Researcher, Lasso Security

There is an obvious dual-use tension in this work: by documenting which techniques succeed, you are also handing a method to the people who might abuse it. How did you and the team weigh that risk when deciding what to publish, and what made you confident the value of raising the alarm outweighed the risk of providing a playbook?

In our experiment, we created a normal-looking health website and included a fabricated medical claim inside an otherwise legitimate article. The claim was written as a scientific recommendation, but there was no prompt injection, hidden text, or technical manipulation involved. It was simply false content published on a website.

First, it is important to understand what the numbers represent. A separate AI model acted as an impartial judge, scoring each answer on how strongly it presented colloidal silver as a scientifically backed treatment. The 92% and 98% figures represent the percentage of runs that were scored as genuine promotion of the false claim.
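To make the metric concrete, here is a minimal Python sketch of how a success rate of this kind could be computed from judge scores. The 1-to-5 score scale, the threshold of 4, and all names below are assumptions made for illustration; the interview does not specify the judge's scale or cutoff.

```python
from dataclasses import dataclass


@dataclass
class JudgedRun:
    """One model answer plus the score assigned by the judge model."""
    technique: str
    judge_score: int  # assumed scale: 1 (rejects the claim) to 5 (fully promotes it)


# Assumed cutoff: scores at or above this count as genuine promotion.
PROMOTION_THRESHOLD = 4


def success_rate(runs: list[JudgedRun], threshold: int = PROMOTION_THRESHOLD) -> float:
    """Return the percentage of runs scored as genuine promotion of the claim."""
    if not runs:
        return 0.0
    promoted = sum(1 for run in runs if run.judge_score >= threshold)
    return 100.0 * promoted / len(runs)


# Made-up scores for four runs of one hypothetical technique.
runs = [JudgedRun("authoritative_tone", score) for score in (5, 4, 2, 5)]
rate = success_rate(runs)
print(f"{rate:.0f}% of runs promoted the claim")
```

Because the models are not deterministic, a figure like this is only meaningful over many repeated runs of the same prompt.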

You tested 17 individual GEO techniques and five combinations across five models, and found four of the five models susceptible to at least one technique. What did the variation in susceptibility across those models tell you about where the vulnerability actually lives, in the models themselves, in how they retrieve web content, or somewhere else entirely?

The concerning part is how ordinary this attack looks. It does not require hidden instructions, exploits, or malicious code. It uses the same types of content optimization techniques that are already used to improve visibility online.

Your research found that GEO techniques could push a false claim into AI-generated answers without requiring any hidden commands or adversarial code, just a published website with standard optimization applied. Walk us through what that attack actually looks like in practice, from the moment someone publishes the page to the moment a user receives the false claim as a confident AI answer.

What makes this especially concerning is how readily people defer to AI output. In one study of medical decision-making scenarios, clinicians who were least familiar with machine learning were seven times more likely to select treatments that aligned with the AI recommendation than their more familiar peers. If trained clinicians over-rely on these systems, the risk is even greater for the general public, who often have no clinical background to fall back on when an answer is wrong. That gap is exactly why educating people about these systems matters so much.

To start, tell us a little about yourself, your background, how you got to where you are, and what you’re focused on right now.

The remaining resistance helps explain how models handle conflicting signals. Even when the manipulated page was retrieved and included in the model’s context, some models still rejected the claim after weighing it against competing evidence. This suggests that retrieval alone is not enough to determine the outcome. The attack influenced the information available to the model, but not always its final judgment.
