Paper 2602.01795v1

RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse

Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved contexts hijack the model's execution flow. Current defenses typically face

high relevance attack
Paper 2602.22242v1

Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

same time, LLMs are vulnerable to prompt-based attacks. Thus, analyzing this risk has become a critical security requirement. This work evaluates prompt-injection and jailbreak vulnerability using a large

high relevance attack
Paper 2512.05745v1

ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior

Multimodal Large Language Models (MLLMs) are increasingly vulnerable to multimodal Indirect Prompt Injection (IPI) attacks, which embed malicious instructions in images, videos, or audio to hijack model behavior. Existing defenses

high relevance attack
Paper 2602.20708v1

ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack the agent's execution. Existing defenses typically rely on strict

high relevance attack
Paper 2512.16307v1

Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks

fast-evolving area of LLMs, our paper discusses the significant security risk presented by prompt injection attacks. It focuses on small open-sourced models, specifically the LLaMA family of models

high relevance benchmark
Paper 2510.04261v1

VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy

integrated applications? To address this question, we propose VortexPIA, a novel indirect prompt injection attack that induces privacy extraction in LLM-integrated applications under black-box settings. By injecting

high relevance attack
Paper 2512.01326v1

Securing Large Language Models (LLMs) from Prompt Injection Attacks

increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection attacks. These attacks leverage the model's instruction-following ability to make it perform malicious

high relevance attack
Paper 2601.19051v1

Proactive Hardening of LLM Defenses with HASTE

enhance detection efficacy for prompt-based attack techniques. The framework is agnostic to synthetic data generation methods, and can be generalized to evaluate prompt-injection detection efficacy, with and without

medium relevance defense
Paper 2510.09093v1

Exploiting Web Search Tools of AI Agents for Data Exfiltration

functionality and vulnerability to abuse. As LLMs increasingly interact with external data sources, indirect prompt injection emerges as a critical and evolving attack vector, enabling adversaries to exploit models through

high relevance tool
Paper 2510.09849v1

Text Prompt Injection of Vision Language Models

vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method to mislead these models. We developed an algorithm

high relevance attack
Paper 2512.10104v2

Phishing Email Detection Using Large Language Models

based framework to detect phishing email attacks across multiple attack vectors, including prompt injection, text refinement, and multilingual attacks. We evaluate three frontier LLMs (e.g., GPT-4o, Claude Sonnet

medium relevance defense
Paper 2602.20064v1

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

prompt, and parses the response as a new term. This calculus faithfully represents planner loops and their vulnerabilities, including the mechanisms by which prompt injection alters subsequent computation. The semantics

medium relevance attack
Paper 2602.22724v1

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

retrieval systems to autonomously complete complex tasks. However, this design exposes agents to indirect prompt injection (IPI), where attacker-controlled context embedded in tool outputs or retrieved content silently steers

high relevance attack
Paper 2510.08829v1

CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization

access to numerous tools and sensitive data significantly widens the attack surface for indirect prompt injections. Due to the context-dependent nature of attacks, however, current defenses are often

medium relevance benchmark
Paper 2601.07072v1

Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems

rely on retrieving information from external corpora. This creates a new attack surface: indirect prompt injection (IPI), where hidden instructions are planted in the corpora and hijack model behavior once

high relevance tool
Paper 2601.22569v1

Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection

teaming evaluation of AP2 and identify vulnerabilities arising from indirect and direct prompt injection. We introduce two attack techniques, the Branded Whisper Attack and the Vault Whisper Attack, which manipulate

high relevance attack
Paper 2602.10453v1

The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis

LLMs) has resulted in a paradigm shift towards autonomous agents, necessitating robust security against Prompt Injection (PI) vulnerabilities where untrusted inputs hijack agent behaviors. This SoK presents a comprehensive overview

high relevance survey
Paper 2510.05709v1

Towards Reliable and Practical LLM Security Evaluations via Bayesian Modelling

prompts are designed imperfectly, and practitioners only have a limited amount of compute to evaluate vulnerabilities. We show the improved inferential capabilities of the model in several prompt injection attack

medium relevance benchmark
Paper 2510.23675v3

QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

high-privilege system access, creating a high-stakes attack surface. Prior work on Indirect Prompt Injection (IPI) is mainly query-specific, requiring particular user queries as triggers and leading

high relevance attack
Paper 2602.18514v1

Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models

automated decision-making pipelines, specifically within Human Resources (HR), the security implications of Indirect Prompt Injection (IPI) become critical. While a prevailing hypothesis posits that "Reasoning" or "Chain-of-Thought

high relevance attack
Page 5 of 14