When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent
broadened the attack surface. While prior research has focused on model threats such as prompt injection and backdoors, the risks of social engineering remain largely unexplored. We present the first
Defenses Against Prompt Attacks Learn Surface Heuristics
test-time accuracy drops of up to \textbf{40\%}. These findings suggest that current prompt-injection defenses frequently respond to attack-like surface patterns rather than the underlying intent
FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
constraints, together with 107 real-world vulnerabilities and 963 test cases that systematically cover prompt injection, jailbreaking, financially adapted attacks, as well as benign inputs for false-positive evaluation. Experimental
From Understanding to Engagement: Personalized pharmacy Video Clips via Vision Language Models (VLMs)
smooth transitions and audio/visual alignment; (ii) a personalization mechanism based on role definition and prompt injection for tailored outputs (marketing, training, regulatory); (iii) a cost efficient e2e pipeline strategy balancing
Autonomous Agents on Blockchains: Standards, Execution Models, and Trust Boundaries
threat model tailored to agent-driven transaction pipelines that captures risks ranging from prompt injection and policy misuse to key compromise, adversarial execution dynamics, and multi-agent collusion
Hidden State Poisoning Attacks against Mamba-based Language Models
also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. Finally, our interpretability study reveals patterns in Mamba's hidden
Hidden State Poisoning Attacks against Mamba-based Language Models
also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. We further show that the theoretical and empirical findings extend
MCP-SandboxScan: WASM-based Secure Execution and Runtime Analysis for MCP Tools
agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended exposure of external inputs (e.g., environment secrets or local files). While existing scanners
Language Model Agents Under Attack: A Cross Model-Benchmark of Profit-Seeking Behaviors in Customer Service
trust in agentic workflows. We present a cross-domain benchmark of profit-seeking direct prompt injection in customer-service interactions, spanning 10 service domains and 100 realistic attack scripts grouped
The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models
systems, and infrastructure management. Current adversarial testing paradigms focus predominantly on technical attack vectors: prompt injection, jailbreaking, and data exfiltration. We argue this focus is catastrophically incomplete. LLMs, trained
Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems
finance, healthcare, and critical infrastructure, making them targets for data poisoning, model extraction, prompt injection, automated jailbreaking, and preference-guided black-box attacks that exploit model comparisons. Larger models
Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization
using the Biasing Features metric, labels a CoT as unfaithful if it omits a prompt-injected hint that affected the prediction. We argue this metric confuses unfaithfulness with incompleteness
Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs
analyzing differential hidden states of response pairs. Then, these clusters are fine-tuned using prompts injected with adversarial tuned prefixes that are optimized to maximize visual neglect, thereby forcing
Rubric-Conditioned LLM Grading: Alignment, Uncertainty, and Robustness
remaining subset. Additionally, robustness experiments reveal that while the model is resilient to prompt injection, it is sensitive to synonym substitutions. Our work provides critical insights into the capabilities
Cisco Integrated AI Security and Safety Framework Report
outputs), model and data integrity compromise (e.g., poisoning, supply-chain tampering), runtime manipulations (e.g., prompt injection, tool and agent misuse), and ecosystem risks (e.g., orchestration abuse, multi-agent collusion). Existing
Insured Agents: A Decentralized Trust Insurance Mechanism for Agentic Economy
despite the empirical reality that LLM agents remain unreliable, hallucinated, manipulable, and vulnerable to prompt-injection and tool-abuse. A natural response is "agents-at-stake": binding economically meaningful, slashable
SoK: Trust-Authorization Mismatch in LLM Agent Interactions
stages-Belief Formation, Intent Generation, and Permission Grant-we demonstrate that diverse threats, from prompt injection to tool poisoning, share a common root cause: the desynchronization between dynamic trust states
Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents
Autonomous Large Language Model (LLM) agents exhibit significant vulnerability to Indirect Prompt Injection (IPI) attacks. These attacks hijack agent behavior by polluting external information sources, exploiting fundamental trade-offs between
Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks
workflows. However, this autonomy creates a largely overlooked security gap. Existing defenses focus on prompt-injection attacks and fail to address threats embedded in tool metadata, leaving MCP-based systems
Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems
Multimodal Artificial Intelligence (AI) systems, particularly Vision-Language Models (VLMs