Paper 2601.07263v1

When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent

broadened the attack surface. While prior research has focused on model threats such as prompt injection and backdoors, the risks of social engineering remain largely unexplored. We present the first

high relevance attack
Paper 2601.07185v1

Defenses Against Prompt Attacks Learn Surface Heuristics

test-time accuracy drops of up to 40%. These findings suggest that current prompt-injection defenses frequently respond to attack-like surface patterns rather than the underlying intent

high relevance attack
Paper 2601.07853v1

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

constraints, together with 107 real-world vulnerabilities and 963 test cases that systematically cover prompt injection, jailbreaking, financially adapted attacks, as well as benign inputs for false-positive evaluation. Experimental

medium relevance benchmark
Paper 2601.05059v1

From Understanding to Engagement: Personalized Pharmacy Video Clips via Vision Language Models (VLMs)

smooth transitions and audio/visual alignment; (ii) a personalization mechanism based on role definition and prompt injection for tailored outputs (marketing, training, regulatory); (iii) a cost-efficient end-to-end (e2e) pipeline strategy balancing

medium relevance benchmark
Paper 2601.04583v1

Autonomous Agents on Blockchains: Standards, Execution Models, and Trust Boundaries

threat model tailored to agent-driven transaction pipelines that captures risks ranging from prompt injection and policy misuse to key compromise, adversarial execution dynamics, and multi-agent collusion

medium relevance survey
Paper 2601.01972v3

Hidden State Poisoning Attacks against Mamba-based Language Models

also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. Finally, our interpretability study reveals patterns in Mamba's hidden

high relevance attack
Paper 2601.01972v4

Hidden State Poisoning Attacks against Mamba-based Language Models

also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. We further show that the theoretical and empirical findings extend

high relevance attack
Paper 2601.01241v1

MCP-SandboxScan: WASM-based Secure Execution and Runtime Analysis for MCP Tools

agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended exposure of external inputs (e.g., environment secrets or local files). While existing scanners

medium relevance benchmark
Paper 2512.24415v1

Language Model Agents Under Attack: A Cross Model-Benchmark of Profit-Seeking Behaviors in Customer Service

trust in agentic workflows. We present a cross-domain benchmark of profit-seeking direct prompt injection in customer-service interactions, spanning 10 service domains and 100 realistic attack scripts grouped

high relevance benchmark
Paper 2601.00867v1

The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models

systems, and infrastructure management. Current adversarial testing paradigms focus predominantly on technical attack vectors: prompt injection, jailbreaking, and data exfiltration. We argue this focus is catastrophically incomplete. LLMs, trained

medium relevance survey
Paper 2512.23132v1

Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems

finance, healthcare, and critical infrastructure, making them targets for data poisoning, model extraction, prompt injection, automated jailbreaking, and preference-guided black-box attacks that exploit model comparisons. Larger models

medium relevance tool
Paper 2512.23032v1

Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization

using the Biasing Features metric, labels a CoT as unfaithful if it omits a prompt-injected hint that affected the prediction. We argue this metric confuses unfaithfulness with incompleteness

low relevance benchmark
Paper 2512.21999v1

Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs

analyzing differential hidden states of response pairs. Then, these clusters are fine-tuned using prompts injected with adversarially tuned prefixes that are optimized to maximize visual neglect, thereby forcing

low relevance attack
Paper 2601.08843v1

Rubric-Conditioned LLM Grading: Alignment, Uncertainty, and Robustness

remaining subset. Additionally, robustness experiments reveal that while the model is resilient to prompt injection, it is sensitive to synonym substitutions. Our work provides critical insights into the capabilities

medium relevance defense
Paper 2512.12921v1

Cisco Integrated AI Security and Safety Framework Report

outputs), model and data integrity compromise (e.g., poisoning, supply-chain tampering), runtime manipulations (e.g., prompt injection, tool and agent misuse), and ecosystem risks (e.g., orchestration abuse, multi-agent collusion). Existing

medium relevance tool
Paper 2512.08737v1

Insured Agents: A Decentralized Trust Insurance Mechanism for Agentic Economy

despite the empirical reality that LLM agents remain unreliable, prone to hallucination, manipulable, and vulnerable to prompt injection and tool abuse. A natural response is "agents-at-stake": binding economically meaningful, slashable

medium relevance attack
Paper 2512.06914v2

SoK: Trust-Authorization Mismatch in LLM Agent Interactions

stages (Belief Formation, Intent Generation, and Permission Grant), we demonstrate that diverse threats, from prompt injection to tool poisoning, share a common root cause: the desynchronization between dynamic trust states

medium relevance survey
Paper 2512.06716v2

Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents

Autonomous Large Language Model (LLM) agents exhibit significant vulnerability to Indirect Prompt Injection (IPI) attacks. These attacks hijack agent behavior by polluting external information sources, exploiting fundamental trade-offs between

medium relevance tool
Paper 2512.06556v1

Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks

workflows. However, this autonomy creates a largely overlooked security gap. Existing defenses focus on prompt-injection attacks and fail to address threats embedded in tool metadata, leaving MCP-based systems

high relevance tool
Paper 2512.04895v1

Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems

Multimodal Artificial Intelligence (AI) systems, particularly Vision-Language Models (VLMs)

high relevance tool