Paper 2601.12560v1

Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents

…practices. Finally, we highlight open challenges, such as hallucination in action, infinite loops, and prompt injection, and outline future research directions toward more robust and reliable autonomous systems…

medium relevance benchmark
Paper 2601.12449v1

AgenTRIM: Tool Risk Mitigation for Agentic AI

While such tools extend capability, improper tool permissions introduce security risks such as indirect prompt injection and tool misuse. We characterize these failures as unbalanced tool-driven agency. Agents…

medium relevance tool
Paper 2601.10338v1

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

…skills contain at least one vulnerability, spanning 14 distinct patterns across four categories: prompt injection, data exfiltration, privilege escalation, and supply chain risks. Data exfiltration (13.3%) and privilege escalation…

medium relevance survey
Paper 2601.10156v1

ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

…percent on average and improves benign task completion by approximately 10 percent under prompt injection attacks…

medium relevance tool
Paper 2601.09923v2

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

…agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. The only known robust defense is architectural isolation that strictly…

medium relevance tool
Paper 2601.07263v1

When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent

…broadened the attack surface. While prior research has focused on model threats such as prompt injection and backdoors, the risks of social engineering remain largely unexplored. We present the first…

high relevance attack
Paper 2601.07185v1

Defenses Against Prompt Attacks Learn Surface Heuristics

…test-time accuracy drops of up to 40%. These findings suggest that current prompt-injection defenses frequently respond to attack-like surface patterns rather than the underlying intent…

high relevance attack
Paper 2601.07853v1

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

…constraints, together with 107 real-world vulnerabilities and 963 test cases that systematically cover prompt injection, jailbreaking, financially adapted attacks, as well as benign inputs for false-positive evaluation. Experimental…

medium relevance benchmark
Paper 2601.05059v1

From Understanding to Engagement: Personalized pharmacy Video Clips via Vision Language Models (VLMs)

…smooth transitions and audio/visual alignment; (ii) a personalization mechanism based on role definition and prompt injection for tailored outputs (marketing, training, regulatory); (iii) a cost-efficient e2e pipeline strategy balancing…

medium relevance benchmark
Paper 2601.04583v1

Autonomous Agents on Blockchains: Standards, Execution Models, and Trust Boundaries

…threat model tailored to agent-driven transaction pipelines that captures risks ranging from prompt injection and policy misuse to key compromise, adversarial execution dynamics, and multi-agent collusion…

medium relevance survey
Paper 2601.01972v3

Hidden State Poisoning Attacks against Mamba-based Language Models

…also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. Finally, our interpretability study reveals patterns in Mamba's hidden…

high relevance attack
Paper 2601.01972v4

Hidden State Poisoning Attacks against Mamba-based Language Models

…also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. We further show that the theoretical and empirical findings extend…

high relevance attack
Paper 2601.01241v1

MCP-SandboxScan: WASM-based Secure Execution and Runtime Analysis for MCP Tools

…agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended exposure of external inputs (e.g., environment secrets or local files). While existing scanners…

medium relevance benchmark
Paper 2512.24415v1

Language Model Agents Under Attack: A Cross Model-Benchmark of Profit-Seeking Behaviors in Customer Service

…trust in agentic workflows. We present a cross-domain benchmark of profit-seeking direct prompt injection in customer-service interactions, spanning 10 service domains and 100 realistic attack scripts grouped…

high relevance benchmark
Paper 2601.00867v1

The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models

…systems, and infrastructure management. Current adversarial testing paradigms focus predominantly on technical attack vectors: prompt injection, jailbreaking, and data exfiltration. We argue this focus is catastrophically incomplete. LLMs, trained…

medium relevance survey
Paper 2512.23132v1

Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems

…finance, healthcare, and critical infrastructure, making them targets for data poisoning, model extraction, prompt injection, automated jailbreaking, and preference-guided black-box attacks that exploit model comparisons. Larger models…

medium relevance tool
Paper 2512.23032v1

Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization

…using the Biasing Features metric, labels a CoT as unfaithful if it omits a prompt-injected hint that affected the prediction. We argue this metric confuses unfaithfulness with incompleteness…

low relevance benchmark
Paper 2512.21999v1

Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs

…analyzing differential hidden states of response pairs. Then, these clusters are fine-tuned using prompts injected with adversarially tuned prefixes that are optimized to maximize visual neglect, thereby forcing…

low relevance attack
Paper 2601.08843v1

Rubric-Conditioned LLM Grading: Alignment, Uncertainty, and Robustness

…remaining subset. Additionally, robustness experiments reveal that while the model is resilient to prompt injection, it is sensitive to synonym substitutions. Our work provides critical insights into the capabilities…

medium relevance defense
Paper 2512.12921v1

Cisco Integrated AI Security and Safety Framework Report

…outputs), model and data integrity compromise (e.g., poisoning, supply-chain tampering), runtime manipulations (e.g., prompt injection, tool and agent misuse), and ecosystem risks (e.g., orchestration abuse, multi-agent collusion). Existing…

medium relevance tool
Page 11 of 14