Search: prompt injection | AI Threat Intelligence

Severity:

284 results in 143ms

Paper 2511.03247v1

2025-11-05

Death by a Thousand Prompts: Open Model Vulnerability Analysis

adversarial testing, we measured each model's resilience against single-turn and multi-turn prompt injection and jailbreak attacks. Our findings reveal pervasive vulnerabilities across all tested models, with multi

high relevance attack

Paper 2510.19169v2

2025-10-22

OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models

safety violations such as harmful or explicit text generation, (2) model-manipulation attacks including prompt injection, jailbreaks, and code-interpreter abuse, and (3) data leakage involving sensitive or private information

medium relevance tool

Paper 2510.16381v1

2025-10-18

ATA: A Neuro-Symbolic Approach to Implement Autonomous and Trustworthy Agents

models, while exhibiting perfect determinism, enhanced stability against input perturbations, and inherent immunity to prompt injection attacks. By generating decisions grounded in symbolic reasoning, ATA offers a practical and controllable

medium relevance benchmark

Paper 2510.13351v1

2025-10-15

Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems

extensive, multi-modal dataset covering four safety dimensions: toxicity, sexism, data privacy, and prompt injection. Our teacher-assisted annotation pipeline leverages reasoning and explanation traces to generate high-fidelity, context

medium relevance tool

Paper 2510.08917v1

2025-10-10

"I know it's not right, but that's what it said to do": Investigating Trust in AI Chatbots for Cybersecurity Policy

chatbots are an emerging security attack vector, vulnerable to threats such as prompt injection, and rogue chatbot creation. When deployed in domains such as corporate security policy, they could

medium relevance attack

Paper 2510.01586v1

2025-10-02

AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning

role coordination, but their openness and interaction complexity also expose them to jailbreak, prompt-injection, and adversarial collaboration. Existing defenses fall into two lines: (i) self-verification that asks each

medium relevance attack

Paper 2509.26584v1

2025-09-30

Fairness Testing in Retrieval-Augmented Generation: How Small Perturbations Reveal Bias in Small Language Models

concerns regarding security and fairness. Beyond known attack vectors such as data poisoning and prompt injection, LLMs are also vulnerable to fairness bugs. These refer to unintended behaviors influenced

medium relevance benchmark

Paper 2509.25705v1

2025-09-30

How Diffusion Models Memorize

under memorization due to classifier-free guidance amplifying predictions and inducing overestimation; (ii) memorized prompts inject training images into noise predictions, forcing latent trajectories to converge and steering denoising toward

low relevance other

Paper 2509.23519v2

2025-09-27

ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search

documents. These systems, however, remain vulnerable to attacks on the retrieval corpus, such as prompt injection. RAG-based search systems (e.g., Google's Search AI Overview) present an interesting setting

medium relevance defense

Paper 2603.17239v1

2026-03-18

LAAF: Logic-layer Automated Attack Framework A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems

pipelines, and external tool connectors face a class of attacks - Logic-layer Prompt Control Injection (LPCI) - for which no automated red-teaming instrument existed. We present LAAF (Logic-layer Automated

high relevance attack

Paper 2603.12644v1

2026-03-13

Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw

OpenClaw ecosystem. We systematically investigate its current threat landscape, highlighting critical vulnerabilities such as prompt injection-driven Remote Code Execution (RCE), sequential tool attack chains, context amnesia, and supply chain

medium relevance defense

Paper 2512.17146v1

2025-12-19

Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors

GFMs. SAGE functions through an interpretable and automated risk auditing loop. It injects soft prompt perturbations, monitors model behavior across training checkpoints, computes risk metrics such as AUROC and AUPR

high relevance attack

Paper 2512.17259v1

2025-12-19

Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling Autonomous LLM Systems

detection under stealthy strategies, and (iii) resilience of verifiability mechanisms to adversarial prompt and persona injection. Our approach shifts the evaluation focus from how likely misalignment is to how quickly

medium relevance tool

Paper 2603.08387v1

2026-03-09

AULLM++: Structural Reasoning with Large Language Models for Micro-Expression Recognition

propose AULLM++, a reasoning-oriented framework leveraging Large Language Models (LLMs), which injects visual features into textual prompts as actionable semantic premises to guide inference. It formulates AU prediction into

low relevance benchmark

Paper 2602.05401v1

2026-02-05

BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models

chat templates allows an attacker who controls the template to inject arbitrary strings into the system prompt without the user's notice. Building on this, we propose a training-free

high relevance attack

Paper 2601.02670v1

2026-01-06

Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search

injection. LATS reformulates jailbreaking as a breadth-first tree search over multi-turn dialogues, where each node incrementally injects missing content words from the attack goal into benign prompts. Evaluations

high relevance attack

Paper 2511.10913v1

2025-11-14

Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio

second leverages audio-modality exploits (Read, Spell, Phoneme) that inject harmful content through auxiliary audio channels while maintaining benign textual prompts. Through evaluation across five commercial LALMs-based TTS systems

medium relevance benchmark

Paper 2511.17666v1

2025-11-21

Evaluating Adversarial Vulnerabilities in Modern Large Language Models

prompted to circumvent their own safety protocols, and 'cross-bypass', where one model generated adversarial prompts to exploit vulnerabilities in the other. Four attack methods were employed - direct injection, role

medium relevance attack

Paper 2601.04443v2

2026-01-07

Large Language Models for Detecting Cyberattacks on Smart Grid Protective Relays

perfect fault detection accuracy. Additional evaluations demonstrate robustness to prompt formulation variations, resilience under combined time-synchronization and false-data injection attacks, and stable performance under realistic measurement noise levels

high relevance attack

Paper 2510.06823v2

2025-10-08

Exposing Citation Vulnerabilities in Generative Engines

perspectives of citation publishers and the content-injection barrier, defined as the difficulty for attackers to manipulate answers to user prompts by placing malicious content on the web. GEs integrate

medium relevance benchmark

Previous Page 13 of 15 Next