Search: prompt injection | AI Threat Intelligence

Severity:

288 results in 118ms

Paper 2511.10913v1

2025-11-14

Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio

second leverages audio-modality exploits (Read, Spell, Phoneme) that inject harmful content through auxiliary audio channels while maintaining benign textual prompts. Through evaluation across five commercial LALMs-based TTS systems

medium relevance benchmark

Paper 2511.17666v1

2025-11-21

Evaluating Adversarial Vulnerabilities in Modern Large Language Models

prompted to circumvent their own safety protocols, and 'cross-bypass', where one model generated adversarial prompts to exploit vulnerabilities in the other. Four attack methods were employed - direct injection, role

medium relevance attack

Paper 2601.04443v2

2026-01-07

Large Language Models for Detecting Cyberattacks on Smart Grid Protective Relays

perfect fault detection accuracy. Additional evaluations demonstrate robustness to prompt formulation variations, resilience under combined time-synchronization and false-data injection attacks, and stable performance under realistic measurement noise levels

high relevance attack

Paper 2510.06823v2

2025-10-08

Exposing Citation Vulnerabilities in Generative Engines

perspectives of citation publishers and the content-injection barrier, defined as the difficulty for attackers to manipulate answers to user prompts by placing malicious content on the web. GEs integrate

medium relevance benchmark

Paper 2602.15654v2

2026-02-17

Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections

that memory evolution can convert one-time indirect injection into persistent compromise, which suggests that defenses focused only on per-session prompt filtering are not sufficient for self-evolving agents

high relevance attack

Paper 2601.13359v1

2026-01-19

Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection

assistant message block rather than the user prompt, increasing ASR by 64% over GCG on Llama-3.1-8B in a prompt-agnostic setting. The results establish sockpuppetting

high relevance attack

Paper 2602.16958v1

2026-02-18

Automating Agent Hijacking via Structural Template Injection

ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success

high relevance attack

Paper 2510.11151v1

2025-10-13

TypePilot: Leveraging the Scala Type System for Secure LLM-generated Code

enforce safety constraints, just as naive prompting for more secure code, our type-focused agentic pipeline substantially mitigates input validation and injection vulnerabilities. The results demonstrate the potential of structured

medium relevance tool

Paper 2601.10294v2

2026-01-15

Reasoning Hijacking: Subverting LLM Classification via Decision-Criteria Injection

which attempts to override the system prompt, Reasoning Hijacking accepts the high-level goal but manipulates the model's decision-making logic by injecting spurious reasoning shortcut. Though extensive experiments

high relevance attack

Paper 2511.00664v1

2025-11-01

ShadowLogic: Backdoors in Any Whitebox LLM

injecting an uncensoring vector into its computational graph representation. We set a trigger phrase that, when added to the beginning of a prompt into the LLM, applies the uncensoring vector

medium relevance attack

Paper 2603.16734v1

2026-03-17

Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure

benign counterparts) under controlled prompt conditions that vary user-context personalization (no bio, bio-only, bio+mental health disclosure) and include a lightweight jailbreak injection. Our results reveal that harmful

medium relevance benchmark

Paper 2510.04503v2

2025-10-06

P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs

algorithm. P2P injects benign triggers with safe alternative labels into a subset of training samples and fine-tunes the model on this re-poisoned dataset by leveraging prompt-based learning

medium relevance defense

Paper 2510.17098v2

2025-10-20

Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models

prompts and parameters are secured, transformer language models remain vulnerable because their key-value (KV) cache during inference constitutes an overlooked attack surface. This paper introduces Malicious Token Injection

medium relevance attack

Paper 2602.01574v1

2026-02-02

SGHA-Attack: Semantic-Guided Hierarchical Alignment for Transferable Targeted Attacks on Vision-Language Models

reference pool by sampling a frozen text-to-image model conditioned on the target prompt, and then carefully select the Top-K most semantically relevant anchors under the surrogate

high relevance attack

Paper 2511.08905v3

2025-11-12

iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective

medium relevance attack

Paper 2510.11851v2

2025-10-13

Deep Research Brings Deeper Harm

agents. To address this gap, we propose two novel jailbreak strategies: Plan Injection, which injects malicious sub-goals into the agent's plan; and Intent Hijack, which reframes harmful queries

medium relevance benchmark

Paper 2603.18740v1

2026-03-19

Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review

across four state-of-the-art models under five framing conditions for the review prompt. Framing a change as bug-free reduces vulnerability detection rates by 16-93%, with strongly

high relevance survey

Paper 2509.20324v1

2025-09-24

RAG Security and Privacy: Formalizing the Threat Model and Attack Surface

demonstrated that LLMs can leak sensitive information through training data memorization or adversarial prompts, and RAG systems inherit many of these vulnerabilities. At the same time, reliance

high relevance attack

Paper 2511.09222v4

2025-11-12

Toward Honest Language Models for Deductive Reasoning

cases by randomly perturbing an edge in half of the instances. We find that prompting and existing training methods, including GRPO with or without supervised fine-tuning initialization, struggle

low relevance benchmark

Paper 2602.19450v1

2026-02-23

Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments

system, yet real deployments remain vulnerable to microarchitectural leakage, side-channel attacks, and fault injection. In parallel, security teams increasingly rely on Large Language Model (LLM) assistants as security advisors

high relevance survey

Previous Page 14 of 15 Next