Attack HIGH
Peng Ding, Jun Kuang, Wen Sun +5 more
Large language models (LLMs) remain vulnerable to jailbreaking attacks despite their impressive capabilities. Investigating these weaknesses is...
Attack HIGH
Phil Blandfort, Robert Graham
Activation probes are attractive monitors for AI systems due to low cost and latency, but their real-world robustness remains underexplored. We ask:...
4 months ago cs.LG cs.AI
Attack HIGH
Ruofan Liu, Yun Lin, Zhiyong Huang +1 more
Large language models (LLMs) are increasingly integrated into IT infrastructures, where they process user data according to predefined instructions....
4 months ago cs.CR cs.AI
Attack HIGH
Xin Yao, Haiyang Zhao, Yimin Chen +3 more
The Contrastive Language-Image Pretraining (CLIP) model has significantly advanced vision-language modeling by aligning image-text pairs from...
4 months ago cs.CV cs.CR cs.LG
Attack HIGH
Kayua Oleques Paim, Rodrigo Brandao Mansilha, Diego Kreutz +2 more
The rapid proliferation of Large Language Models (LLMs) has raised significant concerns about their security against adversarial attacks. In this...
4 months ago cs.CR cs.AI cs.LG
Attack HIGH
Alex Irpan, Alexander Matt Turner, Mark Kurzeja +2 more
An LLM's factuality and refusal training can be compromised by simple changes to a prompt. Models often adopt user beliefs (sycophancy) or satisfy...
4 months ago cs.LG cs.AI
Attack HIGH
David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko
Enabling continual learning in LLMs remains a key unresolved research challenge. In a recent announcement, a frontier LLM company took a step towards...
Attack HIGH
Zirui Cheng, Jikai Sun, Anjun Gao +4 more
Large language models (LLMs) have transformed natural language processing (NLP), enabling applications from content generation to decision support....
4 months ago cs.CR cs.IR cs.LG
Attack HIGH
Ziyao Cui, Minxing Zhang, Jian Pei
Privacy concerns have become increasingly critical in modern AI and data science applications, where sensitive information is collected, analyzed,...
4 months ago cs.CR cs.LG
Attack HIGH
Yufan Liu, Wanqian Zhang, Huashan Chen +4 more
Despite rapid advancements in text-to-image (T2I) models, their safety mechanisms are vulnerable to adversarial prompts, which maliciously generate...
Attack HIGH
Yuchong Xie, Zesen Liu, Mingyu Luo +7 more
Modern coding agents integrated into IDEs orchestrate powerful tools and high-privilege system access, creating a high-stakes attack surface. Prior...
4 months ago cs.CR cs.AI
Attack HIGH
Zesen Liu, Zhixiang Zhang, Yuchong Xie +1 more
LLM-powered agents often use prompt compression to reduce inference costs, but this introduces a new security risk. Compression modules, which are...
5 months ago cs.CR cs.AI
Attack HIGH
Dongyi Liu, Jiangtong Li, Dawei Cheng +1 more
Graph Neural Networks (GNNs) are vulnerable to backdoor attacks, where adversaries implant malicious triggers to manipulate model predictions....
5 months ago cs.CR cs.LG
Attack HIGH
Anum Paracha, Junaid Arshad, Mohamed Ben Farah +1 more
Data poisoning attacks are a potential threat to machine learning (ML) models, aiming to manipulate training datasets to disrupt their performance....
5 months ago cs.CR cs.LG
Attack HIGH
Pavlos Ntais
Large language models (LLMs) remain vulnerable to sophisticated prompt engineering attacks that exploit contextual framing to bypass safety...
5 months ago cs.CR cs.AI cs.CL
Attack HIGH
Havva Alizadeh Noughabi, Julien Serbanescu, Fattane Zarrinkalam +1 more
Despite recent advances, Large Language Models remain vulnerable to jailbreak attacks that bypass alignment safeguards and elicit harmful outputs....
5 months ago cs.CL cs.AI
Attack HIGH
Kieu Dang, Phung Lai, NhatHai Phan +3 more
Large language models (LLMs) demonstrate remarkable capabilities across various tasks. However, their deployment introduces significant risks related...
Attack HIGH
Mahavir Dabas, Tran Huynh, Nikhil Reddy Billa +8 more
Large language models remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. Defending against novel...
Attack HIGH
Xingwei Zhong, Kar Wai Fok, Vrizlynn L. L. Thing
Multimodal large language models (MLLMs) combine visual and textual modalities to process vision-language tasks. However, MLLMs are...
Attack HIGH
Mingrui Liu, Sixiao Zhang, Cheng Long +1 more
As Large Language Models (LLMs) become integral to computing infrastructure, safety alignment serves as the primary security control preventing the...