Attack HIGH
Pascal Zimmer, Ghassan Karame
In this paper, we present the first detailed analysis of how training hyperparameters -- such as learning rate, weight decay, momentum, and batch...
4 months ago cs.LG cs.CR cs.CV
PDF
Attack HIGH
Mukkesh Ganesh, Kaushik Iyer, Arun Baalaaji Sankar Ananthan
The Key-Value (KV) cache is an important component for efficient inference in autoregressive Large Language Models (LLMs), but its role as a...
4 months ago cs.CR cs.AI
PDF
Attack HIGH
Yunhao Chen, Xin Wang, Juncheng Li +5 more
Automated red teaming frameworks for Large Language Models (LLMs) have become increasingly sophisticated, yet they share a fundamental limitation:...
4 months ago cs.CL cs.CR
PDF
Attack HIGH
Haotian Jin, Yang Li, Haihui Fan +3 more
Backdoor attacks pose a serious threat to the security of large language models (LLMs), causing them to exhibit anomalous behavior under specific...
4 months ago cs.CR cs.AI
PDF
Attack HIGH
Samuel Nathanson, Rebecca Williams, Cynthia Matuszek
Large language models (LLMs) increasingly operate in multi-agent and safety-critical settings, raising open questions about how their vulnerabilities...
4 months ago cs.LG cs.AI cs.CL
PDF
Attack HIGH
Jiaji Ma, Puja Trivedi, Danai Koutra
Text-attributed graphs (TAGs), which combine structural and textual node information, are ubiquitous across many domains. Recent work integrates...
4 months ago cs.CR cs.LG
PDF
Attack HIGH
Hasini Jayathilaka
Prompt injection attacks are an emerging threat to large language models (LLMs), enabling malicious users to manipulate outputs through carefully...
Attack HIGH
Rui Wang, Zeming Wei, Xiyue Zhang +1 more
Deep Neural Networks (DNNs) are known to be vulnerable to various adversarial perturbations. To address the safety concerns arising from these...
4 months ago cs.LG cs.AI cs.CR
PDF
Attack HIGH
Gil Goren, Shahar Katz, Lior Wolf
Large Language Models (LLMs) are vulnerable to adversarial attacks that bypass safety guidelines and generate harmful content. Mitigating these...
Attack HIGH
Hao Li, Jiajun He, Guangshuo Wang +3 more
Retrieval-Augmented Generation (RAG) enhances large language models by integrating external knowledge, but reliance on proprietary or sensitive...
Attack HIGH
Lama Sleem, Jerome Francois, Lujun Li +3 more
Jailbreak attacks designed to bypass safety mechanisms pose a serious threat by prompting LLMs to generate harmful or inappropriate content, despite...
4 months ago cs.CR cs.AI
PDF
Attack HIGH
Runpeng Geng, Yanting Wang, Chenlong Yin +3 more
Long-context LLMs are vulnerable to prompt injection, where an attacker can inject an instruction in a long context to induce an LLM to generate an...
4 months ago cs.CR cs.AI cs.CL
PDF
Attack HIGH
Srikant Panda, Avinash Rai
Large Language Models (LLMs) are commonly evaluated for robustness against paraphrased or semantically equivalent jailbreak prompts, yet little...
4 months ago cs.CL cs.AI
PDF
Attack HIGH
Shuaitong Liu, Renjue Li, Lijia Yu +3 more
Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of large language models (LLMs), but have...
4 months ago cs.CR cs.AI
PDF
Attack HIGH
Yudong Yang, Xuezhen Zhang, Zhifeng Han +6 more
Recent progress in LLMs has enabled understanding of audio signals, but has also exposed new safety risks arising from complex audio inputs that are...
4 months ago cs.SD cs.AI
PDF
Attack HIGH
Zihan Wang, Guansong Pang, Wenjun Miao +2 more
Recent advances in Large Visual Language Models (LVLMs) have demonstrated impressive performance across various vision-language tasks by leveraging...
Attack HIGH
Shigeki Kusaka, Keita Saito, Mikoto Kudo +3 more
Large language models (LLMs) are increasingly deployed in real-world systems, making it critical to understand their vulnerabilities. While data...
4 months ago cs.LG cs.AI
PDF
Attack HIGH
Hongyi Li, Chengxuan Zhou, Chu Wang +5 more
Large Audio-language Models (LAMs) have recently enabled powerful speech-based interactions by coupling audio encoders with Large Language Models...
Attack HIGH
Tiago Machado, Maysa Malfiza Garcia de Macedo, Rogerio Abreu de Paula +5 more
This work aims to investigate how different Large Language Models (LLMs) alignment methods affect the models' responses to prompt attacks. We...
Attack HIGH
Yuxuan Zhou, Yuzhao Peng, Yang Bai +7 more
Large Vision-Language Models (VLMs) are susceptible to jailbreak attacks: researchers have developed a variety of attack strategies that can...