AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total

2,077

Attack

809

Benchmark

603

Defense

272

Tool

226

Survey

113

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 481–500 of 522 papers

Clear filters

Attack HIGH

LegalSim: Multi-Agent Simulation of Legal Systems for Discovering Procedural Exploits

Sanket Badhe

We present LegalSim, a modular multi-agent simulation of adversarial legal proceedings that explores how AI systems can exploit procedural weaknesses...

5 months ago cs.MA cs.AI cs.CR PDF

Attack HIGH

Untargeted Jailbreak Attack

Xinzhe Huang, Wenjing Hu, Tianhang Zheng +5 more

Existing gradient-based jailbreak attacks on Large Language Models (LLMs) typically optimize adversarial suffixes to align the LLM output with...

5 months ago cs.CR cs.AI PDF

Attack HIGH

External Data Extraction Attacks against Retrieval-Augmented Large Language Models

Yu He, Yifei Chen, Yiming Li +5 more

In recent years, RAG has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG...

5 months ago cs.CR PDF

Attack HIGH

Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs

Zhixin Xie, Xurui Song, Jun Luo

Despite substantial efforts in safety alignment, recent research indicates that Large Language Models (LLMs) remain highly susceptible to jailbreak...

5 months ago cs.CR PDF

Attack HIGH

A Statistical Method for Attack-Agnostic Adversarial Attack Detection with Compressive Sensing Comparison

Chinthana Wimalasuriya, Spyros Tragoudas

Adversarial attacks present a significant threat to modern machine learning systems. Yet, existing detection methods often lack the ability to detect...

5 months ago cs.CR cs.CV cs.LG PDF

Attack HIGH

ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks

Zhaorun Chen, Xun Liu, Mintong Kang +4 more

As vision-language models (VLMs) gain prominence, their multimodal interfaces also introduce new safety vulnerabilities, making the safety evaluation...

5 months ago cs.AI cs.LG PDF

Attack HIGH

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks

Ruohao Guo, Afshin Oroojlooy, Roshan Sridhar +3 more

Despite recent rapid progress in AI safety, current large language models remain vulnerable to adversarial attacks in multi-turn interaction...

5 months ago cs.LG cs.AI cs.CL PDF

Attack HIGH

Dynamic Target Attack

Kedong Xiu, Churui Zeng, Tianhang Zheng +6 more

Existing gradient-based jailbreak attacks typically optimize an adversarial suffix to induce a fixed affirmative response, e.g., ``Sure, here...

5 months ago cs.CR cs.AI PDF

Attack HIGH

Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks

Milad Nasr, Yanick Fratantonio, Luca Invernizzi +7 more

As deep learning models become widely deployed as components within larger production systems, their individual shortcomings can create system-level...

5 months ago cs.CR cs.LG PDF

Attack HIGH

Machine Learning for Detection and Analysis of Novel LLM Jailbreaks

John Hawkins, Aditya Pramar, Rodney Beard +1 more

Large Language Models (LLMs) suffer from a range of vulnerabilities that allow malicious users to solicit undesirable responses through manipulation...

5 months ago cs.CL cs.AI cs.CY PDF

Attack HIGH

Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed

Isha Gupta, Rylan Schaeffer, Joshua Kazdan +2 more

The field of adversarial robustness has long established that adversarial examples can successfully transfer between image classifiers and that text...

5 months ago cs.LG cs.AI PDF

Attack HIGH

Fine-Tuning Jailbreaks under Highly Constrained Black-Box Settings: A Three-Pronged Approach

Xiangfang Li, Yu Wang, Bo Li

With the rapid advancement of large language models (LLMs), ensuring their safe use becomes increasingly critical. Fine-tuning is a widely used...

5 months ago cs.CR PDF

Attack HIGH

Backdoor Attacks Against Speech Language Models

Alexandrine Fortier, Thomas Thebaud, Jesús Villalba +2 more

Large Language Models (LLMs) and their multimodal extensions are becoming increasingly popular. One common approach to enable multimodality is to...

5 months ago cs.CL cs.CR cs.SD PDF

Attack HIGH

Attack logics, not outputs: Towards efficient robustification of deep neural networks by falsifying concept-based properties

Raik Dankworth, Gesina Schwalbe

Deep neural networks (NNs) for computer vision are vulnerable to adversarial attacks, i.e., miniscule malicious changes to inputs may induce...

5 months ago cs.CR cs.LG PDF

Attack HIGH

SVDefense: Effective Defense against Gradient Inversion Attacks via Singular Value Decomposition

Chenxiang Luo, David K. Y. Yau, Qun Song

Federated learning (FL) enables collaborative model training without sharing raw data but is vulnerable to gradient inversion attacks (GIAs), where...

5 months ago cs.CR cs.LG PDF

Attack HIGH

SafeBehavior: Simulating Human-Like Multistage Reasoning to Mitigate Jailbreak Attacks in Large Language Models

Qinjian Zhao, Jiaqi Wang, Zhiqiang Gao +3 more

Large Language Models (LLMs) have achieved impressive performance across diverse natural language processing tasks, but their growing power also...

5 months ago cs.AI PDF

Attack HIGH

Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification

Xiaobao Wang, Ruoxiao Sun, Yujun Zhang +4 more

Graph Neural Networks (GNNs) have demonstrated strong performance across tasks such as node classification, link prediction, and graph...

5 months ago cs.LG cs.CR PDF

Attack HIGH

ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack

Yein Park, Jungwoo Park, Jaewoo Kang

Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be circumvented by simple linguistic changes....

5 months ago cs.AI PDF

Attack HIGH

Fingerprinting LLMs via Prompt Injection

Yuepeng Hu, Zhengyuan Jiang, Mengyuan Li +4 more

Large language models (LLMs) are often modified after release through post-processing such as post-training or quantization, which makes it...

5 months ago cs.CR cs.CL PDF

Attack HIGH

SecInfer: Preventing Prompt Injection via Inference-time Scaling

Yupei Liu, Yanting Wang, Yuqi Jia +2 more

Prompt injection attacks pose a pervasive threat to the security of Large Language Models (LLMs). State-of-the-art prevention-based defenses...

5 months ago cs.CR cs.AI PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial