Abrar Shahid, Ibteeker Mahir Ishum, AKM Tahmidul Haque +2 more
This paper presents a controlled study of adversarial reinforcement learning in network security through a custom OpenAI Gym environment that models...
Adversarial attacks present a significant threat to modern machine learning systems. Yet, existing detection methods often lack the ability to detect...
As vision-language models (VLMs) gain prominence, their multimodal interfaces also introduce new safety vulnerabilities, making the safety evaluation...
Milad Nasr, Yanick Fratantonio, Luca Invernizzi +7 more
As deep learning models become widely deployed as components within larger production systems, their individual shortcomings can create system-level...
Large Language Models (LLMs) suffer from a range of vulnerabilities that allow malicious users to solicit undesirable responses through manipulation...
Deterministic pseudo-random number generators (PRNGs) used in generative artificial intelligence (GAI) models produce predictable patterns vulnerable...
As large language models (LLMs) advance, ensuring AI safety and alignment is paramount. One popular approach is prompt guards, lightweight mechanisms...
Isha Gupta, Rylan Schaeffer, Joshua Kazdan +2 more
The field of adversarial robustness has long established that adversarial examples can successfully transfer between image classifiers and that text...