AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total: 2,077
Attack: 809
Benchmark: 603
Defense: 272
Tool: 226
Survey: 113

Showing 241–260 of 522 papers

Attack HIGH

Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search

Devang Kulshreshtha, Hang Su, Chinmay Hegde +1 more

Most jailbreak methods achieve high attack success rates (ASR) but require attacker LLMs to craft adversarial queries and/or demand high query...

2 months ago cs.CL PDF

Attack HIGH

Hidden State Poisoning Attacks against Mamba-based Language Models

Alexandre Le Mercier, Chris Develder, Thomas Demeester

State space models (SSMs) like Mamba offer efficient alternatives to Transformer-based language models, with linear time complexity. Yet, their...

2 months ago cs.CL PDF

Attack HIGH

Emoji-Based Jailbreaking of Large Language Models

M P V S Gopinadh, S Mahaboob Hussain

Large Language Models (LLMs) are integral to modern AI applications, but their safety alignment mechanisms can be bypassed through adversarial prompt...

2 months ago cs.CR cs.AI PDF

Attack HIGH

Engineering Attack Vectors and Detecting Anomalies in Additive Manufacturing

Md Mahbub Hasan, Marcus Sternhagen, Krishna Chandra Roy

Additive manufacturing (AM) is rapidly integrating into critical sectors such as aerospace, automotive, and healthcare. However, this cyber-physical...

2 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak

Haoran Gu, Handing Wang, Yi Mei +2 more

The widespread deployment of large language models (LLMs) has raised growing concerns about their misuse risks and associated safety issues. While...

2 months ago cs.CR cs.CL PDF

Attack HIGH

Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing

Manish Bhatt, Adrian Wood, Idan Habler +1 more

Production LLM agents with tool-using capabilities require security testing despite their safety training. We adapt Go-Explore to evaluate...

2 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

GCG Attack On A Diffusion LLM

Ruben Neyroud, Sam Corley

While most LLMs are autoregressive, diffusion-based LLMs have recently emerged as an alternative method for generation. Greedy Coordinate Gradient...

2 months ago cs.LG cs.CL cs.CR PDF

Attack HIGH

Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?

Yuan Xin, Dingfan Chen, Linyi Yang +2 more

As large language models (LLMs) are increasingly deployed, ensuring their safe use is paramount. Jailbreaking, adversarial prompts that bypass model...

2 months ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack

Roee Ziv, Raz Lapid, Moshe Sipper

Audio-language models combine audio encoders with large language models to enable multimodal reasoning, but they also introduce new security...

2 months ago cs.SD cs.AI cs.CR PDF

Attack HIGH

RobustMask: Certified Robustness against Adversarial Neural Ranking Attack via Randomized Masking

Jiawei Liu, Zhuo Chen, Rui Zhu +4 more

Neural ranking models have achieved remarkable progress and are now widely deployed in real-world applications such as Retrieval-Augmented Generation...

2 months ago cs.CR cs.IR PDF

Attack HIGH

EquaCode: A Multi-Strategy Jailbreak Approach for Large Language Models via Equation Solving and Code Completion

Zhen Liang, Hai Huang, Zhengkui Chen

Large language models (LLMs), such as ChatGPT, have achieved remarkable success across a wide range of fields. However, their trustworthiness remains...

2 months ago cs.CR cs.AI PDF

Attack HIGH

Adaptive Trust Consensus for Blockchain IoT: Comparing RL, DRL, and MARL Against Naive, Collusive, Adaptive, Byzantine, and Sleeper Attacks

Soham Padia, Dhananjay Vaidya, Ramchandra Mangrulkar

Securing blockchain-enabled IoT networks against sophisticated adversarial attacks remains a critical challenge. This paper presents a trust-based...

2 months ago cs.CR cs.LG cs.MA PDF

Attack HIGH

Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models

Zongmin Zhang, Zhen Sun, Yifan Liao +5 more

Prompt-driven Video Segmentation Foundation Models (VSFMs) such as SAM2 are increasingly deployed in applications like autonomous driving and digital...

2 months ago cs.CV cs.CR PDF

Attack HIGH

Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

Mengqi He, Xinyu Tian, Xin Shen +4 more

Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty,...

3 months ago cs.CV cs.LG PDF

Attack HIGH

Analysis of LLM Vulnerability to GPU Soft Errors: An Instruction-Level Fault Injection Study

Duo Chai, Zizhen Liu, Shuhuai Wang +4 more

Large language models (LLMs) are highly compute- and memory-intensive, posing significant demands on high-performance GPUs. At the same time,...

3 months ago cs.AR cs.AI cs.CR PDF

Attack HIGH

LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors

Tianwei Lan, Farid Naït-Abdesselam

The rapid growth in both the scale and complexity of Android malware has driven the widespread adoption of machine learning (ML) techniques for...

3 months ago cs.CR cs.AI PDF

Attack HIGH

Improving the Convergence Rate of Ray Search Optimization for Query-Efficient Hard-Label Attacks

Xinjie Xu, Shuyu Cheng, Dongwei Xu +2 more

In hard-label black-box adversarial attacks, where only the top-1 predicted label is accessible, the prohibitive query complexity poses a major...

3 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs

Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami +2 more

Mixture-of-Experts (MoE) architectures have advanced the scaling of Large Language Models (LLMs) by activating only a sparse subset of parameters per...

3 months ago cs.CR PDF

Attack HIGH

AegisAgent: An Autonomous Defense Agent Against Prompt Injection Attacks in LLM-HARs

Yihan Wang, Huanqi Yang, Shantanu Pal +1 more

The integration of Large Language Models (LLMs) into wearable sensing is creating a new class of mobile applications capable of nuanced human...

3 months ago cs.CR PDF
