AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total

2,077

Attack

809

Benchmark

603

Defense

272

Tool

226

Survey

113

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 181–200 of 701 papers

Clear filters

Benchmark HIGH

MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs

Junhyeok Lee, Han Jang, Kyu Sung Choi

Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt...

1 months ago cs.CL cs.LG PDF

Attack HIGH

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

Xin Chen, Jie Zhang, Florian Tramèr

Prompt injection is one of the most critical vulnerabilities in LLM agents; yet, effective automated attacks remain largely unexplored from an...

1 months ago cs.LG cs.AI PDF

Attack HIGH

Clouding the Mirror: Stealthy Prompt Injection Attacks Targeting LLM-based Phishing Detection

Takashi Koide, Hiroki Nakano, Daiki Chiba

Phishing sites continue to grow in volume and sophistication. Recent work leverages large language models (LLMs) to analyze URLs, HTML, and rendered...

1 months ago cs.CR PDF

Attack HIGH

Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs

Yao Zhou, Zeen Song, Wenwen Qiang +4 more

Safety alignment mechanisms in Large Language Models (LLMs) often operate as latent internal states, obscuring the model's inherent capabilities....

1 months ago cs.CL PDF

Attack HIGH

BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models

Zihan Wang, Hongwei Li, Rui Zhang +2 more

Chat template is a common technique used in the training and inference stages of Large Language Models (LLMs). It can transform input and output data...

1 months ago cs.CR PDF

Attack HIGH

SynAT: Enhancing Security Knowledge Bases via Automatic Synthesizing Attack Tree from Crowd Discussions

Ziyou Jiang, Lin Shi, Guowei Yang +3 more

Cyber attacks have become a serious threat to the security of software systems. Many organizations have built their security knowledge bases to...

1 months ago cs.CR PDF

Attack HIGH

Visual Exclusivity Attacks: Automatic Multimodal Red Teaming via Agentic Planning

Yunbei Zhang, Yingqiang Ge, Weijie Xu +3 more

Current multimodal red teaming treats images as wrappers for malicious payloads via typography or adversarial noise. These attacks are structurally...

1 months ago cs.CR cs.CV cs.LG PDF

Attack HIGH

Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning

Ethan Rathbun, Wo Wei Lin, Alina Oprea +1 more

Simulated environments are a key piece in the success of Reinforcement Learning (RL), allowing practitioners and researchers to train decision making...

1 months ago cs.CR cs.LG cs.RO PDF

Attack HIGH

Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks

Jafar Isbarov, Murat Kantarcioglu

As AI agents automate critical workloads, they remain vulnerable to indirect prompt injection (IPI) attacks. Current defenses rely on monitoring...

1 months ago cs.CR cs.AI PDF

Attack HIGH

Attack Selection Reduces Safety in Concentrated AI Control Settings against Trusted Monitoring

Joachim Schaeffer, Arjun Khandelwal, Tyler Tracy

Future AI deployments will likely be monitored for malicious behaviour. The ability of these AIs to subvert monitors by adversarially selecting...

1 months ago cs.CR cs.AI PDF

Attack HIGH

When and Where to Attack? Stage-wise Attention-Guided Adversarial Attack on Large Vision Language Models

Jaehyun Kwak, Nam Cao, Boryeong Cho +3 more

Adversarial attacks against Large Vision-Language Models (LVLMs) are crucial for exposing safety vulnerabilities in modern multimodal systems. Recent...

1 months ago cs.CV PDF

Attack HIGH

How Few-shot Demonstrations Affect Prompt-based Defenses Against LLM Jailbreak Attacks

Yanshu Wang, Shuaishuai Yang, Jingjing He +1 more

Large Language Models (LLMs) face increasing threats from jailbreak attacks that bypass safety alignment. While prompt-based defenses such as...

1 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

I Can't Believe It's Not a Valid Exploit

Derin Gezgin, Amartya Das, Shinhae Kim +3 more

Recently Large Language Models (LLMs) have been used in security vulnerability detection tasks including generating proof-of-concept (PoC) exploits....

1 months ago cs.SE PDF

Attack HIGH

Evaluating the Vulnerability Landscape of LLM-Generated Smart Contracts

Hoang Long Do, Nasrin Sohrabi, Muneeb Ul Hassan

Large language models (LLMs) have been widely adopted in modern software development lifecycles, where they are increasingly used to automate and...

1 months ago cs.CR PDF

Attack HIGH

When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making

Shutong Fan, Lan Zhang, Xiaoyong Yuan

Most adversarial threats in artificial intelligence target the computational behavior of models rather than the humans who rely on them. Yet modern...

1 months ago cs.AI PDF

Attack HIGH

WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents

Xilong Wang, Yinuo Liu, Zhun Wang +2 more

Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones....

1 months ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models

Chen Xiong, Zhiyuan He, Pin-Yu Chen +2 more

Activation steering is a practical post-training model alignment technique to enhance the utility of Large Language Models (LLMs). Prior to deploying...

1 months ago cs.CR cs.AI PDF

Attack HIGH

Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility

Mengxuan Wang, Yuxin Chen, Gang Xu +3 more

Vision language models (VLMs) extend the reasoning capabilities of large language models (LLMs) to cross-modal settings, yet remain highly vulnerable...

1 months ago cs.AI cs.LG PDF

Attack HIGH

Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models

Hicham Eddoubi, Umar Faruk Abdullahi, Fadi Hassan

Large Language Models (LLMs) have seen widespread adoption across multiple domains, creating an urgent need for robust safety alignment mechanisms....

1 months ago cs.LG PDF

Benchmark HIGH

AgentDyn: A Dynamic Open-Ended Benchmark for Evaluating Prompt Injection Attacks of Real-World Agent Security System

Hao Li, Ruoyao Wen, Shanghao Shi +2 more

AI agents that autonomously interact with external tools and environments show great promise across real-world applications. However, the external...

1 months ago cs.CR PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial