Benchmark HIGH
Junhyeok Lee, Han Jang, Kyu Sung Choi
Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt...
1 month ago cs.CL cs.LG
Benchmark HIGH
Hao Li, Ruoyao Wen, Shanghao Shi +2 more
AI agents that autonomously interact with external tools and environments show great promise across real-world applications. However, the external...
Benchmark HIGH
Yunpeng Xiong, Ting Zhang
Static Application Security Testing (SAST) tools are essential for identifying software vulnerabilities, but they often produce a high volume of...
Benchmark HIGH
Ivan K. Tung, Yu Xiang Shi, Alex Chien +2 more
Creating attack paths for cyber defence exercises requires substantial expert effort. Existing automation requires vulnerability graphs or exploit...
1 month ago cs.CR cs.AI
Benchmark HIGH
Miao Lin, Feng Yu, Rui Ning +6 more
Deep neural networks are highly susceptible to backdoor attacks, yet most defense methods to date rely on balanced data, overlooking the pervasive...
1 month ago cs.CR cs.CV cs.LG
Benchmark HIGH
Thomas Heverin
Prompt injection evaluations typically treat refusal as a stable, binary indicator of safety. This study challenges that paradigm by modeling refusal...
Benchmark HIGH
Zelong Zheng, Jiayuan Zhou, Xing Hu +2 more
Software vulnerability management has become increasingly critical as modern systems scale in size and complexity. However, existing automated...
Benchmark HIGH
Fan Huang, Haewoon Kwak, Jisun An
Large Language Models (LLMs) are increasingly employed in various question-answering tasks. However, recent studies showcase that LLMs are...
2 months ago cs.CL cs.AI
Benchmark HIGH
Jiayi Yuan, Jonathan Nöther, Natasha Jaques +1 more
While recent automated red-teaming methods show promise for systematically exposing model vulnerabilities, most existing approaches rely on...
2 months ago cs.AI cs.NE
Benchmark HIGH
Yow-Fu Liou, Yu-Chien Tang, Yu-Hsiang Liu +1 more
Benchmarking large language models (LLMs) is critical for understanding their capabilities, limitations, and robustness. In addition to interface...
Benchmark HIGH
Chutian Huang, Dake Cao, Jiacheng Ji +3 more
Background: While Large Language Models (LLMs) have achieved widespread adoption, malicious prompt engineering, specifically "jailbreak attacks," poses...
Benchmark HIGH
Haoze Guo, Ziqi Wei
Retrieval-augmented generation (RAG) systems increasingly emphasize grounding their responses in user-generated content found on the Web,...
2 months ago cs.CR cs.HC
Benchmark HIGH
Shaznin Sultana, Sadia Afreen, Nasir U. Eisty
Context: Traditional software security analysis methods struggle to keep pace with the scale and complexity of modern codebases, requiring...
Benchmark HIGH
Quy-Anh Dang, Chris Ngo, Truong-Son Hy
As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount....
Benchmark HIGH
Zejian Chen, Chaozhuo Li, Chao Li +3 more
This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models (LLMs) and Vision-Language Models (VLMs),...
Benchmark HIGH
Xiangzhe Yuan, Zhenhao Zhang, Haoming Tang +1 more
As LLMs gain persuasive agentic capabilities through extended dialogues, they introduce novel risks in multi-turn conversational scams that...
Benchmark HIGH
Songyang Liu, Chaozhuo Li, Rui Pu +5 more
Jailbreak attacks present a significant challenge to the safety of Large Language Models (LLMs), yet current automated evaluation methods largely...
2 months ago cs.CR cs.CL
Benchmark HIGH
Md Hasan Saju, Maher Muhtadi, Akramul Azim
The rapid advancement of Large Language Models (LLMs) presents new opportunities for automated software vulnerability detection, a crucial task in...
2 months ago cs.SE cs.AI
Benchmark HIGH
Jingyu Zhang
Customer-service LLM agents increasingly make policy-bound decisions (refunds, rebooking, billing disputes), but the same "helpful" interaction...
2 months ago cs.CR cs.HC
Benchmark HIGH
Manu, Yi Guo, Kanchana Thilakarathna +5 more
Large Language Models (LLMs) can be driven into over-generation, emitting thousands of tokens before producing an end-of-sequence (EOS) token. This...
2 months ago cs.CR cs.AI cs.LG