AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total: 2,077

Attack: 809

Benchmark: 603

Defense: 272

Tool: 226

Survey: 113

Showing 541–560 of 2,027 papers

Attack MEDIUM

AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models

Fengpeng Li, Kemou Li, Qizhou Wang +2 more

Concept erasure helps stop diffusion models (DMs) from generating harmful content, but current methods face a robustness-retention trade-off....

1 month ago cs.LG cs.AI cs.CR PDF

Survey MEDIUM

"Tab, Tab, Bug": Security Pitfalls of Next Edit Suggestions in AI-Integrated IDEs

Yunlong Lyu, Yixuan Tang, Peng Chen +4 more

Modern AI-integrated IDEs are shifting from passive code completion to proactive Next Edit Suggestions (NES). Unlike traditional autocompletion, NES...

1 month ago cs.CR cs.HC PDF

Attack HIGH

Taipan: A Query-free Transfer-based Multiple Sensitive Attribute Inference Attack Solely from Publicly Released Graphs

Ying Song, Balaji Palanisamy

Graph-structured data underpin a wide spectrum of modern applications. However, complex graph topologies and homophilic patterns can facilitate...

1 month ago cs.CR cs.LG PDF

Benchmark HIGH

Evaluating and Enhancing the Vulnerability Reasoning Capabilities of Large Language Models

Li Lu, Yanjie Zhao, Hongzhou Rao +2 more

Large Language Models (LLMs) have demonstrated remarkable proficiency in vulnerability detection. However, a critical reliability gap persists:...

1 month ago cs.CR PDF

Attack HIGH

TrapSuffix: Proactive Defense Against Adversarial Suffixes in Jailbreaking

Mengyao Du, Han Fang, Haokai Ma +4 more

Suffix-based jailbreak attacks append an adversarial suffix, i.e., a short token sequence, to steer aligned LLMs into unsafe outputs. Since suffixes...

1 month ago cs.CR PDF

Benchmark MEDIUM

Confundo: Learning to Generate Robust Poison for Practical RAG Systems

Haoyang Hu, Zhejun Jiang, Yueming Lyu +3 more

Retrieval-augmented generation (RAG) is increasingly deployed in real-world applications, where its reference-grounded design makes outputs appear...

1 month ago cs.CR cs.LG PDF

Benchmark MEDIUM

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Yi Liu, Zhihao Chen, Yanjun Zhang +5 more

Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. Skills execute with user...

1 month ago cs.CR cs.AI cs.CL PDF

Defense MEDIUM

Dependable Artificial Intelligence with Reliability and Security (DAIReS): A Unified Syndrome Decoding Approach for Hallucination and Backdoor Trigger Detection

Hema Karnam Surendrababu, Nithin Nagaraj

Machine Learning (ML) models, including Large Language Models (LLMs), are characterized by a range of system-level attributes such as security and...

1 month ago cs.CR PDF

Attack HIGH

Universal Anti-forensics Attack against Image Forgery Detection via Multi-modal Guidance

Haipeng Li, Rongxuan Peng, Anwei Luo +3 more

The rapid advancement of AI-Generated Content (AIGC) technologies poses significant challenges for authenticity assessment. However, existing...

1 month ago cs.CV cs.CR PDF

Attack HIGH

Subgraph Reconstruction Attacks on Graph RAG Deployments with Practical Defenses

Minkyoo Song, Jaehan Kim, Myungchul Kang +3 more

Graph-based retrieval-augmented generation (Graph RAG) is increasingly deployed to support LLM applications by augmenting user queries with...

1 month ago cs.CR PDF

Attack HIGH

TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking

Sung-Hoon Yoon, Ruizhi Qian, Minda Zhao +2 more

Large Language Models (LLMs) have become integral to many domains, making their safety a critical priority. Prior jailbreaking research has explored...

1 month ago cs.CL cs.AI cs.CR PDF

Tool MEDIUM

VENOMREC: Cross-Modal Interactive Poisoning for Targeted Promotion in Multimodal LLM Recommender Systems

Guowei Guan, Yurong Hao, Jiaming Zhang +6 more

Multimodal large language models (MLLMs) are pushing recommender systems (RecSys) toward content-grounded retrieval and ranking via cross-modal...

1 month ago cs.CR PDF

Attack LOW

Empirical Analysis of Adversarial Robustness and Explainability Drift in Cybersecurity Classifiers

Mona Rajhans, Vishal Khawarey

Machine learning (ML) models are increasingly deployed in cybersecurity applications such as phishing detection and network intrusion prevention....

1 month ago cs.CR cs.AI cs.LG PDF

Defense LOW

One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models

Daniel Fein, Max Lamparth, Violet Xiang +2 more

Reward Models (RMs) are crucial for online alignment of language models (LMs) with human preferences. However, RM-based preference-tuning is...

1 month ago cs.CL cs.AI PDF

Benchmark HIGH

MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs

Junhyeok Lee, Han Jang, Kyu Sung Choi

Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt...

1 month ago cs.CL cs.LG PDF

Benchmark MEDIUM

Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions

Navita Goyal, Hal Daumé

Model steering, which involves intervening on hidden representations at inference time, has emerged as a lightweight alternative to finetuning for...

1 month ago cs.LG cs.AI cs.CL PDF

Benchmark MEDIUM

Private and interpretable clinical prediction with quantum-inspired tensor train models

José Ramón Pareja Monturiol, Juliette Sinnott, Roger G. Melko +1 more

Machine learning in clinical settings must balance predictive accuracy, interpretability, and privacy. Models such as logistic regression (LR) offer...

1 month ago cs.LG cs.CR quant-ph PDF

Attack HIGH

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

Xin Chen, Jie Zhang, Florian Tramèr

Prompt injection is one of the most critical vulnerabilities in LLM agents; yet, effective automated attacks remain largely unexplored from an...

1 month ago cs.LG cs.AI PDF

Benchmark LOW

CASTLE: A Comprehensive Benchmark for Evaluating Student-Tailored Personalized Safety in Large Language Models

Rui Jia, Ruiyi Lan, Fengrui Liu +7 more

Large language models (LLMs) have advanced the development of personalized learning in education. However, their inherent generation mechanisms often...

1 month ago cs.CL PDF

Attack MEDIUM

Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification

Tao Huang, Rui Wang, Xiaofei Liu +3 more

Large vision-language models (LVLMs) have shown substantial advances in multimodal understanding and generation. However, when presented with...

1 month ago cs.LG PDF
