AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total

2,077

Attack

809

Benchmark

603

Defense

272

Tool

226

Survey

113

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 421–440 of 697 papers

Clear filters

Attack HIGH

From static to adaptive: immune memory-based jailbreak detection for large language models

Jun Leng, Yu Liu, Litian Zhang +3 more

Large Language Models (LLMs) serve as the backbone of modern AI systems, yet they remain susceptible to adversarial jailbreak attacks. Consequently,...

3 months ago cs.CR PDF

Benchmark HIGH

Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks

Songwen Zhao, Danqing Wang, Kexun Zhang +3 more

Vibe coding is a new programming paradigm in which human engineers instruct large language model (LLM) agents to complete complex coding tasks with...

3 months ago cs.SE cs.CL PDF

Attack HIGH

Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

Yuan Xiong, Ziqi Miao, Lijun Li +3 more

While Multimodal Large Language Models (MLLMs) show remarkable capabilities, their safety alignments are susceptible to jailbreak attacks. Existing...

3 months ago cs.CV cs.CL cs.CR PDF

Attack HIGH

When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models

Afshin Khadangi, Hanna Marxen, Amir Sartipi +2 more

Frontier large language models (LLMs) such as ChatGPT, Grok and Gemini are increasingly used for mental-health support with anxiety, trauma and...

3 months ago cs.CY cs.AI PDF

Attack HIGH

Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models

Ziyi Tong, Feifei Sun, Le Minh Nguyen

Large Multimodal Language Models (MLLMs) are emerging as one of the foundational tools in an expanding range of applications. Consequently,...

3 months ago cs.CR cs.AI PDF

Attack HIGH

LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems

Yuanhe Zhang, Weiliu Wang, Zhenhong Zhou +5 more

Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in reasoning, planning, and tool usage. The recently proposed Model...

3 months ago cs.CR cs.CL PDF

Attack HIGH

Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks

Haowei Fu, Bo Ni, Han Xu +3 more

Retrieval-Augmented Generation (RAG) and Supervised Finetuning (SFT) have become the predominant paradigms for equipping Large Language Models (LLMs)...

3 months ago cs.CR cs.AI PDF

Attack HIGH

Securing Large Language Models (LLMs) from Prompt Injection Attacks

Omar Farooq Khan Suri, John McCrae

Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection...

3 months ago cs.CR cs.CL cs.LG PDF

Attack HIGH

DefenSee: Dissecting Threat from Sight and Text -- A Multi-View Defensive Pipeline for Multi-modal Jailbreaks

Zihao Wang, Kar Wai Fok, Vrizlynn L. L. Thing

Multi-modal large language models (MLLMs), capable of processing text, images, and audio, have been widely adopted in various AI applications....

3 months ago cs.CR PDF

Attack HIGH

Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis

Mintong Kang, Chong Xiang, Sanjay Kariyappa +3 more

Indirect prompt injection attacks (IPIAs), where large language models (LLMs) follow malicious instructions hidden in input data, pose a critical...

3 months ago cs.CR cs.LG PDF

Attack HIGH

Bias Injection Attacks on RAG Databases and Sanitization Defenses

Hao Wu, Prateek Saxena

This paper explores attacks and defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning...

3 months ago cs.CR cs.AI cs.DB PDF

Attack HIGH

Concept-Guided Backdoor Attack on Vision Language Models

Haoyu Shen, Weimin Lyu, Haotian Xu +1 more

Vision-Language Models (VLMs) have achieved impressive progress in multimodal text generation, yet their rapid adoption raises increasing concerns...

3 months ago cs.CR cs.AI PDF

Benchmark HIGH

Red Teaming Large Reasoning Models

Jiawei Chen, Yang Yang, Chao Yu +6 more

Large Reasoning Models (LRMs) have emerged as a powerful advancement in multi-step reasoning tasks, offering enhanced transparency and logical...

3 months ago cs.CR cs.AI PDF

Attack HIGH

WARP: Weight Teleportation for Attack-Resilient Unlearning Protocols

Mohammad M Maheri, Xavier Cadet, Peter Chin +1 more

Approximate machine unlearning aims to efficiently remove the influence of specific data points from a trained model, offering a practical...

3 months ago cs.LG cs.AI cs.CR PDF

Defense HIGH

Retrieval-Augmented Few-Shot Prompting Versus Fine-Tuning for Code Vulnerability Detection

Fouad Trad, Ali Chehab

Few-shot prompting has emerged as a practical alternative to fine-tuning for leveraging the capabilities of large language models (LLMs) in...

3 months ago cs.SE cs.AI cs.CL PDF

Attack HIGH

Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks

Richard J. Young

Large Language Model (LLM) safety guardrail models have emerged as a primary defense mechanism against harmful content generation, yet their...

3 months ago cs.CR PDF

Attack HIGH

Distillability of LLM Security Logic: Predicting Attack Success Rate of Outline Filling Attack via Ranking Regression

Tianyu Zhang, Zihang Xi, Jingyu Hua +1 more

In the realm of black-box jailbreak attacks on large language models (LLMs), the feasibility of constructing a narrow safety proxy, a lightweight...

3 months ago cs.CR cs.AI PDF

Attack HIGH

BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents

Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley +3 more

The integration of artificial intelligence (AI) agents into web browsers introduces security challenges that go beyond traditional web application...

4 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

Adversarial Confusion Attack: Disrupting Multimodal Large Language Models

Jakub Hoscilowicz, Artur Janicki

We introduce the Adversarial Confusion Attack, a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted...

4 months ago cs.CL PDF

Attack HIGH

V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs

Sen Nie, Jie Zhang, Jianxin Yan +2 more

Adversarial attacks have evolved from simply disrupting predictions on conventional task-specific models to the more complex goal of manipulating...

4 months ago cs.CV PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial