AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total: 2,077

Attack: 809

Benchmark: 603

Defense: 272

Tool: 226

Survey: 113

Showing 401–420 of 2,077 papers

Survey HIGH

Can Adversarial Code Comments Fool AI Security Reviewers -- Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis

Scott Thornton

AI-assisted code review is widely used to detect vulnerabilities before production release. Prior work shows that adversarial prompt manipulation can...

1 month ago cs.CR cs.AI cs.LG PDF

Attack LOW

Hardware-Agnostic Modeling of Quantum Side-Channel Leakage via Conditional Dynamics and Learning from Full Correlation Data

Brennan Bell, Andreas Trügler, Konstantin Beyer +1 more

We study a sequential coherent side-channel model in which an adversarial probe qubit interacts with a target qubit during a hidden gate sequence....

1 month ago quant-ph cs.CR PDF

Attack MEDIUM

From Tool Orchestration to Code Execution: A Study of MCP Design Choices

Yuval Felendler, Parth A. Gandhi, Idan Habler +2 more

Model Context Protocols (MCPs) provide a unified platform for agent systems to discover, select, and orchestrate tools across heterogeneous execution...

1 month ago cs.CR cs.AI PDF

Benchmark MEDIUM

Intent Laundering: AI Safety Datasets Are Not What They Seem

Shahriar Golchin, Marc Wetter

We systematically evaluate the quality of widely used AI safety datasets from two perspectives: in isolation and in practice. In isolation, we...

1 month ago cs.CR cs.AI cs.CL PDF

Attack LOW

A Note on Non-Composability of Layerwise Approximate Verification for Neural Inference

Or Zamir

A natural and informal approach to verifiable (or zero-knowledge) ML inference over floating-point data is: "prove that each layer was computed...

1 month ago cs.CR cs.LG PDF

Benchmark MEDIUM

Revisiting Backdoor Threat in Federated Instruction Tuning from a Signal Aggregation Perspective

Haodong Zhao, Jinming Hu, Gongshen Liu

Federated learning security research has predominantly focused on backdoor threats from a minority of malicious clients that intentionally corrupt...

1 month ago cs.CR PDF

Attack HIGH

Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections

Xianglin Yang, Yufei He, Shuo Ji +2 more

Self-evolving LLM agents update their internal state across sessions, often by writing and reusing long-term memory. This design improves performance...

1 month ago cs.CR cs.AI PDF

Attack MEDIUM

SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Memory Poisoning

Varun Pratap Bhardwaj

We present SuperLocalMemory, a local-first memory system for multi-agent AI that defends against OWASP ASI06 memory poisoning through architectural...

1 month ago cs.AI cs.CR PDF

Benchmark LOW

Semantic-Guided 3D Gaussian Splatting for Transient Object Removal

Aditi Prabakaran, Priyesh Shukla

Transient objects in casual multi-view captures cause ghosting artifacts in 3D Gaussian Splatting (3DGS) reconstruction. Existing solutions relied on...

1 month ago cs.CV PDF

Benchmark LOW

A Unified Evaluation of Learning-Based Similarity Techniques for Malware Detection

Udbhav Prasad, Aniesh Chawla

Cryptographic digests (e.g., MD5, SHA-256) are designed to provide exact identity. Any single-bit change in the input produces a completely different...

1 month ago cs.CR cs.AI PDF

Attack LOW

FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures

Jiajun Xu, Jiageng Mao, Ang Qi +5 more

Vision Language Models (VLMs) are prone to errors, and identifying where these errors occur is critical for ensuring the reliability and safety of AI...

1 month ago cs.LG cs.AI PDF

Attack HIGH

ER-MIA: Black-Box Adversarial Memory Injection Attacks on Long-Term Memory-Augmented Large Language Models

Mitchell Piehl, Zhaohan Xi, Zuobin Xiong +2 more

Large language models (LLMs) are increasingly augmented with long-term memory systems to overcome finite context windows and enable persistent...

1 month ago cs.LG PDF

Defense LOW

Unforgeable Watermarks for Language Models via Robust Signatures

Huijia Lin, Kameron Shahabi, Min Jae Song

Language models now routinely produce text that is difficult to distinguish from human writing, raising the need for robust tools to verify content...

1 month ago cs.CR cs.AI cs.LG PDF

Defense LOW

Visual Persuasion: What Influences Decisions of Vision-Language Models?

Manuel Cherep, Pranav M R, Pattie Maes +1 more

The web is littered with images, once created for human consumption and now increasingly interpreted by agents using vision-language models (VLMs)....

1 month ago cs.CV cs.AI PDF

Attack MEDIUM

Closing the Distribution Gap in Adversarial Training for LLMs

Chengzhi Hu, Jonas Dornbusch, David Lüdke +2 more

Adversarial training for LLMs is one of the most promising methods to reliably improve robustness against adversaries. However, despite significant...

1 month ago cs.LG cs.AI cs.CR PDF

Defense MEDIUM

Weight space Detection of Backdoors in LoRA Adapters

David Puertolas Merenciano, Ekaterina Vasyagina, Raghav Dixit +4 more

LoRA adapters let users fine-tune large language models (LLMs) efficiently. However, LoRA adapters are shared through open repositories like Hugging...

1 month ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Boundary Point Jailbreaking of Black-Box LLMs

Xander Davies, Giorgi Giglemiani, Edmund Lau +3 more

Frontier LLMs are safeguarded against attempts to extract harmful information via adversarial prompts known as "jailbreaks". Recently, defenders have...

1 month ago cs.LG PDF

Attack MEDIUM

Overthinking Loops in Agents: A Structural Risk via MCP Tools

Yohan Lee, Jisoo Jang, Seoyeon Choi +2 more

Tool-using LLM agents increasingly coordinate real workloads by selecting and chaining third-party tools based on text-visible metadata such as tool...

1 month ago cs.CL cs.CR PDF

Attack HIGH

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

Lukas Struppek, Adam Gleave, Kellin Pelrine

As the capabilities of large language models continue to advance, so does their potential for misuse. While closed-source models typically rely on...

1 month ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models

In Chong Choi, Jiacheng Zhang, Feng Liu +1 more

Multi-turn jailbreak attacks are effective against text-only large language models (LLMs) by gradually introducing malicious content across turns....

1 month ago cs.CV PDF
