Attack MEDIUM
Ali Raza, Gurang Gupta, Nikolay Matyunin +1 more
Warning: This article includes red-teaming experiments, which contain examples of compromised LLM responses that may be offensive or upsetting. Large...
2 weeks ago cs.CR cs.AI cs.LG
PDF
Attack MEDIUM
Shaswata Mitra, Raj Patel, Sudip Mittal +2 more
Multi-agent systems (MAS) powered by LLMs promise adaptive, reasoning-driven enterprise workflows, yet granting agents autonomous control over tools,...
2 weeks ago cs.CR cs.MA cs.SE
PDF
Attack MEDIUM
Alexander Erlei, Lukas Meub
As AI agents increasingly act on behalf of human stakeholders in economic settings, understanding their behavior in complex market environments...
Attack MEDIUM
Eduard Hirsch, Kristina Raab, Tobias J. Bauer +1 more
IT systems are facing an increasing number of security threats, including advanced persistent attacks and future quantum-computing vulnerabilities....
2 weeks ago cs.CR cs.IR
PDF
Attack MEDIUM
Donghwa Kang, Hojun Choe, Doohyun Kim +2 more
Deploying deep neural networks (DNNs) on edge devices exposes valuable intellectual property to model-stealing attacks. While TEE-shielded DNN...
Attack MEDIUM
Anatoly Belikov, Ilya Fedotov
Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV...
2 weeks ago cs.CR cs.LG
PDF
Attack MEDIUM
Geraldin Nanfack, Eugene Belilovsky, Elvis Dohmatob
Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent...
3 weeks ago cs.LG cs.AI
PDF
Attack MEDIUM
Yizhe Xie, Congcong Zhu, Xinyue Zhang +5 more
Large Language Model-based Multi-Agent Systems (LLM-MAS) are increasingly applied to complex collaborative scenarios. However, their collaborative...
3 weeks ago cs.MA cs.AI
PDF
Attack MEDIUM
Achyutha Menon, Magnus Saebo, Tyler Crosse +3 more
The accelerating adoption of language models (LMs) as agents for deployment in long-context tasks motivates a thorough understanding of goal drift:...
Attack MEDIUM
Edouard Lansiaux
Federated Learning (FL) enables collaborative training of medical AI models across hospitals without centralizing patient data. However, the exchange...
3 weeks ago cs.CR cs.AI
PDF
Attack MEDIUM
Shuyi Zhou, Zeen Song, Wenwen Qiang +4 more
Large Language Models remain vulnerable to adversarial prefix attacks (e.g., "Sure, here is") despite robust standard safety. We diagnose this...
Attack MEDIUM
Guoxin Shi, Haoyu Wang, Zaihui Yang +2 more
Adversarial behavior plays a central role in aligning large language models with human values. However, existing alignment methods largely rely on...
3 weeks ago cs.CR cs.AI
PDF
Attack MEDIUM
Martin Odersky, Yaoyu Zhao, Yichen Xu +2 more
AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause...
3 weeks ago cs.AI cs.PL
PDF
Attack MEDIUM
Jingyuan Xie, Wenjie Wang, Ji Wu +1 more
Supervised fine-tuning (SFT) is essential for the development of medical large language models (LLMs), yet prior poisoning studies have mainly...
3 weeks ago cs.CR cs.AI cs.LG
PDF
Attack MEDIUM
Qianxun Xu, Chenxi Song, Yujun Cai +1 more
Recent advances in text-to-video diffusion models have enabled high-fidelity and temporally coherent video synthesis. However, current models are...
Attack MEDIUM
Idan Habler, Vineeth Sai Narajala, Stav Koren +2 more
Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external...
3 weeks ago cs.CR cs.AI
PDF
Attack MEDIUM
Bruce W. Lee, Chen Yueh-Han, Tomek Korbak
Frontier AI agents may pursue hidden goals while concealing their pursuit from oversight. Alignment training aims to prevent such behavior by...
3 weeks ago cs.LG cs.AI
PDF
Attack MEDIUM
Sarthak Munshi, Manish Bhatt, Vineeth Sai Narajala +4 more
While prior work has focused on projecting adversarial examples back onto the manifold of natural data to restore safety, we argue that a...
4 weeks ago cs.LG cs.AI cs.CR
PDF
Attack MEDIUM
Inderjeet Singh, Vikas Pahuja, Aishvariya Priya Rathina Sabapathy +8 more
Current stateless defences for multimodal agentic RAG fail to detect adversarial strategies that distribute malicious semantics across retrieval,...
4 weeks ago cs.CR cs.AI cs.CL
PDF