Benchmark MEDIUM
Zhanguang Zhang, Zhiyuan Li, Behnam Rahmati +10 more
Robot action planning in the real world is challenging as it requires not only understanding the current state of the environment but also predicting...
Benchmark MEDIUM
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera
Large language models are becoming pervasive core components in many real-world applications. As a consequence, security alignment represents a...
2 days ago cs.CR cs.AI cs.CL
Defense MEDIUM
Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee
Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually...
2 days ago cs.CR cs.AI cs.MM
Survey MEDIUM
Yanming Mu, Hao Hu, Feiyang Li +7 more
Retrieval-Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by...
2 days ago cs.CR cs.AI
Attack MEDIUM
Huamin Chen, Xunzhuo Liu, Bowei He +5 more
Over the past year, the vLLM Semantic Router project has released a series of work spanning: (1) core routing mechanisms -- signal-driven routing,...
2 days ago cs.LG cs.DC
Attack MEDIUM
Kwanyoung Kim, Byeongsu Sim
Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models with human preferences, inspiring the...
3 days ago cs.LG cs.AI
Attack MEDIUM
Abed K. Musaffar, Ambuj Singh, Francesco Bullo
Large language models (LLMs) are increasingly deployed in human-AI teams as support agents for complex tasks such as information retrieval,...
3 days ago cs.LG cs.AI cs.HC
Defense MEDIUM
Xinyue Liu, Niloofar Mireshghallah, Jane C. Ginsburg +1 more
Frontier LLM companies have repeatedly assured courts and regulators that their models do not store copies of training data. They further rely on...
3 days ago cs.CL cs.AI cs.CY
Tool MEDIUM
Uchi Uchibeke
AI agents today have passwords but no permission slips. They execute tool calls (fund transfers, database queries, shell commands, sub-agent...
3 days ago cs.CR cs.AI
Benchmark MEDIUM
Jiahao Chen, Zhiming Zhao, Yuwen Pu +4 more
Federated learning (FL) has attracted substantial attention in both academia and industry, yet its practical security posture remains poorly...
Benchmark MEDIUM
Hung Yun Tseng, Wuzhen Li, Blerina Gkotse +1 more
The potential of Large Language Models (LLMs) to provide harmful information remains a significant concern due to the vast breadth of illegal queries...
Benchmark MEDIUM
Christopher J. Agostino, Quan Le Thien, Nayan D'Souza +1 more
Understanding the fundamental mechanisms governing the production of meaning in the processing of natural language is critical for designing safe,...
4 days ago cs.CL cs.AI cs.HC
Attack MEDIUM
Vicenç Torra, Maria Bras-Amorós
Memory poisoning attacks for Agentic AI and multi-agent systems (MAS) have recently caught attention. It is partially due to the fact that Large...
5 days ago cs.CR cs.AI
Benchmark MEDIUM
Fazhong Liu, Zhuoyan Chen, Tu Lan +6 more
Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to...
5 days ago cs.CR cs.AI
Attack MEDIUM
Qi Luo, Minghui Xu, Dongxiao Yu +1 more
Many learning systems now use graph data in which each node also contains text, such as papers with abstracts or users with posts. Because these...
5 days ago cs.LG cs.CR
Attack MEDIUM
Dong-Xiao Zhang, Hu Lou, Jun-Jie Zhang +2 more
Adversarial vulnerability in vision and hallucination in large language models are conventionally viewed as separate problems, each addressed with...
5 days ago cs.LG cs.IT physics.comp-ph
Tool MEDIUM
Vincent Siu, Jingxuan He, Kyle Montgomery +4 more
Security in LLM agents is inherently contextual. For example, the same action taken by an agent may represent legitimate behavior or a security...
5 days ago cs.CR cs.AI
Defense MEDIUM
Shawn Li, Yue Zhao
Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete...
5 days ago cs.CR cs.AI cs.LG
Defense MEDIUM
Carlos Hinojosa, Clemens Grange, Bernard Ghanem
Vision-language models (VLMs) are increasingly deployed in real-world and embodied settings where safety decisions depend on visual context. However,...
6 days ago cs.CV cs.AI cs.CL
Benchmark MEDIUM
Zikang Ding, Junhao Li, Suling Wu +3 more
Model watermarking utilizes internal representations to protect the ownership of large language models (LLMs). However, these features inevitably...
6 days ago cs.CR cs.AI