AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total

2,077

Attack

809

Benchmark

603

Defense

272

Tool

226

Survey

113

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 21–40 of 355 papers

Clear filters

Benchmark MEDIUM

Understanding LLM Behavior When Encountering User-Supplied Harmful Content in Harmless Tasks

Junjie Chu, Yiting Qu, Ye Leng +4 more

Large Language Models (LLMs) are increasingly trained to align with human values, primarily focusing on task level, i.e., refusing to execute...

1 weeks ago cs.CR cs.AI PDF

Benchmark MEDIUM

KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation

Qizhi Chen, Chao Qi, Yihong Huang +5 more

Graph-based Retrieval-Augmented Generation (GraphRAG) constructs the Knowledge Graph (KG) from external databases to enhance the timeliness and...

1 weeks ago cs.LG cs.AI cs.CR PDF

Benchmark MEDIUM

TOSSS: a CVE-based Software Security Benchmark for Large Language Models

Marc Damie, Murat Bilgehan Ertan, Domenico Essoussi +3 more

With their increasing capabilities, Large Language Models (LLMs) are now used across many industries. They have become useful tools for software...

2 weeks ago cs.LG cs.CL cs.CR PDF

Benchmark MEDIUM

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs

Chuan Guo, Juan Felipe Ceron Uribe, Sicheng Zhu +10 more

Instruction hierarchy (IH) defines how LLMs prioritize system, developer, user, and tool instructions under conflict, providing a concrete,...

2 weeks ago cs.AI cs.CL cs.CR PDF

Benchmark MEDIUM

CLaRE-ty Amid Chaos: Quantifying Representational Entanglement to Predict Ripple Effects in LLM Editing

Manit Baser, Alperen Yildiz, Dinil Mon Divakaran +1 more

The static knowledge representations of large language models (LLMs) inevitably become outdated or incorrect over time. While model-editing...

2 weeks ago cs.LG PDF

Benchmark MEDIUM

Why LLMs Fail: A Failure Analysis and Partial Success Measurement for Automated Security Patch Generation

Amir Al-Maamari

Large Language Models (LLMs) show promise for Automated Program Repair (APR), yet their effectiveness on security vulnerabilities remains poorly...

2 weeks ago cs.CR cs.AI PDF

Benchmark MEDIUM

Models as Lego Builders: Assembling Malice from Benign Blocks via Semantic Blueprints

Chenxi Li, Xianggan Liu, Dake Shen +9 more

Despite the rapid progress of Large Vision-Language Models (LVLMs), the integration of visual modalities introduces new safety vulnerabilities that...

2 weeks ago cs.CV cs.LG PDF

Benchmark MEDIUM

Backdoor4Good: Benchmarking Beneficial Uses of Backdoors in LLMs

Yige Li, Wei Zhao, Zhe Li +6 more

Backdoor mechanisms have traditionally been studied as security threats that compromise the integrity of machine learning models. However, the same...

2 weeks ago cs.CR cs.AI PDF

Benchmark MEDIUM

Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice

Yuxu Ge

Autonomous agents powered by large language models introduce a class of execution-layer vulnerabilities -- prompt injection, retrieval poisoning, and...

2 weeks ago cs.CR cs.AI PDF

Benchmark MEDIUM

Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation

Xiaoguang Li, Hanyi Wang, Yaowei Huang +6 more

Shuffler-based differential privacy (shuffle-DP) is a privacy paradigm providing high utility by involving a shuffler to permute noisy report from...

2 weeks ago cs.CR PDF

Benchmark MEDIUM

Osmosis Distillation: Model Hijacking with the Fewest Samples

Yuchen Shi, Huajie Chen, Heng Xu +6 more

Transfer learning is devised to leverage knowledge from pre-trained models to solve new tasks with limited data and computational resources....

2 weeks ago cs.CR cs.LG PDF

Benchmark MEDIUM

PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology

Kelly L Vomo-Donfack, Adryel Hoszu, Grégory Ginot +1 more

Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions...

3 weeks ago cs.LG cs.CR cs.DC PDF

Benchmark MEDIUM

Code Fingerprints: Disentangled Attribution of LLM-Generated Code

Jiaxun Guo, Ziyuan Yang, Mengyu Sun +3 more

The rapid adoption of Large Language Models (LLMs) has transformed modern software development by enabling automated code generation at scale. While...

3 weeks ago cs.SE cs.CL PDF

Benchmark MEDIUM

In-Context Environments Induce Evaluation-Awareness in Language Models

Maheep Chaudhary

Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit...

3 weeks ago cs.AI cs.CL cs.LG PDF

Benchmark MEDIUM

Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

Aradhye Agarwal, Gurdit Siyan, Yash Pandya +3 more

Agentic language models operate in a fundamentally different safety regime than chat models: they must plan, call tools, and execute long-horizon...

3 weeks ago cs.CL PDF

Benchmark MEDIUM

RAIN: Secure and Robust Aggregation under Shuffle Model of Differential Privacy

Yuhang Li, Yajie Wang, Xiangyun Tang +3 more

Secure aggregation is a foundational building block of privacy-preserving learning, yet achieving robustness under adversarial behavior remains...

3 weeks ago cs.CR PDF

Benchmark MEDIUM

CraniMem: Cranial Inspired Gated and Bounded Memory for Agentic Systems

Pearl Mody, Mihir Panchal, Rishit Kar +2 more

Large language model (LLM) agents are increasingly deployed in long running workflows, where they must preserve user and task state across many...

3 weeks ago cs.AI PDF

Benchmark MEDIUM

Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

Junjie Chu, Xinyue Shen, Ye Leng +3 more

The rapid growth of research in LLM safety makes it hard to track all advances. Benchmarks are therefore crucial for capturing key trends and...

3 weeks ago cs.CR cs.AI cs.SE PDF

Benchmark MEDIUM

ExpGuard: LLM Content Moderation in Specialized Domains

Minseok Choi, Dongjin Kim, Seungbin Yang +5 more

With the growing deployment of large language models (LLMs) in real-world applications, establishing robust safety guardrails to moderate their...

3 weeks ago cs.CL PDF

Benchmark MEDIUM

MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

Zhongxi Wang, Yueqian Lin, Jingyang Zhang +2 more

Safety evaluation and red-teaming of large language models remain predominantly text-centric, and existing frameworks lack the infrastructure to...

3 weeks ago cs.LG cs.CL cs.CV PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial