AI Security Research

2,104+ academic papers on AI security, attacks, and defenses

Total: 2,104
Attack: 820
Benchmark: 609
Defense: 276
Tool: 229
Survey: 116

Showing 601–620 of 2,104 papers

Benchmark MEDIUM

Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases

Casey Ford, Madison Van Doren, Emily Dix

Multimodal large language models (MLLMs) are increasingly deployed in real-world systems, yet their safety under adversarial prompting remains...

1 month ago cs.CL cs.AI cs.HC

Benchmark LOW

From Data to Behavior: Predicting Unintended Model Behaviors Before Training

Mengru Wang, Zhenqian Xu, Junfeng Fang +4 more

Large Language Models (LLMs) can acquire unintended biases from seemingly benign training data even without explicit cues or malicious content....

1 month ago cs.LG cs.AI cs.CL

Attack MEDIUM

LiteToken: Removing Intermediate Merge Residues From BPE Tokenizers

Yike Sun, Haotong Yang, Zhouchen Lin +1 more

Tokenization is fundamental to how language models represent and process text, yet the behavior of widely used BPE tokenizers has received far less...

1 month ago cs.CL

Attack MEDIUM

Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates

Ariel Fogel, Omer Hofman, Eilon Cohen +1 more

Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat in this context is...

1 month ago cs.CR cs.LG

Attack MEDIUM

A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness

Leo Schwinn, Moritz Ladenburger, Tim Beyer +3 more

Automated "LLM-as-a-Judge" frameworks have become the de facto standard for scalable evaluation across natural language processing. For...

1 month ago cs.CL cs.AI

Benchmark MEDIUM

Trust The Typical

Debargha Ganguly, Sreehari Sankar, Biyao Zhang +8 more

Current approaches to LLM safety fundamentally rely on a brittle cat-and-mouse game of identifying and blocking known threats via guardrails. We...

1 month ago cs.CL cs.AI cs.DC

Defense MEDIUM

RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models

Jiacheng Liang, Yuhui Wang, Tanqiu Jiang +1 more

Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable...

1 month ago cs.LG cs.AI cs.CR

Attack HIGH

Attack Selection Reduces Safety in Concentrated AI Control Settings against Trusted Monitoring

Joachim Schaeffer, Arjun Khandelwal, Tyler Tracy

Future AI deployments will likely be monitored for malicious behaviour. The ability of these AIs to subvert monitors by adversarially selecting...

1 month ago cs.CR cs.AI

Attack HIGH

When and Where to Attack? Stage-wise Attention-Guided Adversarial Attack on Large Vision Language Models

Jaehyun Kwak, Nam Cao, Boryeong Cho +3 more

Adversarial attacks against Large Vision-Language Models (LVLMs) are crucial for exposing safety vulnerabilities in modern multimodal systems. Recent...

1 month ago cs.CV

Tool MEDIUM

PriMod4AI: Lifecycle-Aware Privacy Threat Modeling for AI Systems using LLM

Gautam Savaliya, Robert Aufschläger, Abhishek Subedi +2 more

Artificial intelligence systems introduce complex privacy risks throughout their lifecycle, especially when processing sensitive or high-dimensional...

1 month ago cs.CR cs.AI

Attack HIGH

How Few-shot Demonstrations Affect Prompt-based Defenses Against LLM Jailbreak Attacks

Yanshu Wang, Shuaishuai Yang, Jingjing He +1 more

Large Language Models (LLMs) face increasing threats from jailbreak attacks that bypass safety alignment. While prompt-based defenses such as...

1 month ago cs.CL cs.AI cs.CR

Attack MEDIUM

Embracing Anisotropy: Turning Massive Activations into Interpretable Control Knobs for Large Language Models

Youngji Roh, Hyunjin Cho, Jaehyung Kim

Large Language Models (LLMs) exhibit highly anisotropic internal representations, often characterized by massive activations, a phenomenon where a...

1 month ago cs.CL

Attack MEDIUM

RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning

Zeming Wei, Qiaosheng Zhang, Xia Hu +1 more

Large Reasoning Models (LRMs) have achieved tremendous success with their chain-of-thought (CoT) reasoning, yet also face safety issues similar to...

1 month ago cs.LG cs.AI cs.CL

Defense MEDIUM

Semantic Consensus Decoding: Backdoor Defense for Verilog Code Generation

Guang Yang, Xing Hu, Xiang Chen +1 more

Large language models (LLMs) for Verilog code generation are increasingly adopted in hardware design, yet remain vulnerable to backdoor attacks where...

1 month ago cs.SE cs.CR

Attack HIGH

I Can't Believe It's Not a Valid Exploit

Derin Gezgin, Amartya Das, Shinhae Kim +3 more

Recently Large Language Models (LLMs) have been used in security vulnerability detection tasks including generating proof-of-concept (PoC) exploits....

1 month ago cs.SE

Attack HIGH

Evaluating the Vulnerability Landscape of LLM-Generated Smart Contracts

Hoang Long Do, Nasrin Sohrabi, Muneeb Ul Hassan

Large language models (LLMs) have been widely adopted in modern software development lifecycles, where they are increasingly used to automate and...

1 month ago cs.CR

Attack HIGH

When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making

Shutong Fan, Lan Zhang, Xiaoyong Yuan

Most adversarial threats in artificial intelligence target the computational behavior of models rather than the humans who rely on them. Yet modern...

1 month ago cs.AI

Benchmark LOW

Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks

Bibhabasu Mandal, Sagnik Nandy

In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount...

1 month ago stat.ML cs.CR cs.LG

Attack HIGH

WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents

Xilong Wang, Yinuo Liu, Zhun Wang +2 more

Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones....

1 month ago cs.CR cs.AI cs.CL

Attack MEDIUM

Phantom Transfer: Data-level Defences are Insufficient Against Data Poisoning

Andrew Draganov, Tolga H. Dur, Anandmayi Bhongade +1 more

We present a data poisoning attack -- Phantom Transfer -- with the property that, even if you know precisely how the poison was placed into an...

1 month ago cs.CR cs.AI
