AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total: 2,077
Attack: 809
Benchmark: 603
Defense: 272
Tool: 226
Survey: 113

Showing 141–160 of 522 papers

Attack HIGH

How Few-shot Demonstrations Affect Prompt-based Defenses Against LLM Jailbreak Attacks

Yanshu Wang, Shuaishuai Yang, Jingjing He +1 more

Large Language Models (LLMs) face increasing threats from jailbreak attacks that bypass safety alignment. While prompt-based defenses such as...

1 month ago cs.CL cs.AI cs.CR PDF

Attack HIGH

I Can't Believe It's Not a Valid Exploit

Derin Gezgin, Amartya Das, Shinhae Kim +3 more

Recently Large Language Models (LLMs) have been used in security vulnerability detection tasks including generating proof-of-concept (PoC) exploits....

1 month ago cs.SE PDF

Attack HIGH

Evaluating the Vulnerability Landscape of LLM-Generated Smart Contracts

Hoang Long Do, Nasrin Sohrabi, Muneeb Ul Hassan

Large language models (LLMs) have been widely adopted in modern software development lifecycles, where they are increasingly used to automate and...

1 month ago cs.CR PDF

Attack HIGH

When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making

Shutong Fan, Lan Zhang, Xiaoyong Yuan

Most adversarial threats in artificial intelligence target the computational behavior of models rather than the humans who rely on them. Yet modern...

1 month ago cs.AI PDF

Attack HIGH

WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents

Xilong Wang, Yinuo Liu, Zhun Wang +2 more

Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones....

1 month ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models

Chen Xiong, Zhiyuan He, Pin-Yu Chen +2 more

Activation steering is a practical post-training model alignment technique to enhance the utility of Large Language Models (LLMs). Prior to deploying...

1 month ago cs.CR cs.AI PDF

Attack HIGH

Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility

Mengxuan Wang, Yuxin Chen, Gang Xu +3 more

Vision language models (VLMs) extend the reasoning capabilities of large language models (LLMs) to cross-modal settings, yet remain highly vulnerable...

1 month ago cs.AI cs.LG PDF

Attack HIGH

Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models

Hicham Eddoubi, Umar Faruk Abdullahi, Fadi Hassan

Large Language Models (LLMs) have seen widespread adoption across multiple domains, creating an urgent need for robust safety alignment mechanisms....

1 month ago cs.LG PDF

Attack HIGH

DF-LoGiT: Data-Free Logic-Gated Backdoor Attacks in Vision Transformers

Xiaozuo Shen, Yifei Cai, Rui Ning +2 more

The widespread adoption of Vision Transformers (ViTs) elevates supply-chain risk on third-party model hubs, where an adversary can implant backdoors...

1 month ago cs.CR PDF

Attack HIGH

Evaluating False Alarm and Missing Attacks in CAN IDS

Nirab Hossain, Pablo Moriano

Modern vehicles rely on electronic control units (ECUs) interconnected through the Controller Area Network (CAN), making in-vehicle communication a...

1 month ago cs.CR cs.AI cs.LG PDF

Attack HIGH

David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via Reinforcement Learning

Samuel Nellessen, Tal Kachman

The evolution of large language models into autonomous agents introduces adversarial failures that exploit legitimate tool privileges, transforming...

1 month ago cs.LG cs.AI cs.CR PDF

Attack HIGH

Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents

Pengfei He, Ash Fox, Lesly Miculicich +7 more

Large language models (LLMs) have shown promise in assisting cybersecurity tasks, yet existing approaches struggle with automatic vulnerability...

1 month ago cs.LG cs.CR PDF

Attack HIGH

HPE: Hallucinated Positive Entanglement for Backdoor Attacks in Federated Self-Supervised Learning

Jiayao Wang, Yang Song, Zhendong Zhao +5 more

Federated self-supervised learning (FSSL) enables collaborative training of self-supervised representation models without sharing raw unlabeled data....

1 month ago cs.CR PDF

Attack HIGH

RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse

Mingrui Liu, Sixiao Zhang, Cheng Long +1 more

Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved...

1 month ago cs.CR cs.AI cs.LG PDF

Attack HIGH

Efficient Adversarial Attacks on High-dimensional Offline Bandits

Seyed Mohammad Hadi Hosseini, Amir Najafi, Mahdieh Soleymani Baghshah

Bandit algorithms have recently emerged as a powerful tool for evaluating machine learning models, including generative image models and large...

1 month ago cs.LG cs.AI PDF

Attack HIGH

SGHA-Attack: Semantic-Guided Hierarchical Alignment for Transferable Targeted Attacks on Vision-Language Models

Haobo Wang, Weiqi Luo, Xiaojun Jia +1 more

Large vision-language models (VLMs) are vulnerable to transfer-based adversarial perturbations, enabling attackers to optimize on surrogate models...

1 month ago cs.CV PDF

Attack HIGH

MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety

Xiaoyu Wen, Zhida He, Han Qi +7 more

Ensuring robust safety alignment is crucial for Large Language Models (LLMs), yet existing defenses often lag behind evolving adversarial attacks due...

1 month ago cs.AI cs.CL cs.LG PDF

Attack HIGH

TxRay: Agentic Postmortem of Live Blockchain Attacks

Ziyue Wang, Jiangshan Yu, Kaihua Qin +3 more

Decentralized Finance (DeFi) has turned blockchains into financial infrastructure, allowing anyone to trade, lend, and build protocols without...

1 month ago cs.CR cs.AI PDF

Attack HIGH

To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack

Terry Yue Zhuo, Yangruibo Ding, Wenbo Guo +1 more

For over a decade, cybersecurity has relied on human labor scarcity to limit attackers to high-value targets manually or generic automated attacks at...

1 month ago cs.CR cs.AI cs.CY PDF

Attack HIGH

Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models

Kaiyuan Cui, Yige Li, Yutao Wu +4 more

Vision-language models (VLMs) extend large language models (LLMs) with vision encoders, enabling text generation conditioned on both images and text....

1 month ago cs.LG cs.AI cs.CV PDF
