AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total

2,077

Attack

809

Benchmark

603

Defense

272

Tool

226

Survey

113

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 221–240 of 259 papers

Clear filters

Attack MEDIUM

Generative AI for Biosciences: Emerging Threats and Roadmap to Biosecurity

Zaixi Zhang, Souradip Chakraborty, Amrit Singh Bedi +16 more

The rapid adoption of generative artificial intelligence (GenAI) in the biosciences is transforming biotechnology, medicine, and synthetic biology....

5 months ago cs.CR q-bio.BM PDF

Attack MEDIUM

Safeguarding Efficacy in Large Language Models: Evaluating Resistance to Human-Written and Algorithmic Adversarial Prompts

Tiarnaigh Downey-Webb, Olamide Jogunola, Oluwaseun Ajao

This paper presents a systematic security assessment of four prominent Large Language Models (LLMs) against diverse adversarial attack vectors. We...

5 months ago cs.CR cs.AI cs.CY PDF

Attack MEDIUM

"I know it's not right, but that's what it said to do": Investigating Trust in AI Chatbots for Cybersecurity Policy

Brandon Lit, Edward Crowder, Daniel Vogel +1 more

AI chatbots are an emerging security attack vector, vulnerable to threats such as prompt injection, and rogue chatbot creation. When deployed in...

5 months ago cs.HC PDF

Attack MEDIUM

The Model's Language Matters: A Comparative Privacy Analysis of LLMs

Abhishek K. Mishra, Antoine Boutet, Lucas Magnana

Large Language Models (LLMs) are increasingly deployed across multilingual applications that handle sensitive data, yet their scale and linguistic...

5 months ago cs.CL cs.CR PDF

Attack MEDIUM

VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands

Aofan Liu, Lulu Tang

Vision-Language Models (VLMs) have garnered significant attention for their remarkable ability to interpret and generate multimodal content. However,...

5 months ago cs.CR cs.AI PDF

Attack MEDIUM

Chain-of-Trigger: An Agentic Backdoor that Paradoxically Enhances Agentic Robustness

Jiyang Qiu, Xinbei Ma, Yunqing Xu +2 more

The rapid deployment of large language model (LLM)-based agents in real-world applications has raised serious concerns about their trustworthiness....

5 months ago cs.AI PDF

Attack MEDIUM

Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness

Tavish McDonald, Bo Lei, Stanislav Fort +2 more

Models are susceptible to adversarially out-of-distribution (OOD) data despite large training-compute investments into their robustification. Zaremba...

5 months ago cs.LG PDF

Attack MEDIUM

Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization

Tiancheng Xing, Jerry Li, Yixuan Du +1 more

Large language models (LLMs) are increasingly used as rerankers in information retrieval, yet their ranking behavior can be steered by small,...

5 months ago cs.CL cs.AI cs.IR PDF

Attack MEDIUM

Adversarial Reinforcement Learning for Large Language Model Agent Safety

Zizhao Wang, Dingcheng Li, Vaishakh Keshava +4 more

Large Language Model (LLM) agents can leverage tools such as Google Search to complete complex tasks. However, this tool usage introduces the risk of...

5 months ago cs.LG cs.AI cs.CL PDF

Attack MEDIUM

From Poisoned to Aware: Fostering Backdoor Self-Awareness in LLMs

Guangyu Shen, Siyuan Cheng, Xiangzhe Xu +4 more

Large Language Models (LLMs) can acquire deceptive behaviors through backdoor attacks, where the model executes prohibited actions whenever secret...

5 months ago cs.CR cs.AI PDF

Attack MEDIUM

Cross-Modal Content Optimization for Steering Web Agent Preferences

Tanqiu Jiang, Min Bai, Nikolaos Pappas +2 more

Vision-language model (VLM)-based web agents increasingly power high-stakes selection tasks like content recommendation or product ranking by...

5 months ago cs.AI cs.CR PDF

Attack MEDIUM

Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs

Fatmazohra Rezkellah, Ramzi Dakhmouche

With the increasing adoption of Large Language Models (LLMs), more customization is needed to ensure privacy-preserving and safe generation. We...

5 months ago cs.LG cs.CL cs.CR PDF

Attack MEDIUM

Adversarial Reinforcement Learning for Offensive and Defensive Agents in a Simulated Zero-Sum Network Environment

Abrar Shahid, Ibteeker Mahir Ishum, AKM Tahmidul Haque +2 more

This paper presents a controlled study of adversarial reinforcement learning in network security through a custom OpenAI Gym environment that models...

5 months ago cs.LG cs.AI cs.CR PDF

Attack MEDIUM

Inverse Language Modeling towards Robust and Grounded LLMs

Davide Gabrielli, Simone Sestito, Iacopo Masi

The current landscape of defensive mechanisms for LLMs is fragmented and underdeveloped, unlike prior work on classifiers. To further promote...

5 months ago cs.CL PDF

Attack MEDIUM

Securing generative artificial intelligence with parallel magnetic tunnel junction true randomness

Youwei Bao, Shuhan Yang, Hyunsoo Yang

Deterministic pseudo random number generators (PRNGs) used in generative artificial intelligence (GAI) models produce predictable patterns vulnerable...

5 months ago cs.LG cond-mat.mtrl-sci physics.data-an PDF

Attack MEDIUM

AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning

Zhenyu Pan, Yiting Zhang, Zhuo Liu +13 more

LLM-based multi-agent systems excel at planning, tool use, and role coordination, but their openness and interaction complexity also expose them to...

5 months ago cs.AI PDF

Attack MEDIUM

Bypassing Prompt Guards in Production with Controlled-Release Prompting

Jaiden Fairoze, Sanjam Garg, Keewoo Lee +1 more

As large language models (LLMs) advance, ensuring AI safety and alignment is paramount. One popular approach is prompt guards, lightweight mechanisms...

5 months ago cs.LG cs.CR PDF

Attack MEDIUM

Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors

Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang +1 more

Existing data poisoning attacks on retrieval-augmented generation (RAG) systems scale poorly because they require costly optimization of poisoned...

5 months ago cs.LG cs.CL cs.CR PDF

Attack MEDIUM

Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness

Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda +1 more

Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive...

5 months ago cs.LG cs.CR PDF

Attack MEDIUM

Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models

Yu Yan, Siqi Lu, Yang Gao +4 more

Recently, Bit-Flip Attack (BFA) has garnered widespread attention for its ability to compromise software system integrity remotely through hardware...

5 months ago cs.CR PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial