ToxSearch: Evolving Prompts for Toxicity Search in Large Language Models
Onkar Shelar, Travis Desell
Large Language Models remain vulnerable to adversarial prompts that elicit toxic content even after safety alignment. We present ToxSearch, a...
Jiaji Ma, Puja Trivedi, Danai Koutra
Text-attributed graphs (TAGs), which combine structural and textual node information, are ubiquitous across many domains. Recent work integrates...
Yuting Tan, Yi Huang, Zhuo Li
Backdoor attacks on large language models (LLMs) typically couple a secret trigger to an explicit malicious output. We show that this explicit...
Yikun Li, Matteo Grella, Daniel Nahmias +5 more
In recent years, Infrastructure as Code (IaC) has emerged as a critical approach for managing and provisioning IT infrastructure through code and...
Hasini Jayathilaka
Prompt injection attacks are an emerging threat to large language models (LLMs), enabling malicious users to manipulate outputs through carefully...
Rui Wang, Zeming Wei, Xiyue Zhang +1 more
Deep Neural Networks (DNNs) are known to be vulnerable to various adversarial perturbations. To address the safety concerns arising from these...
Gil Goren, Shahar Katz, Lior Wolf
Large Language Models (LLMs) are vulnerable to adversarial attacks that bypass safety guidelines and generate harmful content. Mitigating these...
Jie Chen, Liangmin Wang
Fuzzing is a widely used technique for detecting vulnerabilities in smart contracts; it generates transaction sequences to explore the execution...
Thong Bach, Dung Nguyen, Thao Minh Le +1 more
Large language models exhibit systematic vulnerabilities to adversarial attacks despite extensive safety alignment. We provide a mechanistic analysis...
Jiayu Li, Yunhan Zhao, Xiang Zheng +4 more
Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks, yet their integration of...
Sajad U P
Phishing and related cyber threats are becoming more varied and technologically advanced. Among these, email-based phishing remains the most dominant...
Shaowei Guan, Yu Zhai, Zhengyu Zhang +2 more
Large Language Models (LLMs) are increasingly vulnerable to adversarial attacks that can subtly manipulate their outputs. While various defense...
Shanmin Wang, Dongdong Zhao
Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-party...
Hao Li, Jiajun He, Guangshuo Wang +3 more
Retrieval-Augmented Generation (RAG) enhances large language models by integrating external knowledge, but reliance on proprietary or sensitive...
Lucas Fenaux, Christopher Srinivasa, Florian Kerschbaum
Transparency and security are both central to Responsible AI, but they may conflict in adversarial settings. We investigate the strategic effect of...
Xingshuang Lin, Binbin Zhao, Jinwen Wang +3 more
Smart Contract Reusable Components (SCRs) play a vital role in accelerating the development of business-specific contracts by promoting modularity and...
Gioliano de Oliveira Braga, Pedro Henrique dos Santos Rocha, Rafael Pimenta de Mattos Paixão +3 more
Wi-Fi Channel State Information (CSI) has been repeatedly proposed as a biometric modality, often with reports of high accuracy and operational...
Yanbo Dai, Zongjie Li, Zhenlan Ji +1 more
Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, demonstrating human-level...
Lama Sleem, Jerome Francois, Lujun Li +3 more
Jailbreak attacks designed to bypass safety mechanisms pose a serious threat by prompting LLMs to generate harmful or inappropriate content, despite...
Shaowei Guan, Hin Chi Kwok, Ngai Fong Law +3 more
Retrieval-augmented generation (RAG) has rapidly emerged as a transformative approach for integrating large language models into clinical and...