AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total

2,077

Attack

809

Benchmark

603

Defense

272

Tool

226

Survey

113

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 681–700 of 986 papers

Clear filters

Attack MEDIUM

LLM Reinforcement in Context

Thomas Rivasseau

Current Large Language Model alignment research mostly focuses on improving model robustness against adversarial attacks and misbehavior by training...

4 months ago cs.CL cs.CR PDF

Tool MEDIUM

Scalable Hierarchical AI-Blockchain Framework for Real-Time Anomaly Detection in Large-Scale Autonomous Vehicle Networks

Rathin Chandra Shit, Sharmila Subudhi

The security of autonomous vehicle networks is facing major challenges, owing to the complexity of sensor integration, real-time performance demands,...

4 months ago cs.CR cs.AI cs.LG PDF

Defense MEDIUM

SGuard-v1: Safety Guardrail for Large Language Models

JoonHo Lee, HyeonMin Cho, Jaewoong Yun +3 more

We present SGuard-v1, a lightweight safety guardrail for Large Language Models (LLMs), which comprises two specialized models to detect harmful...

4 months ago cs.CL cs.AI cs.CR PDF

Attack MEDIUM

ToxSearch: Evolving Prompts for Toxicity Search in Large Language Models

Onkar Shelar, Travis Desell

Large Language Models remain vulnerable to adversarial prompts that elicit toxic content even after safety alignment. We present ToxSearch, a...

4 months ago cs.NE cs.AI cs.CL PDF

Attack MEDIUM

The 'Sure' Trap: Multi-Scale Poisoning Analysis of Stealthy Compliance-Only Backdoors in Fine-Tuned Large Language Models

Yuting Tan, Yi Huang, Zhuo Li

Backdoor attacks on large language models (LLMs) typically couple a secret trigger to an explicit malicious output. We show that this explicit...

4 months ago cs.LG cs.CR PDF

Defense MEDIUM

Rethinking Deep Alignment Through The Lens Of Incomplete Learning

Thong Bach, Dung Nguyen, Thao Minh Le +1 more

Large language models exhibit systematic vulnerabilities to adversarial attacks despite extensive safety alignment. We provide a mechanistic analysis...

4 months ago cs.LG PDF

Attack MEDIUM

Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness

Sajad U P

Phishing and related cyber threats are becoming more varied and technologically advanced. Among these, email-based phishing remains the most dominant...

4 months ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

ExplainableGuard: Interpretable Adversarial Defense for Large Language Models Using Chain-of-Thought Reasoning

Shaowei Guan, Yu Zhai, Zhengyu Zhang +2 more

Large Language Models (LLMs) are increasingly vulnerable to adversarial attacks that can subtly manipulate their outputs. While various defense...

4 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

BackWeak: Backdooring Knowledge Distillation Simply with Weak Triggers and Fine-tuning

Shanmin Wang, Dongdong Zhao

Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-party...

4 months ago cs.CR cs.AI cs.CV PDF

Attack MEDIUM

On the Trade-Off Between Transparency and Security in Adversarial Machine Learning

Lucas Fenaux, Christopher Srinivasa, Florian Kerschbaum

Transparency and security are both central to Responsible AI, but they may conflict in adversarial settings. We investigate the strategic effect of...

4 months ago cs.LG cs.CR cs.GT PDF

Benchmark MEDIUM

SEAL: Subspace-Anchored Watermarks for LLM Ownership

Yanbo Dai, Zongjie Li, Zhenlan Ji +1 more

Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, demonstrating human-level...

4 months ago cs.CR PDF

Defense MEDIUM

EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment

Ruoxi Cheng, Haoxuan Ma, Teng Ma +1 more

Large Vision-Language Models (LVLMs) exhibit powerful reasoning capabilities but suffer sophisticated jailbreak vulnerabilities. Fundamentally,...

4 months ago cs.AI PDF

Attack MEDIUM

Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis

Farhad Abtahi, Fernando Seoane, Iván Pau +1 more

Healthcare AI systems face major vulnerabilities to data poisoning that current defenses and regulations cannot adequately address. We analyzed eight...

4 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

PATCHEVAL: A New Benchmark for Evaluating LLMs on Patching Real-World Vulnerabilities

Zichao Wei, Jun Zeng, Ming Wen +8 more

Software vulnerabilities are increasing at an alarming rate. However, manual patching is both time-consuming and resource-intensive, while existing...

4 months ago cs.CR cs.SE PDF

Benchmark MEDIUM

Robustness of LLM-enabled vehicle trajectory prediction under data security threats

Feilong Wang, Fuqiang Liu

The integration of large language models (LLMs) into automated driving systems has opened new possibilities for reasoning and decision-making by...

4 months ago cs.LG cs.AI cs.CR PDF

Benchmark MEDIUM

Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio

Guangke Chen, Yuhui Wang, Shouling Ji +2 more

Modern text-to-speech (TTS) systems, particularly those built on Large Audio-Language Models (LALMs), generate high-fidelity speech that faithfully...

4 months ago cs.SD cs.AI cs.CR PDF

Tool MEDIUM

ICX360: In-Context eXplainability 360 Toolkit

Dennis Wei, Ronny Luss, Xiaomeng Hu +6 more

Large Language Models (LLMs) have become ubiquitous in everyday life and are entering higher-stakes applications ranging from summarizing meeting...

4 months ago cs.CL cs.LG PDF

Benchmark MEDIUM

Can AI Models be Jailbroken to Phish Elderly Victims? An End-to-End Evaluation

Fred Heiding, Simon Lermen

We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to...

4 months ago cs.CR cs.AI cs.CY PDF

Defense MEDIUM

EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models

Jialin Wu, Kecen Li, Zhicong Huang +3 more

Many machine learning models are fine-tuned from large language models (LLMs) to achieve high performance in specialized domains like code...

4 months ago cs.CL cs.CR PDF

Benchmark MEDIUM

Taught by the Flawed: How Dataset Insecurity Breeds Vulnerable AI Code

Catherine Xia, Manar H. Alalfi

AI programming assistants have demonstrated a tendency to generate code containing basic security vulnerabilities. While developers are ultimately...

4 months ago cs.CR cs.AI PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial