Attack HIGH
Binh Nguyen, Thai Le
Audio Language Models (ALMs) offer a promising shift towards explainable audio deepfake detection (ADD), moving beyond black-box...
2 months ago cs.CL cs.SD eess.AS
PDF
Attack HIGH
Xiao Lin, Philip Li, Zhichen Zeng +6 more
Despite rich safety alignment strategies, large language models (LLMs) remain highly susceptible to jailbreak attacks, which compromise safety...
2 months ago cs.LG cs.AI cs.IR
PDF
Attack HIGH
Zhakshylyk Nurlanov, Frank R. Schmidt, Florian Bernard
As Large Language Models (LLMs) are increasingly deployed in safety-critical domains, rigorously evaluating their robustness against adversarial...
2 months ago cs.LG cs.AI cs.CR
PDF
Attack MEDIUM
Bocheng Chen, Xi Chen, Han Zi +5 more
Identifying specific moral errors in an input and generating appropriate corrections require moral sensitivity in large language models (LLMs), which...
Attack HIGH
Xi Wang, Songlei Jian, Shasha Li +5 more
Despite extensive safety alignment, Large Language Models (LLMs) often fail against jailbreak attacks. While machine unlearning has emerged as a...
2 months ago cs.CR cs.AI
PDF
Attack HIGH
Yuetian Chen, Yuntao Du, Kaiyuan Zhang +4 more
Most membership inference attacks (MIAs) against Large Language Models (LLMs) rely on global signals, like average loss, to identify training data....
2 months ago cs.CL cs.AI cs.CR
PDF
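The entry above notes that most MIAs score candidates with a global signal such as average loss. As a minimal, hypothetical illustration of that baseline (toy probabilities, not the paper's method), the classic loss-threshold test predicts "member" when a sequence's average negative log-likelihood is unusually low:

```python
import math

def avg_loss(token_probs):
    # Average negative log-likelihood over the sequence:
    # the "global signal" the entry refers to.
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def loss_threshold_mia(token_probs, tau):
    # Predict membership when the model is unusually confident,
    # i.e. average loss falls below a calibrated threshold tau.
    return avg_loss(token_probs) < tau

# A memorized (member) sequence tends to get high token probabilities...
member = [0.9, 0.8, 0.95, 0.85]
# ...while an unseen (non-member) sequence gets lower ones.
non_member = [0.3, 0.2, 0.4, 0.25]

print(loss_threshold_mia(member, tau=0.5))      # True
print(loss_threshold_mia(non_member, tau=0.5))  # False
```

Per-sample or localized signals, as this paper apparently investigates, refine this by looking at where in the sequence the loss drops rather than at the sequence-wide average.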
Attack HIGH
Dinghong Song, Zhiwei Xu, Hai Wan +3 more
Model quantization is critical for deploying large language models (LLMs) on resource-constrained hardware, yet recent work has revealed severe...
2 months ago cs.CR cs.LG
PDF
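For context on the entry above: the attack surface comes from the lossy mapping between full-precision and quantized weights. A minimal sketch of symmetric int8 quantization (an assumed illustrative scheme, not this paper's setup) shows the rounding slack such attacks can exploit, since distinct full-precision models can share one int8 image:

```python
def quantize_int8(weights):
    # Symmetric uniform quantization: scale by the max magnitude
    # so every weight maps into the signed int8 range [-127, 127].
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights.
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error is bounded by scale/2 per weight; a malicious
# full-precision model can hide behavior inside that slack so it
# only activates after quantization.
```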
Attack HIGH
Scott Thornton
Large language models remain vulnerable to jailbreak attacks, and single-layer defenses often trade security for usability. We present TRYLOCK, the...
2 months ago cs.CR cs.LG
PDF
Attack MEDIUM
Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo +1 more
Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model's...
2 months ago cs.CL cs.AI cs.LG
PDF
Attack HIGH
Devang Kulshreshtha, Hang Su, Chinmay Hegde +1 more
Most jailbreak methods achieve high attack success rates (ASR) but require attacker LLMs to craft adversarial queries and/or demand high query...
Attack MEDIUM
Neusha Javidnia, Ruisi Zhang, Ashish Kundu +1 more
We present SWaRL, a robust and fidelity-preserving watermarking framework designed to protect the intellectual property of code LLM owners by...
2 months ago cs.CR cs.LG
PDF
Attack HIGH
Alexandre Le Mercier, Chris Develder, Thomas Demeester
State space models (SSMs) like Mamba offer efficient alternatives to Transformer-based language models, with linear time complexity. Yet, their...
Attack MEDIUM
Jiwei Guan, Haibo Jin, Haohan Wang
Recent advancements in Large Vision-Language Models (LVLMs) have shown groundbreaking capabilities across diverse multimodal tasks. However, these...
2 months ago cs.CR cs.AI cs.CV
PDF
Attack MEDIUM
Davis Brown, Juan-Pablo Rivera, Dan Hendrycks +1 more
As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration...
2 months ago cs.CR cs.AI cs.LG
PDF
Attack MEDIUM
Jiajie Zhu, Xia Du, Xiaoyuan Liu +4 more
The rapid advancements in artificial intelligence have significantly accelerated the adoption of speech recognition technology, leading to its...
2 months ago cs.SD cs.CR cs.MM
PDF
Attack HIGH
M P V S Gopinadh, S Mahaboob Hussain
Large Language Models (LLMs) are integral to modern AI applications, but their safety alignment mechanisms can be bypassed through adversarial prompt...
2 months ago cs.CR cs.AI
PDF
Attack LOW
Zhenhong Zhou, Shilinlu Yan, Chuanpu Liu +3 more
Large language models (LLMs) are increasingly deployed in cost-sensitive and on-device scenarios, and safety guardrails have advanced mainly in...
Attack HIGH
Md Mahbub Hasan, Marcus Sternhagen, Krishna Chandra Roy
Additive manufacturing (AM) is rapidly integrating into critical sectors such as aerospace, automotive, and healthcare. However, this cyber-physical...
2 months ago cs.CR cs.AI cs.LG
PDF
Attack MEDIUM
Nandish Chattopadhyay, Abdul Basit, Amira Guesmi +3 more
Adversarial attacks pose a significant challenge to the reliable deployment of machine learning models in EdgeAI applications, such as autonomous...
2 months ago cs.CR cs.AI
PDF