Tool HIGH
Md. Mehedi Hasan, Ziaur Rahman, Rafid Mostafiz +1 more
This paper presents a real-time modular defense system named Sentra-Guard. The system detects and mitigates jailbreak and prompt injection attacks...
5 months ago cs.CR cs.AI
PDF
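As a hedged illustration of the kind of prompt-injection screening a defense system like Sentra-Guard performs, here is a minimal rule-based sketch. The signature patterns, function name, and matching logic are illustrative assumptions, not taken from the paper (which describes a more sophisticated real-time modular system):

```python
import re

# Hypothetical jailbreak signatures; a real defense layer would combine
# learned classifiers with pattern matching, not rely on patterns alone.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard your (system prompt|safety guidelines)",
    r"you are now an unrestricted model",
]

def flag_prompt(prompt: str) -> bool:
    """Return True if the prompt matches a known injection signature."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in INJECTION_PATTERNS)
```

A flagged prompt would then be blocked or routed to a stricter handling path before reaching the model.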
Attack HIGH
Dongyi Liu, Jiangtong Li, Dawei Cheng +1 more
Graph Neural Networks (GNNs) are vulnerable to backdoor attacks, where adversaries implant malicious triggers to manipulate model predictions....
5 months ago cs.CR cs.LG
Attack HIGH
Anum Paracha, Junaid Arshad, Mohamed Ben Farah +1 more
Data poisoning attacks are a potential threat to machine learning (ML) models, aiming to manipulate training datasets to disrupt their performance....
5 months ago cs.CR cs.LG
Attack HIGH
Pavlos Ntais
Large language models (LLMs) remain vulnerable to sophisticated prompt engineering attacks that exploit contextual framing to bypass safety...
5 months ago cs.CR cs.AI cs.CL
Attack HIGH
Havva Alizadeh Noughabi, Julien Serbanescu, Fattane Zarrinkalam +1 more
Despite recent advances, Large Language Models remain vulnerable to jailbreak attacks that bypass alignment safeguards and elicit harmful outputs....
5 months ago cs.CL cs.AI
Attack HIGH
Kieu Dang, Phung Lai, NhatHai Phan +3 more
Large language models (LLMs) demonstrate remarkable capabilities across various tasks. However, their deployment introduces significant risks related...
Attack HIGH
Mahavir Dabas, Tran Huynh, Nikhil Reddy Billa +8 more
Large language models remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. Defending against novel...
Attack HIGH
Xingwei Zhong, Kar Wai Fok, Vrizlynn L. L. Thing
Multimodal large language models (MLLMs) comprise both visual and textual modalities to process vision language tasks. However, MLLMs are...
Attack HIGH
Mingrui Liu, Sixiao Zhang, Cheng Long +1 more
As Large Language Models (LLMs) become integral to computing infrastructure, safety alignment serves as the primary security control preventing the...
Attack HIGH
Yukun Jiang, Mingjie Li, Michael Backes +1 more
Despite their superior performance on a wide range of domains, large language models (LLMs) remain vulnerable to misuse for generating harmful...
Attack HIGH
Nguyen Linh Bao Nguyen, Alsharif Abuadbba, Kristen Moore +1 more
The rapid advancement of generative models has enabled the creation of increasingly stealthy synthetic voices, commonly referred to as audio...
5 months ago cs.CR cs.LG cs.MM
Attack HIGH
Zheng-Xin Yong, Stephen H. Bach
We discover a novel and surprising phenomenon of unintentional misalignment in reasoning language models (RLMs), which we call self-jailbreaking....
5 months ago cs.CR cs.CL
Attack HIGH
Wei Shao, Yuhao Wang, Rongguang He +2 more
Existing defence mechanisms have demonstrated significant effectiveness in mitigating rule-based Denial-of-Service (DoS) attacks, leveraging...
5 months ago cs.CR cs.AI
Attack HIGH
Chiyu Chen, Xinhao Song, Yunkai Chai +7 more
Vision-Language Models (VLMs) are increasingly deployed as autonomous agents to navigate mobile graphical user interfaces (GUIs). Operating in...
5 months ago cs.CR cs.AI
Survey HIGH
Wu Yichao, Wang Yirui, Ding Panpan +3 more
With the wide application of deep reinforcement learning (DRL) techniques in complex fields such as autonomous driving, intelligent manufacturing,...
5 months ago cs.CR cs.AI cs.LG
Attack HIGH
Divyanshu Kumar, Shreyas Jena, Nitin Aravind Birur +3 more
Multimodal large language models (MLLMs) have achieved remarkable progress, yet remain critically vulnerable to adversarial attacks that exploit...
5 months ago cs.CR cs.MM
Survey HIGH
Wm. Matthew Kennedy, Cigdem Patlak, Jayraj Dave +10 more
AI systems have the potential to produce both benefits and harms, but without rigorous and ongoing adversarial evaluation, AI actors will struggle to...
5 months ago cs.CY cs.AI cs.CR
Benchmark HIGH
Euodia Dodd, Nataša Krčo, Igor Shilov +1 more
Membership inference attacks (MIAs) have emerged as the standard tool for evaluating the privacy risks of AI models. However, state-of-the-art...
5 months ago cs.LG cs.CR
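The standard baseline that membership inference evaluations build on can be sketched as a simple loss-threshold test: because models tend to overfit, training members usually incur lower loss than non-members. The threshold and loss values below are toy numbers for illustration, not results from the paper:

```python
import numpy as np

def loss_threshold_mia(losses: np.ndarray, threshold: float) -> np.ndarray:
    """Guess that samples with loss below the threshold were training
    members (overfit models assign members lower loss)."""
    return losses < threshold

# Toy example: members cluster at low loss, non-members at high loss.
member_losses = np.array([0.05, 0.10, 0.20])
nonmember_losses = np.array([0.9, 1.4, 2.1])
preds_members = loss_threshold_mia(member_losses, threshold=0.5)
preds_nonmembers = loss_threshold_mia(nonmember_losses, threshold=0.5)
```

State-of-the-art attacks refine this idea with calibrated, per-sample thresholds, but the loss-gap intuition is the same.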
Attack HIGH
Mohamed ElShehaby, Ashraf Matrawy
Adversarial attacks pose significant challenges to Machine Learning (ML) systems and especially Deep Neural Networks (DNNs) by subtly manipulating...
5 months ago cs.CR cs.LG
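A classic instance of the subtle input manipulation described here is the fast gradient sign method (FGSM), which nudges an input in the direction that increases the model's loss. A minimal sketch on a binary logistic-regression model follows; the weights, labels, and step size are illustrative assumptions, not drawn from this paper:

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """One FGSM step for binary logistic regression with label y in {-1, +1}.
    Loss = log(1 + exp(-y * (w.x + b))), so the gradient w.r.t. x is
    -y * sigmoid(-y * (w.x + b)) * w; we step by eps * sign(gradient).
    """
    margin = y * (np.dot(w, x) + b)
    grad = -y * (1.0 / (1.0 + np.exp(margin))) * w
    return x + eps * np.sign(grad)
```

Applying one step to a correctly classified point shrinks its classification margin, which is exactly the small, targeted change adversarial attacks exploit.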
Attack HIGH
Ariana Yi, Ce Zhou, Liyang Xiao +1 more
As object detection models are increasingly deployed in cyber-physical systems such as autonomous vehicles (AVs) and surveillance platforms, ensuring...
5 months ago cs.CV cs.CR