AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total: 2,077
Attack: 809
Benchmark: 603
Defense: 272
Tool: 226
Survey: 113

Showing 1901–1920 of 2,077 papers

Benchmark LOW

From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance

Ardalan Aryashad, Parsa Razmara, Amin Mahjoub +3 more

Autonomous driving perception systems are particularly vulnerable in foggy conditions, where light scattering reduces contrast and obscures fine...

5 months ago cs.CV PDF

Benchmark MEDIUM

LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis

Hangting Ye, Jinmeng Li, He Zhao +4 more

Existing anomaly detection (AD) methods for tabular data usually rely on some assumptions about anomaly patterns, leading to inconsistent performance...

5 months ago cs.LG PDF

Benchmark LOW

Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation

Arina Kharlamova, Bowei He, Chen Ma +1 more

Online services rely on CAPTCHAs as a first line of defense against automated abuse, yet recent advances in multi-modal large language models (MLLMs)...

5 months ago cs.AI cs.CR PDF

Attack HIGH

Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods

Yulin Chen, Haoran Li, Yuan Sui +2 more

With the development of technology, large language models (LLMs) have dominated the downstream natural language processing (NLP) tasks. However,...

5 months ago cs.CR PDF

Other MEDIUM

Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs

Bumjun Kim, Dongjae Jeon, Dueun Kim +2 more

Diffusion large language models (dLLMs) have emerged as a promising alternative to autoregressive models, offering flexible generation orders and...

5 months ago cs.AI PDF

Attack HIGH

From Theory to Practice: Evaluating Data Poisoning Attacks and Defenses in In-Context Learning on Social Media Health Discourse

Rabeya Amin Jhuma, Mostafa Mohaimen Akand Faisal

This study explored how in-context learning (ICL) in large language models can be disrupted by data poisoning attacks in the setting of public health...

5 months ago cs.LG cs.CL cs.CR PDF

Attack HIGH

Explainable but Vulnerable: Adversarial Attacks on XAI Explanation in Cybersecurity Applications

Maraz Mia, Mir Mehedi A. Pritom

Explainable Artificial Intelligence (XAI) has aided machine learning (ML) researchers with the power of scrutinizing the decisions of the black-box...

5 months ago cs.CR cs.AI PDF

Attack MEDIUM

Cross-Modal Content Optimization for Steering Web Agent Preferences

Tanqiu Jiang, Min Bai, Nikolaos Pappas +2 more

Vision-language model (VLM)-based web agents increasingly power high-stakes selection tasks like content recommendation or product ranking by...

5 months ago cs.AI cs.CR PDF

Benchmark LOW

Can an LLM Induce a Graph? Investigating Memory Drift and Context Length

Raquib Bin Yousuf, Aadyant Khatri, Shengzhe Xu +2 more

Recently proposed evaluation benchmarks aim to characterize the effective context length and the forgetting tendencies of large language models...

5 months ago cs.CL cs.AI cs.LG PDF

Attack MEDIUM

Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs

Fatmazohra Rezkellah, Ramzi Dakhmouche

With the increasing adoption of Large Language Models (LLMs), more customization is needed to ensure privacy-preserving and safe generation. We...

5 months ago cs.LG cs.CL cs.CR PDF

Benchmark MEDIUM

Certifiable Safe RLHF: Fixed-Penalty Constraint Optimization for Safer Language Models

Kartik Pandit, Sourav Ganguly, Arnesh Banerjee +2 more

Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an appropriate balance between enhancing the utility of...

5 months ago cs.LG cs.AI eess.SY PDF

Attack HIGH

NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks

Javad Rafiei Asl, Sidhant Narula, Mohammad Ghasemigol +2 more

Large Language Models (LLMs) have revolutionized natural language processing but remain vulnerable to jailbreak attacks, especially multi-turn...

5 months ago cs.CR cs.AI PDF

Attack HIGH

LegalSim: Multi-Agent Simulation of Legal Systems for Discovering Procedural Exploits

Sanket Badhe

We present LegalSim, a modular multi-agent simulation of adversarial legal proceedings that explores how AI systems can exploit procedural weaknesses...

5 months ago cs.MA cs.AI cs.CR PDF

Benchmark MEDIUM

FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar +7 more

Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals; these pages often exceed tens...

5 months ago cs.CL PDF

Attack HIGH

Untargeted Jailbreak Attack

Xinzhe Huang, Wenjing Hu, Tianhang Zheng +5 more

Existing gradient-based jailbreak attacks on Large Language Models (LLMs) typically optimize adversarial suffixes to align the LLM output with...

5 months ago cs.CR cs.AI PDF

Attack HIGH

External Data Extraction Attacks against Retrieval-Augmented Large Language Models

Yu He, Yifei Chen, Yiming Li +5 more

In recent years, RAG has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG...

5 months ago cs.CR PDF

Benchmark MEDIUM

Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru +6 more

While finetuning AI agents on interaction data -- such as web browsing or tool use -- improves their capabilities, it also introduces critical...

5 months ago cs.CR cs.AI cs.LG PDF

Benchmark MEDIUM

Zero-Shot Robustness of Vision Language Models Via Confidence-Aware Weighting

Nikoo Naghavian, Mostafa Tavassolipour

Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work,...

5 months ago cs.CV PDF

Attack HIGH

Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs

Zhixin Xie, Xurui Song, Jun Luo

Despite substantial efforts in safety alignment, recent research indicates that Large Language Models (LLMs) remain highly susceptible to jailbreak...

5 months ago cs.CR PDF

Attack MEDIUM

Adversarial Reinforcement Learning for Offensive and Defensive Agents in a Simulated Zero-Sum Network Environment

Abrar Shahid, Ibteeker Mahir Ishum, AKM Tahmidul Haque +2 more

This paper presents a controlled study of adversarial reinforcement learning in network security through a custom OpenAI Gym environment that models...

5 months ago cs.LG cs.AI cs.CR PDF
