AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total: 2,077
Attack: 809
Benchmark: 603
Defense: 272
Tool: 226
Survey: 113

Showing 621–640 of 715 papers

Attack HIGH

A Systematic Study on Generating Web Vulnerability Proof-of-Concepts Using Large Language Models

Mengyao Zhao, Kaixuan Li, Lyuye Zhang +4 more

Recent advances in Large Language Models (LLMs) have brought remarkable progress in code understanding and reasoning, creating new opportunities and...

5 months ago cs.SE PDF

Attack HIGH

Adversarial Attacks on Downstream Weather Forecasting Models: Application to Tropical Cyclone Trajectory Prediction

Yue Deng, Francisco Santos, Pang-Ning Tan +1 more

Deep learning based weather forecasting (DLWF) models leverage past weather observations to generate future forecasts, supporting a wide range of...

5 months ago cs.LG cs.CR stat.ML PDF

Attack HIGH

Text Prompt Injection of Vision Language Models

Ruizhe Zhu

The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt...

5 months ago cs.CL cs.CV PDF

Attack HIGH

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou +4 more

AI control protocols serve as a defense mechanism to stop untrusted LLM agents from causing harm in autonomous settings. Prior work treats this as a...

5 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

Provable Watermarking for Data Poisoning Attacks

Yifan Zhu, Lijia Yu, Xiao-Shan Gao

In recent years, data poisoning attacks have been increasingly designed to appear harmless and even beneficial, often with the intention of verifying...

5 months ago cs.CR cs.LG PDF

Tool HIGH

Exploiting Web Search Tools of AI Agents for Data Exfiltration

Dennis Rall, Bernhard Bauer, Mohit Mittal +1 more

Large language models (LLMs) are now routinely used to autonomously execute complex tasks, from natural language processing to dynamic workflows like...

5 months ago cs.CR cs.CL PDF

Attack HIGH

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections

Milad Nasr, Nicholas Carlini, Chawin Sitawarin +11 more

How should we evaluate the robustness of language model defenses? Current defenses against jailbreaks and prompt injections (which aim to prevent an...

5 months ago cs.LG cs.CR PDF

Attack HIGH

Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models

Ragib Amin Nihal, Rui Wen, Kazuhiro Nakadai +1 more

Large language models (LLMs) remain vulnerable to multi-turn jailbreaking attacks that exploit conversational context to bypass safety constraints...

5 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

AutoRed: A Free-form Adversarial Prompt Generation Framework for Automated Red Teaming

Muxi Diao, Yutao Mou, Keqing He +6 more

The safety of Large Language Models (LLMs) is crucial for the development of trustworthy AI applications. Existing red teaming methods often rely on...

5 months ago cs.CL PDF

Attack HIGH

Backdoor Vectors: a Task Arithmetic View on Backdoor Attacks and Defenses

Stanisław Pawlak, Jan Dubiński, Daniel Marczak +1 more

Model merging (MM) recently emerged as an effective method for combining large deep learning models. However, it poses significant security risks....

5 months ago cs.LG cs.AI cs.CR PDF

Benchmark HIGH

When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models

Haoran Ou, Kangjie Chen, Xingshuo Han +4 more

Large Language Models (LLMs) have been augmented with web search to overcome the limitations of the static knowledge boundary by accessing up-to-date...

5 months ago cs.CR cs.AI PDF

Attack HIGH

Fewer Weights, More Problems: A Practical Attack on LLM Pruning

Kazuki Egashira, Robin Staab, Thibaud Gloaguen +2 more

Model pruning, i.e., removing a subset of model weights, has become a prominent approach to reducing the memory footprint of large language models...

5 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation

Weisen Jiang, Sinno Jialin Pan

This paper introduces MetaDefense, a novel framework for defending against finetuning-based jailbreak attacks in large language models (LLMs). We...

5 months ago cs.LG cs.AI cs.CL PDF

Attack HIGH

Practical and Stealthy Touch-Guided Jailbreak Attacks on Deployed Mobile Vision-Language Agents

Renhua Ding, Xiao Yang, Zhengwei Fang +3 more

Large vision-language models (LVLMs) enable autonomous mobile agents to operate smartphone user interfaces, yet vulnerabilities in their perception...

5 months ago cs.CR cs.AI PDF

Attack HIGH

Red-Bandit: Test-Time Adaptation for LLM Red-Teaming via Bandit-Guided LoRA Experts

Christos Ziakas, Nicholas Loo, Nishita Jain +1 more

Automated red-teaming has emerged as a scalable approach for auditing Large Language Models (LLMs) prior to deployment, yet existing approaches lack...

5 months ago cs.CL PDF

Attack HIGH

RedTWIZ: Diverse LLM Red Teaming via Adaptive Attack Planning

Artur Horal, Daniel Pina, Henrique Paz +7 more

This paper presents the vision, scientific contributions, and technical details of RedTWIZ: an adaptive and diverse multi-turn red teaming framework,...

5 months ago cs.CR cs.CL PDF

Attack HIGH

Do Internal Layers of LLMs Reveal Patterns for Jailbreak Detection?

Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis

Jailbreaking large language models (LLMs) has emerged as a pressing concern with the increasing prevalence and accessibility of conversational LLMs....

5 months ago cs.CL PDF

Attack HIGH

Mitigating Premature Exploitation in Particle-based Monte Carlo for Inference-Time Scaling

Giorgio Giannone, Guangxuan Xu, Nikhil Shivakumar Nayak +4 more

Inference-Time Scaling (ITS) improves language models by allocating more computation at generation time. Particle Filtering (PF) has emerged as a...

5 months ago cs.LG cs.AI cs.CL PDF

Attack HIGH

Toward a Safer Web: Multilingual Multi-Agent LLMs for Mitigating Adversarial Misinformation Attacks

Nouar Aldahoul, Yasir Zaki

The rapid spread of misinformation on digital platforms threatens public discourse, emotional stability, and decision-making. While prior work has...

5 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback

Raffaele Mura, Giorgio Piras, Kamilė Lukošiūtė +3 more

Jailbreaks are adversarial attacks designed to bypass the built-in safety mechanisms of large language models. Automated jailbreaks typically...

5 months ago cs.CL cs.AI cs.LG PDF
