Reimagining Safety Alignment with An Image
Yifan Xia, Guorui Chen, Wenqian Yu +3 more
Large language models (LLMs) excel in diverse applications but face dual challenges: generating harmful content under jailbreak attacks and...
Mohammed N. Swileh, Shengli Zhang
Centralized Software-Defined Networking (cSDN) offers flexible and programmable control of networks but suffers from scalability and reliability...
Ruofan Liu, Yun Lin, Zhiyong Huang +1 more
Large language models (LLMs) are increasingly integrated into IT infrastructures, where they process user data according to predefined instructions....
Xin Yao, Haiyang Zhao, Yimin Chen +3 more
The Contrastive Language-Image Pretraining (CLIP) model has significantly advanced vision-language modeling by aligning image-text pairs from...
Kayua Oleques Paim, Rodrigo Brandao Mansilha, Diego Kreutz +2 more
The rapid proliferation of Large Language Models (LLMs) has raised significant concerns about their security against adversarial attacks. In this...
David Lüdke, Tom Wollschläger, Paul Ungermann +2 more
We introduce a novel framework that transforms the resource-intensive (adversarial) prompt optimization problem into an efficient, amortized...
David Farr, Lynnette Hui Xian Ng, Stephen Prochaska +2 more
Disinformation campaigns can distort public perception and destabilize institutions. Understanding how different populations respond to information...
Md Abdul Hannan, Ronghao Ni, Chi Zhang +3 more
Large language models (LLMs) have demonstrated impressive capabilities across a wide range of coding tasks, including summarization, translation,...
Kathrin Grosse, Nico Ebert
Recent improvements in large language models (LLMs) have led to everyday usage of AI-based Conversational Agents (CAs). At the same time, LLMs...
Chenghao Du, Quanfeng Huang, Tingxuan Tang +3 more
Large Language Models (LLMs) have transformed software development, enabling AI-powered applications known as LLM-based agents that promise to...
Heehwan Kim, Sungjune Park, Daeseon Choi
Large Language Models (LLMs) are generally equipped with guardrails to block the generation of harmful responses. However, existing defenses always...
Arnabh Borah, Md Tanvirul Alam, Nidhi Rastogi
Security applications are increasingly relying on large language models (LLMs) for cyber threat detection; however, their opaque reasoning often...
Alex Irpan, Alexander Matt Turner, Mark Kurzeja +2 more
An LLM's factuality and refusal training can be compromised by simple changes to a prompt. Models often adopt user beliefs (sycophancy) or satisfy...
Zishuo Zheng, Vidhisha Balachandran, Chan Young Park +2 more
As large language model (LLM) based systems take on high-stakes roles in real-world decision-making, they must reconcile competing instructions from...
Seif Ikbarieh, Maanak Gupta, Elmahedi Mahalal
The Internet of Things has expanded rapidly, transforming communication and operations across industries but also increasing the attack surface and...
William Overman, Mohsen Bayati
As increasingly capable agents are deployed, a central safety challenge is how to retain meaningful human control without modifying the underlying...
Aylton Almeida, Laerte Xavier, Marco Tulio Valente
Keeping software systems up to date is essential to avoid technical debt, security vulnerabilities, and the rigidity typical of legacy systems....
Shaked Zychlinski, Yuval Kainan
Large Language Models (LLMs) are susceptible to jailbreak attacks where malicious prompts are disguised using ciphers and character-level encodings...
Yingjia Wang, Ting Qiao, Xing Liu +3 more
The rapid advancement of deep neural networks (DNNs) heavily relies on large-scale, high-quality datasets. However, unauthorized commercial use of...
David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko
Enabling continual learning in LLMs remains a key unresolved research challenge. In a recent announcement, a frontier LLM company took a step towards...