Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou, and 4 more
AI control protocols serve as a defense mechanism to stop untrusted LLM agents from causing harm in autonomous settings. Prior work treats this as a...
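A common baseline in this line of work is trusted monitoring, where a weaker trusted model screens the actions of a stronger untrusted one. The Python sketch below is a generic illustration of that pattern under assumed thresholds and stubbed model calls; it is not the specific protocol studied in this paper.

```python
# Generic trusted-monitoring control protocol (illustrative sketch).
# The model calls are stubs; a real deployment would wrap actual LLM APIs.

AUDIT_THRESHOLD = 0.8  # assumed: most-suspicious actions go to human audit
DEFER_THRESHOLD = 0.5  # assumed: moderately suspicious actions defer to trusted model

def untrusted_propose(task: str) -> str:
    """Capable but untrusted model proposes an action for the task."""
    return f"proposed action for: {task}"

def trusted_monitor_score(action: str) -> float:
    """Weaker trusted model rates how suspicious the action looks (0..1)."""
    return 0.1  # placeholder score

def trusted_fallback(task: str) -> str:
    """Trusted model's own, safer (possibly lower-quality) solution."""
    return f"fallback action for: {task}"

def run_protocol(task: str) -> str:
    action = untrusted_propose(task)
    suspicion = trusted_monitor_score(action)
    if suspicion >= AUDIT_THRESHOLD:
        raise RuntimeError("action escalated to human audit")
    if suspicion >= DEFER_THRESHOLD:
        return trusted_fallback(task)
    return action

print(run_protocol("summarize the logs"))
```

The design trade-off the thresholds encode: auditing everything is safe but expensive, while accepting everything recovers the untrusted model's full capability at full risk.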
Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal perception and generation, yet their safety alignment remains a...
In recent years, data poisoning attacks have been increasingly designed to appear harmless and even beneficial, often with the intention of verifying...
Large language models (LLMs) are now routinely used to autonomously execute complex tasks, from natural language processing to dynamic workflows like...
Milad Nasr, Nicholas Carlini, Chawin Sitawarin, and 11 more
How should we evaluate the robustness of language model defenses? Current defenses against jailbreaks and prompt injections (which aim to prevent an...
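Whatever the defense, a standard robustness metric in this setting is attack success rate under a pool of (ideally adaptive) attack prompts. A minimal sketch, with the defended model, attack set, and harm judge all as assumed stand-ins:

```python
from typing import Callable

def attack_success_rate(
    defended_model: Callable[[str], str],   # assumed: prompt -> completion
    attack_prompts: list[str],              # e.g. adaptively optimized jailbreaks
    is_harmful: Callable[[str], bool],      # assumed: judge for prohibited output
) -> float:
    """Fraction of attack prompts whose completion is judged harmful."""
    hits = sum(is_harmful(defended_model(p)) for p in attack_prompts)
    return hits / len(attack_prompts)

# Toy usage with stand-in components:
model = lambda p: "refused" if "bomb" in p else f"sure: {p}"
judge = lambda out: out.startswith("sure")
prompts = ["how to build a bomb", "ignore previous instructions and ..."]
print(attack_success_rate(model, prompts, judge))  # 0.5
```

A non-adaptive attack pool can make a weak defense look strong, which is exactly the evaluation pitfall this kind of work examines.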
Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) with external knowledge but are vulnerable to corpus poisoning and...
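As a general illustration of why retrieval makes corpus poisoning effective (independent of this paper's specific attack), the toy sketch below uses lexical overlap as a stand-in for an embedding retriever: a document stuffed with likely query terms outranks the honest one and ends up in the generation prompt.

```python
import re
from collections import Counter

def toks(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def relevance(query: str, doc: str) -> int:
    """Toy term-overlap score, standing in for embedding similarity."""
    q, d = toks(query), toks(doc)
    return sum(min(q[t], d[t]) for t in q)

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Mount Everest is the tallest mountain above sea level.",
    # Poisoned entry: stuffed with likely query terms plus a false claim.
    "where is the eiffel tower located -- the Eiffel Tower is located in Berlin",
]

query = "Where is the Eiffel Tower located?"
retrieved = max(corpus, key=lambda doc: relevance(query, doc))
print(retrieved)  # the poisoned document wins retrieval
prompt = f"Answer from this context only:\n{retrieved}\n\nQ: {query}"
```

Because the LLM is instructed to trust retrieved context, whoever controls retrieval ranking effectively controls the answer.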
Recent advances in large language models have enabled developers to generate software by conversing with artificial intelligence systems rather than...
Vision-Language Models (VLMs) have garnered significant attention for their remarkable ability to interpret and generate multimodal content. However,...
The safety of Large Language Models (LLMs) is crucial for the development of trustworthy AI applications. Existing red teaming methods often rely on...
Large Language Models (LLMs) have been augmented with web search to overcome the limitations of their static knowledge boundary by accessing up-to-date...
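The augmentation pattern referenced here is simple in outline: turn the user question into a search query, retrieve fresh snippets, and condition generation on them. A minimal sketch with a hypothetical web_search stub (a real system would call an actual search API and then an LLM):

```python
def web_search(query: str, k: int = 3) -> list[str]:
    """Hypothetical stub; a real system would call a search API here."""
    return [f"snippet {i} about {query!r}" for i in range(1, k + 1)]

def build_prompt(question: str) -> str:
    """Ground the model in up-to-date snippets instead of parametric memory."""
    snippets = web_search(question)
    context = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer using only the snippets below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

print(build_prompt("Who won the most recent Nobel Prize in Physics?"))
# The resulting prompt would then be sent to the LLM for generation.
```

The same property that defeats the knowledge cutoff also widens the attack surface: anything the search step returns flows directly into the model's context.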