Benchmark MEDIUM
Abdullah Caglar Oksuz, Anisa Halimi, Erman Ayday
Membership inference attacks (MIAs) threaten the privacy of machine learning models by revealing whether a specific data point was used during...
1 month ago cs.LG cs.CR
PDF
Benchmark LOW
Martin Bertran, Riccardo Fogliato, Zhiwei Steven Wu
Empirical conclusions depend not only on data but on analytic decisions made throughout the research process. Many-analyst studies have quantified...
1 month ago cs.AI cs.LG
PDF
Benchmark HIGH
Mirae Kim, Seonghun Jeong, Youngjun Kwak
Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly...
1 month ago cs.CL cs.AI cs.DB
PDF
Benchmark LOW
Anna Babarczy, Andras Lukacs, Peter Vedres +1 more
The study explores whether current Large Language Models (LLMs) exhibit Theory of Mind (ToM) capabilities -- specifically, the ability to infer...
1 month ago cs.CL cs.AI
PDF
Benchmark MEDIUM
Zachary Coalson, Bo Fang, Sanghyun Hong
Multi-turn interaction length is a dominant factor in the operational costs of conversational LLMs. In this work, we present a new failure mode in...
1 month ago cs.LG cs.CR
PDF
Benchmark MEDIUM
Gelei Deng, Yi Liu, Yuekang Li +5 more
LLM-based agents show promise for automating penetration testing, yet reported performance varies widely across systems and benchmarks. We analyze 28...
1 month ago cs.CR cs.SE
PDF
Benchmark LOW
Takyoung Kim, Jinseok Nam, Chandrayee Basu +5 more
Conversational agents powered by large language models (LLMs) with tool integration achieve strong performance on fixed task-oriented dialogue...
1 month ago cs.CL cs.AI
PDF
Benchmark HIGH
Priyaranjan Pattnayak, Sanchari Chowdhuri
Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities...
1 month ago cs.AI cs.CL
PDF
Benchmark MEDIUM
Simon Lermen, Daniel Paleka, Joshua Swanson +3 more
We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News...
1 month ago cs.CR cs.AI cs.LG
PDF
Benchmark LOW
Stephan Rabanser, Sayash Kapoor, Peter Kirgis +3 more
AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many...
1 month ago cs.AI cs.CY cs.LG
PDF
Benchmark MEDIUM
Michael Cunningham
We present a practical system for privacy-aware large language model (LLM) inference that splits a transformer between a trusted local GPU and an...
1 month ago cs.CR cs.DC
PDF
Benchmark MEDIUM
Nivya Talokar, Ayush K Tarun, Murari Mandal +2 more
LLM-based agents execute real-world workflows via tools and memory. These same affordances enable adversaries to misuse such agents to...
1 month ago cs.CL cs.LG
PDF
Benchmark MEDIUM
Johannes Bertram, Jonas Geiping
We introduce NESSiE, the NEceSsary SafEty benchmark for large language models (LLMs). With minimal test cases of information and access security,...
1 month ago cs.CR cs.SE
PDF
Benchmark MEDIUM
Shahriar Golchin, Marc Wetter
We systematically evaluate the quality of widely used AI safety datasets from two perspectives: in isolation and in practice. In isolation, we...
1 month ago cs.CR cs.AI cs.CL
PDF
Benchmark MEDIUM
Haodong Zhao, Jinming Hu, Gongshen Liu
Federated learning security research has predominantly focused on backdoor threats from a minority of malicious clients that intentionally corrupt...
Benchmark LOW
Aditi Prabakaran, Priyesh Shukla
Transient objects in casual multi-view captures cause ghosting artifacts in 3D Gaussian Splatting (3DGS) reconstruction. Existing solutions rely on...
Benchmark LOW
Udbhav Prasad, Aniesh Chawla
Cryptographic digests (e.g., MD5, SHA-256) are designed to provide exact identity. Any single-bit change in the input produces a completely different...
1 month ago cs.CR cs.AI
PDF
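The property this entry describes, that any single-bit change in the input yields a completely different digest (the avalanche effect), is easy to see directly. A minimal sketch using Python's standard `hashlib` (the input string and bit choice are illustrative, not from the paper):

```python
import hashlib

# Flip the lowest bit of the first byte of the input.
original = b"benchmark"
flipped = bytes([original[0] ^ 0x01]) + original[1:]

d1 = hashlib.sha256(original).hexdigest()
d2 = hashlib.sha256(flipped).hexdigest()

# Count hex positions where the two 64-character digests differ;
# for SHA-256 this is typically around 60 of 64.
diff = sum(a != b for a, b in zip(d1, d2))
print(d1)
print(d2)
print(f"{diff}/64 hex characters differ")
```

A one-bit input change leaves no similarity between the two digests, which is exactly why such digests provide only exact identity rather than any notion of nearness.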
Benchmark MEDIUM
Max Fomin
Detecting prompt injection and jailbreak attacks is critical for deploying LLM-based agents safely. As agents increasingly process untrusted data...
Benchmark LOW
Edibe Yilmaz, Kahraman Kostas
The integration of large language models (LLMs) into educational processes introduces significant constraints regarding data privacy and reliability,...
1 month ago cs.CL cs.AI cs.CR
PDF
Benchmark HIGH
Haoyu Li, Xijia Che, Yanhao Wang +2 more
Proof-of-Vulnerability (PoV) generation is a critical task in software security, serving as a cornerstone for vulnerability validation, false...
1 month ago cs.SE cs.CR
PDF