AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total: 2,077
Attack: 809
Benchmark: 603
Defense: 272
Tool: 226
Survey: 113

Showing 421–440 of 598 papers

Benchmark MEDIUM

PATCHEVAL: A New Benchmark for Evaluating LLMs on Patching Real-World Vulnerabilities

Zichao Wei, Jun Zeng, Ming Wen +8 more

Software vulnerabilities are increasing at an alarming rate. However, manual patching is both time-consuming and resource-intensive, while existing...

4 months ago cs.CR cs.SE PDF

Benchmark MEDIUM

Robustness of LLM-enabled vehicle trajectory prediction under data security threats

Feilong Wang, Fuqiang Liu

The integration of large language models (LLMs) into automated driving systems has opened new possibilities for reasoning and decision-making by...

4 months ago cs.LG cs.AI cs.CR PDF

Benchmark MEDIUM

Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio

Guangke Chen, Yuhui Wang, Shouling Ji +2 more

Modern text-to-speech (TTS) systems, particularly those built on Large Audio-Language Models (LALMs), generate high-fidelity speech that faithfully...

4 months ago cs.SD cs.AI cs.CR PDF

Benchmark MEDIUM

Can AI Models be Jailbroken to Phish Elderly Victims? An End-to-End Evaluation

Fred Heiding, Simon Lermen

We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to...

4 months ago cs.CR cs.AI cs.CY PDF

Benchmark LOW

OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models

Yuping Yan, Yuhan Xie, Yuanshuai Li +3 more

Since Multimodal Large Language Models (MLLMs) are increasingly being integrated into everyday tools and intelligent agents, growing concerns have...

4 months ago cs.LG cs.CL PDF

Benchmark LOW

CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D

Francis Rhys Ward, Teun van der Weij, Hanna Gábor +6 more

AI systems are increasingly able to autonomously conduct realistic software engineering tasks, and may soon be deployed to automate machine learning...

4 months ago cs.AI PDF

Benchmark MEDIUM

Taught by the Flawed: How Dataset Insecurity Breeds Vulnerable AI Code

Catherine Xia, Manar H. Alalfi

AI programming assistants have demonstrated a tendency to generate code containing basic security vulnerabilities. While developers are ultimately...

4 months ago cs.CR cs.AI PDF

Benchmark LOW

CARScenes: Semantic VLM Dataset for Safe Autonomous Driving

Yuankai He, Weisong Shi

CAR-Scenes is a frame-level dataset for autonomous driving that enables training and evaluation of vision-language models (VLMs) for interpretable,...

4 months ago cs.CV cs.RO PDF

Benchmark LOW

Toward Honest Language Models for Deductive Reasoning

Jiarui Liu, Kaustubh Dhole, Yingheng Wang +7 more

Deductive reasoning is the process of deriving conclusions strictly from the given premises, without relying on external knowledge. We define honesty...

4 months ago cs.CL PDF

Benchmark MEDIUM

One Signature, Multiple Payments: Demystifying and Detecting Signature Replay Vulnerabilities in Smart Contracts

Zexu Wang, Jiachi Chen, Zewei Lin +7 more

Smart contracts have significantly advanced blockchain technology, and digital signatures are crucial for reliable verification of contract...

4 months ago cs.CR cs.SE PDF

Benchmark LOW

Preference is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback

Shengbo Wang, Hong Sun, Ke Li

Interactive preference elicitation (IPE) aims to substantially reduce human effort while acquiring human preferences in wide personalization systems....

4 months ago cs.LG PDF

Benchmark MEDIUM

DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks

Yunfei Yang, Xiaojun Chen, Yuexin Xuan +3 more

Model watermarking techniques can embed watermark information into the protected model for ownership declaration by constructing specific...

4 months ago cs.CR cs.LG PDF

Benchmark MEDIUM

Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation

Kazuki Iwahana, Yusuke Yamasaki, Akira Ito +2 more

Backdoor attacks pose a critical threat to machine learning models, causing them to behave normally on clean data but misclassify poisoned data into...

4 months ago cs.LG cs.CR PDF

Benchmark MEDIUM

From LLMs to Agents: A Comparative Evaluation of LLMs and LLM-based Agents in Security Patch Detection

Junxiao Han, Zheng Yu, Lingfeng Bao +5 more

The widespread adoption of open-source software (OSS) has accelerated software innovation but also increased security risks due to the rapid...

4 months ago cs.CR cs.SE PDF

Benchmark HIGH

MSCR: Exploring the Vulnerability of LLMs' Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement

Zhishen Sun, Guang Dai, Haishan Ye

LLMs demonstrate performance comparable to human abilities in complex tasks such as mathematical reasoning, but their robustness in mathematical...

4 months ago cs.AI PDF

Benchmark LOW

Probabilities Are All You Need: A Probability-Only Approach to Uncertainty Estimation in Large Language Models

Manh Nguyen, Sunil Gupta, Hung Le

Large Language Models (LLMs) exhibit strong performance across various natural language processing (NLP) tasks but remain vulnerable to...

4 months ago cs.LG PDF

Benchmark MEDIUM

Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization

Binyan Xu, Fan Yang, Di Tang +2 more

Clean-image backdoor attacks, which use only label manipulation in training datasets to compromise deep neural networks, pose a significant threat to...

4 months ago cs.CV cs.CR cs.LG PDF

Benchmark MEDIUM

On Stealing Graph Neural Network Models

Marcin Podhajski, Jan Dubiński, Franziska Boenisch +3 more

Current graph neural network (GNN) model-stealing methods rely heavily on queries to the victim model, assuming no hard query limits. However, in...

4 months ago cs.LG cs.CR PDF

Benchmark MEDIUM

EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers

Yilin Jiang, Mingzi Zhang, Xuanyu Yin +5 more

Large Language Models for Simulating Professions (SP-LLMs), particularly as teachers, are pivotal for personalized education. However, ensuring their...

4 months ago cs.CL PDF

Benchmark MEDIUM

Sensitivity of Small Language Models to Fine-tuning Data Contamination

Nicy Scaria, Silvester John Joseph Kennedy, Deepak Subramani

Small Language Models (SLMs) are increasingly being deployed in resource-constrained environments, yet their behavioral robustness to data...

4 months ago cs.CL cs.AI PDF
