Survey MEDIUM
Guolei Huang, Qinzhi Peng, Gan Xu +3 more
As Vision-Language Models (VLMs) move into interactive, multi-turn use, safety concerns intensify for multimodal multi-turn dialogue, which is...
Benchmark HIGH
Simin Chen, Yixin He, Suman Jana +1 more
LLM-based agents are increasingly deployed for software maintenance tasks such as automated program repair (APR). APR agents automatically fetch...
Benchmark LOW
Ruolin Chen, Yinqian Sun, Jihang Wang +3 more
Embodied agents powered by large language models (LLMs) inherit advanced planning capabilities; however, their direct interaction with the physical...
Attack HIGH
Yein Park, Jungwoo Park, Jaewoo Kang
Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be circumvented by simple linguistic changes....
Benchmark LOW
Xiang Zhang, Kun Wei, Xu Yang +3 more
As Large Language Models (LLMs) become increasingly prevalent, their security vulnerabilities have already drawn attention. Machine unlearning is...
5 months ago cs.LG cs.CL
Other LOW
Juyeop Kim, Songkuk Kim, Jong-Seok Lee
Despite their success in image generation, diffusion models can memorize training data, raising serious privacy and copyright concerns. Although...
Tool HIGH
Jing-Jing Li, Jianfeng He, Chao Shang +6 more
As LLMs advance into autonomous agents with tool-use capabilities, they introduce security challenges that extend beyond traditional content-based...
5 months ago cs.CR cs.AI cs.CL
Defense LOW
Boyang Zhang, Istemi Ekin Akkus, Ruichuan Chen +4 more
Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in processing and reasoning over diverse modalities, but their...
5 months ago cs.CR cs.LG
Benchmark LOW
Jiacheng Shi, Hongfei Du, Y. Alicia Hong +1 more
Speech emotion recognition (SER) with audio-language models (ALMs) remains vulnerable to distribution shifts at test time, leading to performance...
5 months ago cs.SD cs.AI
Other LOW
Giuseppe Canale
This paper presents the Cybersecurity Psychology Framework (CPF), a novel methodology for quantifying human-centric vulnerabilities in security...
Attack HIGH
Yuepeng Hu, Zhengyuan Jiang, Mengyuan Li +4 more
Large language models (LLMs) are often modified after release through post-processing such as post-training or quantization, which makes it...
5 months ago cs.CR cs.CL
Survey MEDIUM
Kunlun Zhu, Zijia Liu, Bingxuan Li +15 more
Large Language Model (LLM) agents, which integrate planning, memory, reflection, and tool-use modules, have shown promise in solving complex,...
Attack LOW
Yanchen Jiang, Zhe Feng, Aranyak Mehta
Large language models (LLMs) are increasingly used in modern search and answer systems to synthesize multiple, sometimes conflicting, texts into a...
5 months ago cs.CL cs.AI cs.GT
Benchmark LOW
Aryan Yazdan Parast, Parsa Hosseini, Hesam Asadollahzadeh +4 more
Object hallucination in Multimodal Large Language Models (MLLMs) is a persistent failure mode that causes the model to perceive objects absent in the...
5 months ago cs.CV cs.AI cs.LG
Defense MEDIUM
Ayda Aghaei Nia
Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) are a foundational component of web security, yet traditional...
5 months ago cs.CR cs.AI
Survey LOW
Dawei Li, Zhen Tan, Chengshuai Zhao +6 more
Large Language Model (LLM)-based judges leverage powerful LLMs to efficiently evaluate candidate content and assign judgment scores. However, the...
Defense LOW
Akio Hayakawa, Stefan Bott, Horacio Saggion
Despite their strong performance, large language models (LLMs) face challenges in real-world application of lexical simplification (LS), particularly...
Tool MEDIUM
Qianshan Wei, Tengchao Yang, Yaochen Wang +7 more
Large Language Model (LLM) agents use memory to learn from past interactions, enabling autonomous planning and decision-making in complex...
5 months ago cs.CR cs.AI
Attack HIGH
Yupei Liu, Yanting Wang, Yuqi Jia +2 more
Prompt injection attacks pose a pervasive threat to the security of Large Language Models (LLMs). State-of-the-art prevention-based defenses...
5 months ago cs.CR cs.AI
Benchmark LOW
Adrian Arnaiz-Rodriguez, Miguel Baidal, Erik Derner +5 more
Large language model-powered chatbots have transformed how people seek information, especially in high-stakes contexts like mental health. Despite...
5 months ago cs.CL cs.CY