AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total: 2,077 · Attack: 809 · Benchmark: 603 · Defense: 272 · Tool: 226 · Survey: 113

Defense MEDIUM

ADVERSA: Measuring Multi-Turn Guardrail Degradation and Judge Reliability in Large Language Models

Harry Owiredu-Ashley

Most adversarial evaluations of large language model (LLM) safety assess single prompts and report binary pass/fail outcomes, which fails to capture...

2 weeks ago cs.CR cs.AI cs.CL

Defense LOW

SCAFFOLD-CEGIS: Preventing Latent Security Degradation in LLM-Driven Iterative Code Refinement

Yi Chen, Yun Bian, Haiquan Wang +2 more

The application of large language models to code generation has evolved from one-shot generation to iterative refinement, yet the evolution of...

2 weeks ago cs.CR cs.SE

Defense MEDIUM

DistillGuard: Evaluating Defenses Against LLM Knowledge Distillation

Bo Jiang

Knowledge distillation from proprietary LLM APIs poses a growing threat to model providers, yet defenses against this attack remain fragmented and...

2 weeks ago cs.CR cs.AI cs.CL

Defense MEDIUM

VoiceSHIELD-Small: Real-Time Malicious Speech Detection and Transcription

Sumit Ranjan, Sugandha Sharma, Ubaid Abbas +1 more

Voice interfaces are quickly becoming a common way for people to interact with AI systems. This also brings new security risks, such as prompt...

2 weeks ago cs.SD cs.AI

Defense MEDIUM

Proof-of-Guardrail in AI Agents and What (Not) to Trust from It

Xisen Jin, Michael Duan, Qin Lin +4 more

As AI agents become widely deployed as online services, users often rely on an agent developer's claim about how safety is enforced, which introduces...

2 weeks ago cs.CR cs.AI cs.CL

Defense MEDIUM

Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models

Jinman Wu, Yi Xie, Shen Lin +2 more

Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the...

2 weeks ago cs.CR cs.AI cs.LG

Defense MEDIUM

Revisiting the (Sub)Optimality of Best-of-N for Inference-Time Alignment

Ved Sriraman, Adam Block

Best-of-N (BoN) sampling is a widely used inference-time alignment method for language models, whereby N candidate responses are sampled from a...

2 weeks ago cs.LG cs.AI

Defense LOW

Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection

Junchuan Zhao, Minh Duc Vu, Ye Wang

Neural codec language models enable high-quality discrete speech synthesis, yet their inference remains vulnerable to token-level artifacts and...

2 weeks ago cs.SD eess.AS

Defense MEDIUM

ThaiSafetyBench: Assessing Language Model Safety in Thai Cultural Contexts

Trapoom Ukarapol, Nut Chukamphaeng, Kunat Pipatanakul +1 more

The safety evaluation of large language models (LLMs) remains largely centered on English, leaving non-English languages and culturally grounded...

2 weeks ago cs.CL

Defense MEDIUM

Steering Frozen LLMs: Adaptive Social Alignment via Online Prompt Routing

Zeyu Zhang, Xiangxiang Dai, Ziyi Han +2 more

Large language models (LLMs) are typically governed by post-training alignment (e.g., RLHF or DPO), which yields a largely static policy during...

3 weeks ago cs.LG cs.AI

Defense LOW

Molt Dynamics: Emergent Social Phenomena in Autonomous AI Agent Populations

Brandon Yee, Krishna Sharma

MoltBook is a large-scale multi-agent coordination environment where over 770,000 autonomous LLM agents interact without human participation,...

3 weeks ago cs.MA cs.AI cs.SI

Defense LOW

RIVA: Leveraging LLM Agents for Reliable Configuration Drift Detection

Sami Abuzakuk, Lucas Crijns, Anne-Marie Kermarrec +2 more

Infrastructure as code (IaC) tools automate cloud provisioning but verifying that deployed systems remain consistent with the IaC specifications...

3 weeks ago cs.SE cs.AI cs.MA

Defense LOW

ZeroDayBench: Evaluating LLM Agents on Unseen Zero-Day Vulnerabilities for Cyberdefense

Nancy Lau, Louis Sloot, Jyoutir Raj +6 more

Large language models (LLMs) are increasingly being deployed as software engineering agents that autonomously contribute to repositories. A major...

3 weeks ago cs.CR cs.AI

Defense MEDIUM

Inference-Time Safety For Code LLMs Via Retrieval-Augmented Revision

Manisha Mukherjee, Vincent J. Hellendoorn

Large Language Models (LLMs) are increasingly deployed for code generation in high-stakes software development, yet their limited transparency in...

3 weeks ago cs.SE cs.AI cs.CR

Defense MEDIUM

Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs

Ming Wen, Kun Yang, Xin Chen +4 more

Multimodal Large Language Models (MLLMs) pose critical safety challenges, as they are susceptible not only to adversarial attacks such as...

3 weeks ago cs.LG cs.AI

Defense MEDIUM

TAS-GNN: A Status-Aware Signed Graph Neural Network for Anomaly Detection in Bitcoin Trust Systems

Chang Xue, Fang Liu, Jiaye Wang +2 more

Decentralized financial platforms rely heavily on Web of Trust reputation systems to mitigate counterparty risk in the absence of centralized...

3 weeks ago cs.CR cs.AI cs.LG

Defense LOW

Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation

Xingyu Zhu, Kesen Zhao, Liang Yi +4 more

Multimodal large language models (MLLMs) have achieved remarkable progress in vision-language reasoning, yet they remain vulnerable to hallucination,...

3 weeks ago cs.CV

Defense LOW

LLM-Powered Silent Bug Fuzzing in Deep Learning Libraries via Versatile and Controlled Bug Transfer

Kunpeng Zhang, Dongwei Xiao, Daoyuan Wu +5 more

Deep learning (DL) libraries are widely used in critical applications, where even subtle silent bugs can lead to serious consequences. While existing...

3 weeks ago cs.SE

Defense MEDIUM

Secure Semantic Communications via AI Defenses: Fundamentals, Solutions, and Future Directions

Lan Zhang, Chengsi Liang, Zeming Zhuang +4 more

Semantic communication (SemCom) redefines wireless communication from reproducing symbols to transmitting task-relevant semantics. However, this...

4 weeks ago cs.CR eess.SY

Defense MEDIUM

MemoPhishAgent: Memory-Augmented Multi-Modal LLM Agent for Phishing URL Detection

Xuan Chen, Hao Liu, Tao Yuan +3 more

Traditional phishing website detection relies on static heuristics or reference lists, which lag behind rapidly evolving attacks. While recent...

4 weeks ago cs.CR
