AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total: 2,077
Attack: 809
Benchmark: 603
Defense: 272
Tool: 226
Survey: 113


Showing 721–740 of 968 papers


Defense MEDIUM

EASE: Practical and Efficient Safety Alignment for Small Language Models

Haonan Shi, Guoli Wang, Tu Ouyang +1 more

Small language models (SLMs) are increasingly deployed on edge devices, making their safety alignment crucial yet challenging. Current shallow...

4 months ago cs.CR cs.LG PDF

Benchmark MEDIUM

Efficient LLM Safety Evaluation through Multi-Agent Debate

Dachuan Lin, Guobin Shen, Zihao Yang +3 more

Safety evaluation of large language models (LLMs) increasingly relies on LLM-as-a-judge pipelines, but strong judges can still be expensive to use at...

4 months ago cs.AI cs.CR PDF

Attack MEDIUM

Enhancing Adversarial Robustness of IoT Intrusion Detection via SHAP-Based Attribution Fingerprinting

Dilli Prasad Sharma, Liang Xue, Xiaowei Sun +2 more

The rapid proliferation of Internet of Things (IoT) devices has transformed numerous industries by enabling seamless connectivity and data-driven...

4 months ago cs.CR cs.AI cs.CL PDF

Tool MEDIUM

Can LLM Infer Risk Information From MCP Server System Logs?

Jiayi Fu, Yuansen Zhang, Yinggui Wang

Large Language Models (LLMs) demonstrate strong capabilities in solving complex tasks when integrated with external tools. The Model Context Protocol...

4 months ago cs.CR cs.CL PDF

Attack MEDIUM

CGCE: Classifier-Guided Concept Erasure in Generative Models

Viet Nguyen, Vishal M. Patel

Recent advancements in large-scale generative models have enabled the creation of high-quality images and videos, but have also raised significant...

4 months ago cs.CV cs.AI cs.CR PDF

Benchmark MEDIUM

ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations

Amr Gomaa, Ahmed Salem, Sahar Abdelnabi

As language models evolve into autonomous agents that act and communicate on behalf of users, ensuring safety in multi-agent ecosystems becomes a...

4 months ago cs.CR cs.CL cs.CY PDF

Benchmark MEDIUM

TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems

Ishan Kavathekar, Hemang Jain, Ameya Rathod +2 more

Large Language Models (LLMs) have demonstrated strong capabilities as autonomous agents through tool use, planning, and decision-making abilities,...

4 months ago cs.MA cs.AI PDF

Benchmark MEDIUM

Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding

Hadi Reisizadeh, Jiajun Ruan, Yiwei Chen +3 more

Unlearning in large language models (LLMs) is critical for regulatory compliance and for building ethical generative AI systems that avoid producing...

4 months ago cs.LG PDF

Benchmark MEDIUM

From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting

Cyril Vallez, Alexander Sternfeld, Andrei Kucharavy +1 more

As the role of Large Language Models (LLM)-based coding assistants in software development becomes more critical, so does the role of the bugs they...

4 months ago cs.CL PDF

Attack MEDIUM

Large Language Models for Cyber Security

Raunak Somani, Aswani Kumar Cherukuri

This paper studies the integration of Large Language Models into cybersecurity tools and protocols. The main issue discussed in this paper is how...

4 months ago cs.CR PDF

Attack MEDIUM

Adversarially Robust and Interpretable Magecart Malware Detection

Pedro Pereira, José Gouveia, João Vitorino +2 more

Magecart skimming attacks have emerged as a significant threat to client-side security and user trust in online payment systems. This paper addresses...

4 months ago cs.CR PDF

Tool MEDIUM

AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research

Tim Beyer, Jonas Dornbusch, Jakob Steimle +3 more

The rapid expansion of research on Large Language Model (LLM) safety and robustness has produced a fragmented and oftentimes buggy ecosystem of...

4 months ago cs.AI cs.SE PDF

Defense MEDIUM

Explaining Software Vulnerabilities with Large Language Models

Oshando Johnson, Alexandra Fomina, Ranjith Krishnamurthy +3 more

The prevalence of security vulnerabilities has prompted companies to adopt static application security testing (SAST) tools for vulnerability...

4 months ago cs.SE cs.AI PDF

Other MEDIUM

Interpreting Multi-Attribute Confounding through Numerical Attributes in Large Language Models

Hirohane Takagi, Gouki Minegishi, Shota Kizawa +2 more

Although behavioral studies have documented numerical reasoning errors in large language models (LLMs), the underlying representational mechanisms...

4 months ago cs.AI PDF

Benchmark MEDIUM

Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback

Shiyin Lin

Software fuzzing has become a cornerstone in automated vulnerability discovery, yet existing mutation strategies often lack semantic awareness,...

4 months ago cs.CR cs.AI PDF

Defense MEDIUM

STARS: Synchronous Token Alignment for Robust Supervision in Large Language Models

Mohammad Atif Quamar, Mohammad Areeb, Mikhail Kuznetsov +2 more

Aligning large language models (LLMs) with human values is crucial for safe deployment. Inference-time techniques offer granular control over...

4 months ago cs.CL PDF

Attack MEDIUM

Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design-A2A, AP2, ERC-8004, and Beyond

Botao 'Amber' Hu, Helena Rong

As the "agentic web" takes shape - billions of AI agents (often LLM-powered) autonomously transacting and collaborating - trust shifts from human...

4 months ago cs.HC cs.AI cs.MA PDF

Benchmark MEDIUM

Evaluating Control Protocols for Untrusted AI Agents

Jon Kutasov, Chloe Loughridge, Yuqi Sun +4 more

As AI systems become more capable and widely deployed as agents, ensuring their safe operation becomes critical. AI control offers one approach to...

4 months ago cs.AI PDF

Attack MEDIUM

Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models

W. K. M Mithsara, Ning Yang, Ahmed Imteaj +2 more

The widespread integration of wearable sensing devices in Internet of Things (IoT) ecosystems, particularly in healthcare, smart homes, and...

4 months ago cs.LG cs.CR PDF

Attack MEDIUM

Verifying LLM Inference to Detect Model Weight Exfiltration

Roy Rinberg, Adam Karvonen, Alexander Hoover +2 more

As large AI models become increasingly valuable assets, the risk of model weight exfiltration from inference servers grows accordingly. An attacker...

4 months ago cs.CR cs.LG PDF
