A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)
Tianyu Chen, Dongrui Liu, Xia Hu +2 more
Clawdbot is a self-hosted, tool-using personal AI agent with a broad action space spanning local execution and web-mediated workflows, which raises...
Zhenhong Zhou, Yuanhe Zhang, Hongwei Cai +6 more
The Model Context Protocol (MCP) standardizes tool use for LLM-based agents and enables third-party servers. This openness introduces a security...
Matic Korun
We propose a geometric taxonomy of large language model hallucinations based on observable signatures in token embedding cluster structure. By...
Xiaojun Jia, Jie Liao, Simeng Qin +5 more
Agent skills are becoming a core abstraction in coding agents, packaging long-form instructions and auxiliary scripts to extend tool-augmented...
Max Fomin
Detecting prompt injection and jailbreak attacks is critical for deploying LLM-based agents safely. As agents increasingly process untrusted data...
Mario Marín Caballero, Miguel Betancourt Alonso, Daniel Díaz-López +3 more
The most valuable asset of any cloud-based organization is data, which is increasingly exposed to sophisticated cyberattacks. Until recently, the...
Edibe Yilmaz, Kahraman Kostas
The integration of large language models (LLMs) into educational processes introduces significant constraints regarding data privacy and reliability,...
Somnath Banerjee
The overarching research direction of this work is the development of a "Responsible Intelligence" framework designed to reconcile the immense...
Yuqi Jia, Ruiqi Wang, Xilong Wang +2 more
Prompt injection attacks insert malicious instructions into an LLM's input to steer it toward an attacker-chosen task instead of the intended one....
Ruomeng Ding, Yifei Pang, He Sun +3 more
Evaluation and alignment pipelines for large language models increasingly rely on LLM-based judges, whose behavior is guided by natural-language...
Haoyu Li, Xijia Che, Yanhao Wang +2 more
Proof-of-Vulnerability (PoV) generation is a critical task in software security, serving as a cornerstone for vulnerability validation, false...
Weiming Song, Xuan Xie, Ruiping Yin
Large language models (LLMs) remain vulnerable to jailbreak prompts that elicit harmful or policy-violating outputs, while many existing defenses...
Mohamed Shaaban, Mohamed Elmahallawy
Federated learning (FL) enables collaborative training across organizational silos without sharing raw data, making it attractive for...
Peng Cheng, Jiucheng Zang, Qingnan Li +6 more
Muon-style optimizers leverage Newton-Schulz (NS) iterations to orthogonalize updates, yielding update geometries that often outperform Adam-series...
Akshat Naik, Jay Culligan, Yarin Gal +4 more
As Large Language Model (LLM) agents become more capable, their coordinated use in the form of multi-agent systems is anticipated to emerge as a...
Anudeep Das, Prach Chantasantitam, Gurjot Singh +3 more
Large language models (LLMs) are increasingly deployed in settings where inducing a bias toward a certain topic can have significant consequences,...
Xu Li, Simon Yu, Minzhou Pan +5 more
LLM-based agents are becoming increasingly capable, yet their safety lags behind. This creates a gap between what agents can do and should do. This...
Yiran Gao, Kim Hammar, Tao Li
Rapidly evolving cyberattacks demand incident response systems that can autonomously learn and adapt to changing threats. Prior work has extensively...
Alfous Tim, Kuniyilh Simi D
Internet of Things (IoT) systems increasingly depend on continual learning to adapt to non-stationary environments. These environments can...
George Alexandru Adam, Alexander Cui, Edwin Thomas +7 more
While historical considerations surrounding text authenticity revolved primarily around plagiarism, the advent of large language models (LLMs) has...