SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs
Juan Ren, Mark Dras, Usman Naseem
Large Vision-Language Models (LVLMs) unlock powerful multimodal reasoning but also expand the attack surface, particularly through adversarial inputs...
João A. Leite, Arnav Arora, Silvia Gargova +5 more
Large Language Models (LLMs) can generate human-like disinformation, yet their ability to personalise such content across languages and demographics...
Ruben Belo, Marta Guimaraes, Claudia Soares
Large Language Models are susceptible to jailbreak attacks that bypass built-in safety guardrails (e.g., by tricking the model with adversarial...
Siyuan Li, Aodu Wulianghai, Xi Lin +4 more
With the increasing integration of large language models (LLMs) into open-domain writing, detecting machine-generated text has become a critical task...
Daniel Pulido-Cortázar, Daniel Gibert, Felip Manyà
Over the last decade, machine learning has been extensively applied to identify malicious Android applications. However, such approaches remain...
Blazej Manczak, Eric Lin, Francisco Eiras +2 more
Large language models (LLMs) are rapidly transitioning into medical clinical use, yet their reliability under realistic, multi-turn interactions...
Han Zhu, Juntao Dai, Jiaming Ji +8 more
With the widespread use of Multi-modal Large Language Models (MLLMs), safety issues have become a growing concern. Multi-turn dialogues, which are...
Zhenyu Mao, Jacky Keung, Fengji Zhang +3 more
The increasing demand for software development has driven interest in automating software engineering (SE) tasks using Large Language Models (LLMs)....
Lipeng He, Vasisht Duddu, N. Asokan
Chatbot providers (e.g., OpenAI) rely on tiered subscription schemes to generate revenue, offering basic models for free users, and advanced models...
Deeksha Hareesha Kulal, Chidozie Princewill Arannonu, Afsah Anwar +2 more
Phishing remains a critical cybersecurity threat, especially with the advent of large language models (LLMs) capable of generating highly convincing...
Shuo Chen, Zonggen Li, Zhen Han +7 more
Deep Research (DR) agents built on Large Language Models (LLMs) can perform complex, multi-step research by decomposing tasks, retrieving online...
Dominik Schwarz
The security of Large Language Model (LLM) applications is fundamentally challenged by "form-first" attacks like prompt injection and jailbreaking,...
Sarah Ball, Andreas Haupt
Generative models are increasingly paired with safety classifiers that filter harmful or undesirable outputs. A common strategy is to fine-tune the...
Jiayu Ding, Lei Cui, Li Dong +2 more
Recent advances in Large Language Models (LLMs) show that extending the length of reasoning chains significantly improves performance on complex...
Sean Oesch, Jack Hutchins, Luke Koch +1 more
In living off the land attacks, malicious actors use legitimate tools and processes already present on a system to avoid detection. In this paper, we...
Rui Xu, Jiawei Chen, Zhaoxia Yin +2 more
The widespread use of large language models (LLMs) and open-source code has raised ethical and security concerns regarding the distribution and...
Jiahao Liu, Bonan Ruan, Xianglin Yang +5 more
LLM-based agents have demonstrated promising adaptability in real-world applications. However, these agents remain vulnerable to a wide range of...
Alexander Sternfeld, Andrei Kucharavy, Ljiljana Dolamic
Large Language Models (LLMs) have shown remarkable proficiency in code generation tasks across various programming languages. However, their outputs...
Zhuochen Yang, Kar Wai Fok, Vrizlynn L. L. Thing
Large language models have gained widespread attention recently, but their potential security vulnerabilities, especially privacy leakage, are also...
Qizhou Peng, Yang Zheng, Yu Wen +2 more
Reinforcement learning (RL) has been an important machine learning paradigm for solving long-horizon sequential decision-making problems under...