Rectifying Adversarial Examples Using Their Vulnerabilities
Fumiya Morimoto, Ryuto Morita, Satoshi Ono
Deep neural network-based classifiers are prone to errors when processing adversarial examples (AEs). AEs are minimally perturbed input data...
Xiaoze Liu, Weichen Yu, Matt Fredrikson +2 more
The open-weight language model ecosystem is increasingly defined by model composition techniques (such as weight merging, speculative decoding, and...
Pankayaraj Pathmanathan, Michael-Andrei Panaitescu-Liess, Cho-Yu Jason Chiang +1 more
Retrieval-Augmented Generation (RAG) has emerged as a promising paradigm to enhance large language models (LLMs) with external knowledge, reducing...
Ruixuan Huang, Qingyue Wang, Hantao Huang +4 more
Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To...
Tsogt-Ochir Enkhbayar
Warning-framed content in training data (e.g., "DO NOT USE - this code is vulnerable") does not, it turns out, teach language models to avoid the...
Tian Li, Bo Lin, Shangwen Wang +1 more
Retrieval-Augmented Code Generation (RACG) is increasingly adopted to enhance Large Language Models for software development, yet its security...
Haoyang Li, Mingjin Li, Jinxin Zuo +5 more
LLM-based code agents (e.g., ChatGPT Codex) are increasingly deployed as detectors for code review and security auditing tasks. Although CoT-enhanced...
Ahmed M. Hussain, Salahuddin Salahuddin, Panos Papadimitratos
Current Large Language Model (LLM) safety approaches focus on explicitly harmful content while overlooking a critical vulnerability: the inability...
Yifan Yao, Baojuan Wang, Jinhao Duan +4 more
Chat-based cybercrime has emerged as a pervasive threat, with attackers leveraging real-time messaging platforms to conduct scams that rely on...
Honglin Mu, Jinghao Liu, Kaiyang Wan +4 more
Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content...
Rahul Yumlembam, Biju Issac, Seibu Mary Jacob +1 more
Since the Internet of Things (IoT) is widely adopted through Android applications, detecting malicious Android apps is essential. In recent years,...
Samruddhi Baviskar
Machine learning models used in financial decision systems operate in nonstationary economic environments, yet adversarial robustness is typically...
A. A. Gde Yogi Pramana, Jason Ray, Anthony Jaya +1 more
Vision-Language Models (VLMs) show significant promise for Medical Visual Question Answering (VQA), yet their deployment in clinical settings is...
Tung-Ling Li, Yuhao Wu, Hongliang Liu
Reward models and LLM-as-a-Judge systems are central to modern post-training pipelines such as RLHF, DPO, and RLAIF, where they provide scalar...
Yidong Chai, Yi Liu, Mohammadreza Ebrahimi +2 more
Social media platforms are plagued by harmful content such as hate speech, misinformation, and extremist rhetoric. Machine learning (ML) models are...
Zhexi Lu, Hongliang Chi, Nathalie Baracaldo +3 more
Membership inference attacks (MIAs) pose a critical privacy threat to fine-tuned large language models (LLMs), especially when models are adapted to...
Seok-Hyun Ga, Chun-Yen Chang
The rapid development of Generative AI is bringing innovative changes to education and assessment. As the prevalence of students utilizing AI for...
Piercosma Bisconti, Marcello Galisai, Matteo Prandi +6 more
Safety mechanisms in LLMs remain vulnerable to attacks that reframe harmful requests through culturally coded structures. We introduce Adversarial...
David Lindner, Charlie Griffin, Tomek Korbak +4 more
Automated control monitors could play an important role in overseeing highly capable AI agents that we do not fully trust. Prior work has explored...
Samruddhi Baviskar
We evaluate adversarial robustness in tabular machine learning models used in financial decision making. Using credit scoring and fraud detection...