AI Security Research

2,077+ academic papers on AI security, attacks, and defenses

Total

2,077

Attack

809

Benchmark

603

Defense

272

Tool

226

Survey

113

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 101–120 of 715 papers

Clear filters

Attack HIGH

Is the Trigger Essential? A Feature-Based Triggerless Backdoor Attack in Vertical Federated Learning

Yige Liu, Yiwei Lou, Che Wang +2 more

As a distributed collaborative machine learning paradigm, vertical federated learning (VFL) allows multiple passive parties with distinct features...

4 weeks ago cs.LG cs.CR PDF

Attack HIGH

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi +1 more

LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM...

1 months ago cs.CR cs.LG PDF

Tool HIGH

Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming

Ian Steenstra, Paola Pedrelli, Weiyan Shi +2 more

Large Language Models (LLMs) are increasingly utilized for mental health support; however, current safety benchmarks often fail to detect the...

1 months ago cs.CL cs.AI cs.CY PDF

Attack HIGH

VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense

Nadav Kadvil, Malak Fares, Ayellet Tal

Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses....

1 months ago cs.CV PDF

Attack HIGH

Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains

Xiaochong Jiang, Shiqi Yang, Wenting Yang +2 more

Agentic systems built on large language models (LLMs) extend beyond text generation to autonomously retrieve information and invoke tools. This...

1 months ago cs.CR cs.AI PDF

Tool HIGH

Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems

Xingyu Shen, Tommy Duong, Xiaodong An +6 more

Age estimation systems are increasingly deployed as gatekeepers for age-restricted online content, yet their robustness to cosmetic modifications has...

1 months ago cs.CV cs.CR cs.LG PDF

Survey HIGH

Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments

Kunal Mukherjee

Trusted Execution Environments (TEEs) (e.g., Intel SGX and ArmTrustZone) aim to protect sensitive computation from a compromised operating system,...

1 months ago cs.CR cs.AI PDF

Attack HIGH

Hiding in Plain Text: Detecting Concealed Jailbreaks via Activation Disentanglement

Amirhossein Farzam, Majid Behabahani, Mani Malek +2 more

Large language models (LLMs) remain vulnerable to jailbreak prompts that are fluent and semantically coherent, and therefore difficult to detect with...

1 months ago cs.AI PDF

Attack HIGH

Prompt Injection as Role Confusion

Charles Ye, Jasmine Cui, Dylan Hadfield-Menell

Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models...

1 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

Prompt Injection as Role Confusion

Charles Ye, Jasmine Cui, Dylan Hadfield-Menell

Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models...

1 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

Dark and Bright Side of Participatory Red-Teaming with Targets of Stereotyping for Eliciting Harmful Behaviors from Large Language Models

Sieun Kim, Yeeun Jo, Sungmin Na +5 more

Red-teaming, where adversarial prompts are crafted to expose harmful behaviors and assess risks, offers a dynamic approach to surfacing underlying...

1 months ago cs.HC PDF

Attack HIGH

When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks

Shenyang Chen, Liuwan Zhu

Standard evaluations of backdoor attacks on text-to-image (T2I) models primarily measure trigger activation and visual fidelity. We challenge this...

1 months ago cs.CR cs.AI PDF

Attack HIGH

When Prompt Optimization Becomes Jailbreaking: Adaptive Red-Teaming of Large Language Models

Zafir Shamsi, Nikhil Chekuru, Zachary Guzman +1 more

Large Language Models (LLMs) are increasingly integrated into high-stakes applications, making robust safety guarantees a central practical and...

1 months ago cs.CL cs.AI PDF

Benchmark HIGH

FENCE: A Financial and Multimodal Jailbreak Detection Dataset

Mirae Kim, Seonghun Jeong, Youngjun Kwak

Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly...

1 months ago cs.CL cs.AI cs.DB PDF

Tool HIGH

PenTiDef: Enhancing Privacy and Robustness in Decentralized Federated Intrusion Detection Systems against Poisoning Attacks

Phan The Duy, Nghi Hoang Khoa, Nguyen Tran Anh Quan +3 more

The increasing deployment of Federated Learning (FL) in Intrusion Detection Systems (IDS) introduces new challenges related to data privacy,...

1 months ago cs.CR cs.AI PDF

Attack HIGH

TFL: Targeted Bit-Flip Attack on Large Language Model

Jingkai Guo, Chaitali Chakrabarti, Deliang Fan

Large language models (LLMs) are increasingly deployed in safety and security critical applications, raising concerns about their robustness to model...

1 months ago cs.CR cs.CL cs.LG PDF

Attack HIGH

Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models

Manuel Wirth

As Large Language Models (LLMs) are increasingly integrated into automated decision-making pipelines, specifically within Human Resources (HR), the...

1 months ago cs.CR cs.AI PDF

Attack HIGH

Automating Agent Hijacking via Structural Template Injection

Xinhao Deng, Jiaqing Wu, Miao Chen +3 more

Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution...

1 months ago cs.AI cs.LG PDF

Benchmark HIGH

IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages

Priyaranjan Pattnayak, Sanchari Chowdhuri

Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities...

1 months ago cs.AI cs.CL PDF

Attack HIGH

Sequential Membership Inference Attacks

Thomas Michel, Debabrota Basu, Emilie Kaufmann

Modern AI models are not static. They go through multiple updates in their lifecycles. Thus, exploiting the model dynamics to create stronger...

1 months ago cs.LG cs.CR math.ST PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial