Attack MEDIUM
Ali Raza, Gurang Gupta, Nikolay Matyunin +1 more
Warning: This article includes red-teaming experiments, which contain examples of compromised LLM responses that may be offensive or upsetting. Large...
2 weeks ago cs.CR cs.AI cs.LG
PDF
Attack MEDIUM
Shaswata Mitra, Raj Patel, Sudip Mittal +2 more
Multi-agent systems (MAS) powered by LLMs promise adaptive, reasoning-driven enterprise workflows, yet granting agents autonomous control over tools,...
2 weeks ago cs.CR cs.MA cs.SE
PDF
Attack MEDIUM
Alexander Erlei, Lukas Meub
As AI agents increasingly act on behalf of human stakeholders in economic settings, understanding their behavior in complex market environments...
Attack MEDIUM
Eduard Hirsch, Kristina Raab, Tobias J. Bauer +1 more
IT systems are facing an increasing number of security threats, including advanced persistent attacks and future quantum-computing vulnerabilities....
2 weeks ago cs.CR cs.IR
PDF
Attack MEDIUM
Donghwa Kang, Hojun Choe, Doohyun Kim +2 more
Deploying deep neural networks (DNNs) on edge devices exposes valuable intellectual property to model-stealing attacks. While TEE-shielded DNN...
Attack MEDIUM
Anatoly Belikov, Ilya Fedotov
Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV...
2 weeks ago cs.CR cs.LG
PDF
Attack MEDIUM
Geraldin Nanfack, Eugene Belilovsky, Elvis Dohmatob
Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent...
3 weeks ago cs.LG cs.AI
PDF
Attack MEDIUM
Yizhe Xie, Congcong Zhu, Xinyue Zhang +5 more
Large Language Model-based Multi-Agent Systems (LLM-MAS) are increasingly applied to complex collaborative scenarios. However, their collaborative...
3 weeks ago cs.MA cs.AI
PDF
Attack MEDIUM
Achyutha Menon, Magnus Saebo, Tyler Crosse +3 more
The accelerating adoption of language models (LMs) as agents for deployment in long-context tasks motivates a thorough understanding of goal drift:...
Attack MEDIUM
Edouard Lansiaux
Federated Learning (FL) enables collaborative training of medical AI models across hospitals without centralizing patient data. However, the exchange...
3 weeks ago cs.CR cs.AI
PDF
Attack MEDIUM
Shuyi Zhou, Zeen Song, Wenwen Qiang +4 more
Large Language Models remain vulnerable to adversarial prefix attacks (e.g., "Sure, here is") despite robust standard safety. We diagnose this...
Attack MEDIUM
Guoxin Shi, Haoyu Wang, Zaihui Yang +2 more
Adversarial behavior plays a central role in aligning large language models with human values. However, existing alignment methods largely rely on...
3 weeks ago cs.CR cs.AI
PDF
Attack MEDIUM
Martin Odersky, Yaoyu Zhao, Yichen Xu +2 more
AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause...
3 weeks ago cs.AI cs.PL
PDF
Attack MEDIUM
Jingyuan Xie, Wenjie Wang, Ji Wu +1 more
Supervised fine-tuning (SFT) is essential for the development of medical large language models (LLMs), yet prior poisoning studies have mainly...
3 weeks ago cs.CR cs.AI cs.LG
PDF
Attack MEDIUM
Qianxun Xu, Chenxi Song, Yujun Cai +1 more
Recent advances in text-to-video diffusion models have enabled high-fidelity and temporally coherent video synthesis. However, current models are...
Attack MEDIUM
Idan Habler, Vineeth Sai Narajala, Stav Koren +2 more
Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external...
3 weeks ago cs.CR cs.AI
PDF
Attack MEDIUM
Bruce W. Lee, Chen Yueh-Han, Tomek Korbak
Frontier AI agents may pursue hidden goals while concealing their pursuit from oversight. Alignment training aims to prevent such behavior by...
3 weeks ago cs.LG cs.AI
PDF
Attack MEDIUM
Sarthak Munshi, Manish Bhatt, Vineeth Sai Narajala +4 more
While prior work has focused on projecting adversarial examples back onto the manifold of natural data to restore safety, we argue that a...
4 weeks ago cs.LG cs.AI cs.CR
PDF
Attack MEDIUM
Inderjeet Singh, Vikas Pahuja, Aishvariya Priya Rathina Sabapathy +8 more
Current stateless defences for multimodal agentic RAG fail to detect adversarial strategies that distribute malicious semantics across retrieval,...
4 weeks ago cs.CR cs.AI cs.CL
PDF