Defense MEDIUM
Rohan Subramanian Thomas, Shikhar Shiromani, Abdullah Chaudhry +4 more
Prompt design significantly impacts the moral competence and safety alignment of large language models (LLMs), yet empirical comparisons remain...
1 months ago cs.AI cs.CL
PDF
Attack HIGH
Takashi Koide, Hiroki Nakano, Daiki Chiba
Phishing sites continue to grow in volume and sophistication. Recent work leverages large language models (LLMs) to analyze URLs, HTML, and rendered...
Attack HIGH
Yao Zhou, Zeen Song, Wenwen Qiang +4 more
Safety alignment mechanisms in Large Language Models (LLMs) often operate as latent internal states, obscuring the model's inherent capabilities....
Benchmark LOW
Nelu D. Radpour
Contemporary benchmarks for agentic artificial intelligence (AI) frequently evaluate safety through isolated task-level accuracy thresholds,...
1 months ago cs.CY cs.AI cs.HC
PDF
Attack HIGH
Zihan Wang, Hongwei Li, Rui Zhang +2 more
Chat template is a common technique used in the training and inference stages of Large Language Models (LLMs). It can transform input and output data...
Defense MEDIUM
Zhenxiong Yu, Zhi Yang, Zhiheng Jin +19 more
As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security...
1 months ago cs.CR cs.AI
PDF
Attack HIGH
Ziyou Jiang, Lin Shi, Guowei Yang +3 more
Cyber attacks have become a serious threat to the security of software systems. Many organizations have built their security knowledge bases to...
Tool MEDIUM
Guangwei Zhang, Jianing Zhu, Cheng Qian +12 more
We present Copyright Detective, the first interactive forensic system for detecting, analyzing, and visualizing potential copyright risks in LLM...
Attack HIGH
Yunbei Zhang, Yingqiang Ge, Weijie Xu +3 more
Current multimodal red teaming treats images as wrappers for malicious payloads via typography or adversarial noise. These attacks are structurally...
1 months ago cs.CR cs.CV cs.LG
PDF
Attack HIGH
Ethan Rathbun, Wo Wei Lin, Alina Oprea +1 more
Simulated environments are a key piece in the success of Reinforcement Learning (RL), allowing practitioners and researchers to train decision making...
1 months ago cs.CR cs.LG cs.RO
PDF
Attack HIGH
Jafar Isbarov, Murat Kantarcioglu
As AI agents automate critical workloads, they remain vulnerable to indirect prompt injection (IPI) attacks. Current defenses rely on monitoring...
1 months ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Ruixin Yang, Ethan Mendes, Arthur Wang +4 more
Vision-language models (VLMs) have demonstrated strong performance in image geolocation, a capability further sharpened by frontier multimodal large...
1 months ago cs.CR cs.AI
PDF
Attack MEDIUM
Vishruti Kakkad, Paul Chung, Hanan Hibshi +1 more
An exponential growth of Machine Learning and its Generative AI applications brings with it significant security challenges, often referred to as...
1 months ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Casey Ford, Madison Van Doren, Emily Dix
Multimodal large language models (MLLMs) are increasingly deployed in real-world systems, yet their safety under adversarial prompting remains...
1 months ago cs.CL cs.AI cs.HC
PDF
Benchmark LOW
Mengru Wang, Zhenqian Xu, Junfeng Fang +4 more
Large Language Models (LLMs) can acquire unintended biases from seemingly benign training data even without explicit cues or malicious content....
1 months ago cs.LG cs.AI cs.CL
PDF
Attack MEDIUM
Yike Sun, Haotong Yang, Zhouchen Lin +1 more
Tokenization is fundamental to how language models represent and process text, yet the behavior of widely used BPE tokenizers has received far less...
Attack MEDIUM
Ariel Fogel, Omer Hofman, Eilon Cohen +1 more
Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat in this context is...
1 months ago cs.CR cs.LG
PDF
Attack MEDIUM
Leo Schwinn, Moritz Ladenburger, Tim Beyer +3 more
Automated \enquote{LLM-as-a-Judge} frameworks have become the de facto standard for scalable evaluation across natural language processing. For...
1 months ago cs.CL cs.AI
PDF
Benchmark MEDIUM
Debargha Ganguly, Sreehari Sankar, Biyao Zhang +8 more
Current approaches to LLM safety fundamentally rely on a brittle cat-and-mouse game of identifying and blocking known threats via guardrails. We...
1 months ago cs.CL cs.AI cs.DC
PDF
Defense MEDIUM
Jiacheng Liang, Yuhui Wang, Tanqiu Jiang +1 more
Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable...
1 months ago cs.LG cs.AI cs.CR
PDF
Track AI security vulnerabilities in real time
Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act),
and CISO risk assessments for your AI/ML stack.
Start 14-Day Free Trial