Bias Injection Attacks on RAG Databases and Sanitization Defenses
defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning attacks primarily injects false or toxic content, which fact-checking or linguistic analysis easily detects
Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation
with ordinal nature. In this paper, we study distribution estimation under the pure shuffle model, a prevalent shuffle-DP framework without strong security assumptions. We initially attempt
SpectralGuard: Detecting Memory Collapse Attacks in State Space Models
State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius
Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents
Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce **AgentFence
CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World
Object detection models deployed in real-world applications such as autonomous driving face serious threats from backdoor attacks. Despite their practical effectiveness, existing methods are inherently limited in both capability
Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems
user query on OpenAI's embedding models), and achieves near-100% retrieval across 11 benchmarks and 8 embedding models (including both open-source models and proprietary services). Based on this
SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions
Retrieval-Augmented Generation (RAG) systems are increasingly evolving into agentic architectures where large language models autonomously coordinate multi-step reasoning, dynamic memory management, and iterative retrieval strategies. Despite rapid industrial
STELP: Secure Transpilation and Execution of LLM-Generated Programs
The rapid evolution of Large Language Models (LLMs) has driven major advances in reasoning, planning, and function-calling capabilities. Multi-agentic collaborative frameworks using such LLMs place them at the center
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
hijack. Each category captures a distinct pathway through which an adversary can manipulate a model's behavior. We evaluate these threats using 12 representative attack methods spanning text, image
Secure Semantic Communications via AI Defenses: Fundamentals, Solutions, and Future Directions
SemCom via AI defense. We analyze AI-centric threat models by consolidating existing studies and organizing attack surfaces across model-level, channel-realizable, knowledge-based, and networked inference vectors. Building
Bypassing Prompt Injection Detectors through Evasive Injections
Large language models (LLMs) are increasingly used in interactive and retrieval-augmented systems, but they remain vulnerable to task drift: deviations from a user's intended instruction due to injected
Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation
Retrieval-Augmented Code Generation (RACG) is increasingly adopted to enhance Large Language Models for software development, yet its security implications remain dangerously underexplored. This paper conducts the first systematic exploration
How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation
The use of Large Language Models (LLMs) as automatic judges for code evaluation is becoming increasingly prevalent in academic environments, but their reliability can be compromised by students who may employ adversarial prompting
Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem
Model Context Protocol (MCP) has emerged as the de facto standard for connecting Large Language Models (LLMs) to external data and tools, effectively functioning as the "USB-C for Agentic
Backdoor Attacks Against Speech Language Models
resulting model inherits vulnerabilities from all of its components. In this work, we present the first systematic study of audio backdoor attacks against speech language models. We demonstrate their effectiveness
Persistent Backdoor Attacks under Continual Fine-Tuning of LLMs
Backdoor attacks embed malicious behaviors into Large Language Models (LLMs), enabling adversaries to trigger harmful outputs or bypass safety controls. However, the persistence of the implanted backdoors under user-driven
Is the Trigger Essential? A Feature-Based Triggerless Backdoor Attack in Vertical Federated Learning
parties with distinct features and one active party with labels to collaboratively train a model. Although VFL is known for its privacy-preserving capabilities, it still faces significant privacy
AndroWasm: An Empirical Study on Android Malware Obfuscation through WebAssembly
detection mechanisms and hinder manual analysis. Adversaries typically rely on obfuscation, anti-repacking, steganography, poisoning and evasion techniques against AI-based tools, and in-memory execution to conceal malicious functionality
Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats
Autonomous Large Language Model (LLM) agents, exemplified by OpenClaw, demonstrate remarkable capabilities in executing complex, long-horizon tasks. However, their tightly coupled instant-messaging interaction paradigm and high-privilege execution
Quantifying Document Impact in RAG-LLMs
Retrieval Augmented Generation (RAG) enhances Large Language Models (LLMs) by connecting them to external knowledge, improving accuracy and reducing outdated information. However, this introduces challenges such as factual inconsistencies, source