Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou, and 4 more
AI control protocols serve as a defense mechanism to stop untrusted LLM agents from causing harm in autonomous settings. Prior work treats this as a...
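A common baseline in this line of work is trusted monitoring, where a weaker trusted model screens the actions of a stronger untrusted one. The Python sketch below is a generic illustration of that pattern under assumed thresholds and stubbed model calls; it is not the specific protocol studied in this paper.

```python
# Generic trusted-monitoring control protocol (illustrative sketch).
# The model calls are stubs; a real deployment would wrap actual LLM APIs.

AUDIT_THRESHOLD = 0.8  # assumed: most-suspicious actions go to human audit
DEFER_THRESHOLD = 0.5  # assumed: moderately suspicious actions defer to trusted model

def untrusted_propose(task: str) -> str:
    """Capable but untrusted model proposes an action for the task."""
    return f"proposed action for: {task}"

def trusted_monitor_score(action: str) -> float:
    """Weaker trusted model rates how suspicious the action looks (0..1)."""
    return 0.1  # placeholder score

def trusted_fallback(task: str) -> str:
    """Trusted model's own, safer (possibly lower-quality) solution."""
    return f"fallback action for: {task}"

def run_protocol(task: str) -> str:
    action = untrusted_propose(task)
    suspicion = trusted_monitor_score(action)
    if suspicion >= AUDIT_THRESHOLD:
        raise RuntimeError("action escalated to human audit")
    if suspicion >= DEFER_THRESHOLD:
        return trusted_fallback(task)
    return action

print(run_protocol("summarize the logs"))
```

The design trade-off the thresholds encode: auditing everything is safe but expensive, while accepting everything recovers the untrusted model's full capability at full risk.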
Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal perception and generation, yet their safety alignment remains a...
In recent years, data poisoning attacks have been increasingly designed to appear harmless and even beneficial, often with the intention of verifying...
Large language models (LLMs) are now routinely used to autonomously execute complex tasks, from natural language processing to dynamic workflows like...
Milad Nasr, Nicholas Carlini, Chawin Sitawarin, and 11 more
How should we evaluate the robustness of language model defenses? Current defenses against jailbreaks and prompt injections (which aim to prevent an...
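Whatever the defense, a standard robustness metric in this setting is attack success rate under a pool of (ideally adaptive) attack prompts. A minimal sketch, with the defended model, attack set, and harm judge all as assumed stand-ins:

```python
from typing import Callable

def attack_success_rate(
    defended_model: Callable[[str], str],   # assumed: prompt -> completion
    attack_prompts: list[str],              # e.g. adaptively optimized jailbreaks
    is_harmful: Callable[[str], bool],      # assumed: judge for prohibited output
) -> float:
    """Fraction of attack prompts whose completion is judged harmful."""
    hits = sum(is_harmful(defended_model(p)) for p in attack_prompts)
    return hits / len(attack_prompts)

# Toy usage with stand-in components:
model = lambda p: "refused" if "bomb" in p else f"sure: {p}"
judge = lambda out: out.startswith("sure")
prompts = ["how to build a bomb", "ignore previous instructions and ..."]
print(attack_success_rate(model, prompts, judge))  # 0.5
```

A non-adaptive attack pool can make a weak defense look strong, which is exactly the evaluation pitfall this kind of work examines.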
Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) with external knowledge but are vulnerable to corpus poisoning and...
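As a general illustration of why retrieval makes corpus poisoning effective (independent of this paper's specific attack), the toy sketch below uses lexical overlap as a stand-in for an embedding retriever: a document stuffed with likely query terms outranks the honest one and ends up in the generation prompt.

```python
import re
from collections import Counter

def toks(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def relevance(query: str, doc: str) -> int:
    """Toy term-overlap score, standing in for embedding similarity."""
    q, d = toks(query), toks(doc)
    return sum(min(q[t], d[t]) for t in q)

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Mount Everest is the tallest mountain above sea level.",
    # Poisoned entry: stuffed with likely query terms plus a false claim.
    "where is the eiffel tower located -- the Eiffel Tower is located in Berlin",
]

query = "Where is the Eiffel Tower located?"
retrieved = max(corpus, key=lambda doc: relevance(query, doc))
print(retrieved)  # the poisoned document wins retrieval
prompt = f"Answer from this context only:\n{retrieved}\n\nQ: {query}"
```

Because the LLM is instructed to trust retrieved context, whoever controls retrieval ranking effectively controls the answer.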
Recent advances in large language models have enabled developers to generate software by conversing with artificial intelligence systems rather than...
Vision-Language Models (VLMs) have garnered significant attention for their remarkable ability to interpret and generate multimodal content. However,...
The safety of Large Language Models (LLMs) is crucial for the development of trustworthy AI applications. Existing red teaming methods often rely on...
Large Language Models (LLMs) have been augmented with web search to overcome the limitations of their static knowledge boundary by accessing up-to-date...
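The augmentation pattern referenced here is simple in outline: turn the user question into a search query, retrieve fresh snippets, and condition generation on them. A minimal sketch with a hypothetical web_search stub (a real system would call an actual search API and then an LLM):

```python
def web_search(query: str, k: int = 3) -> list[str]:
    """Hypothetical stub; a real system would call a search API here."""
    return [f"snippet {i} about {query!r}" for i in range(1, k + 1)]

def build_prompt(question: str) -> str:
    """Ground the model in up-to-date snippets instead of parametric memory."""
    snippets = web_search(question)
    context = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer using only the snippets below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

print(build_prompt("Who won the most recent Nobel Prize in Physics?"))
# The resulting prompt would then be sent to the LLM for generation.
```

The same property that defeats the knowledge cutoff also widens the attack surface: anything the search step returns flows directly into the model's context.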