Auto-Tuning Safety Guardrails for Black-Box Large Language Models
Perry Abdulkadir
Large language models (LLMs) are increasingly deployed behind safety guardrails such as system prompts and content filters, especially in settings...
Dang-Khoa Nguyen, Gia-Thang Ho, Quang-Minh Pham +5 more
Software supply chain attacks targeting the npm ecosystem have become increasingly sophisticated, leveraging obfuscation and complex logic to evade...
Andrew Adiletta, Kathryn Adiletta, Kemal Derya +1 more
The rapid deployment of Large Language Models (LLMs) has created an urgent need for enhanced security and privacy measures in Machine Learning (ML)....
Manon Kempermann, Sai Suresh Macharla Vasu, Mahalakshmi Raveenthiran +2 more
Safety evaluations of large language models (LLMs) typically focus on universal risks like dangerous capabilities or undesirable propensities....
Najmul Hasan, Prashanth BusiReddyGari, Haitao Zhao +3 more
Email phishing is one of the most prevalent and globally consequential vectors of cyber intrusion. As systems increasingly deploy Large Language...
Sohely Jahan, Ruimin Sun
As medical large language models (LLMs) become increasingly integrated into clinical workflows, concerns around alignment robustness and safety are...
Mohamed Elmahallawy, Sanjay Madria, Samuel Frimpong
Underground mining operations depend on sensor networks to monitor critical parameters such as temperature, gas concentration, and miner movement,...
Wenjie Zhang, Yun Lin, Chun Fung Amos Kwok +5 more
Detecting anomalies in web applications, which are important infrastructure for running modern companies and governments, is crucial for providing...
Xiaoqi Li, Hailu Kuang, Wenkai Li +2 more
Traditional approaches for smart contract analysis often rely on intermediate representations such as abstract syntax trees, control-flow graphs, or...
Jehyeok Yeon, Federico Cinus, Yifan Wu +1 more
Large language models (LLMs) face critical safety challenges, as they can be manipulated to generate harmful content through adversarial prompts and...
Sheng Liu, Panos Papadimitratos
Federated Learning (FL) has drawn the attention of the Intelligent Transportation Systems (ITS) community. FL can train various models for ITS tasks,...
Jason Vega, Gagandeep Singh
A frustratingly easy technique known as the prefilling attack has been shown to effectively circumvent the safety alignment of frontier LLMs by...
Jiale Zhao, Xing Mou, Jinlin Wu +7 more
Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks; however, research into their...
Biagio Montaruli, Luca Compagna, Serena Elisa Ponta +1 more
The rise of supply chain attacks via malicious Python packages demands robust detection solutions. Current approaches, however, overlook two critical...
Weiwei Wang
Catastrophic forgetting remains a fundamental challenge in continual learning for large language models. Recent work revealed that performance...
Rongzhe Wei, Peizhi Niu, Xinjie Shen +7 more
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. Existing approaches...
Henry Onyeka, Emmanuel Samson, Liang Hong +3 more
The increasing complexity of IoT edge networks presents significant challenges for anomaly detection, particularly in identifying sophisticated...
Neemesh Yadav, Francesco Ortu, Jiarui Liu +5 more
Large Language Models (LLMs) are trained to refuse to respond to harmful content. However, systematic analyses of whether this behavior is truly a...
Junbo Zhang, Ran Chen, Qianli Zhou +2 more
Large language models demonstrate powerful capabilities across various natural language processing tasks, yet they also harbor safety...
Onat Gungor, Roshan Sood, Jiasheng Zhou +1 more
Large Language Models (LLMs) are highly effective for cybersecurity question answering (QA) but are difficult to deploy on edge devices due to their...