Benchmark MEDIUM
Ali Naseh, Anshuman Suri, Yuefeng Peng +3 more
Generative AI leaderboards are central to evaluating model capabilities, but remain vulnerable to manipulation. Among key adversarial objectives is...
5 months ago cs.LG cs.CR
PDF
Benchmark MEDIUM
Shadi Rahimian, Mario Fritz
Single nucleotide polymorphism (SNP) datasets are fundamental to genetic studies but pose significant privacy risks when shared. The correlation of...
5 months ago cs.LG cs.CR q-bio.GN
PDF
Benchmark MEDIUM
Mary Llewellyn, Annie Gray, Josh Collyer +1 more
Before adopting a new large language model (LLM) architecture, it is critical to understand vulnerabilities accurately. Existing evaluations can be...
5 months ago cs.CR cs.AI cs.CL
PDF
Benchmark MEDIUM
Yongan Yu, Xianda Du, Qingchen Hu +7 more
Historical archives on weather events are collections of enduring primary source records that offer rich, untapped narratives of how societies have...
5 months ago cs.CL cs.AI
PDF
Benchmark MEDIUM
Ruoxing Yang
Large language models (LLMs) such as ChatGPT have evolved into powerful and ubiquitous tools. Fine-tuning on small datasets allows LLMs to acquire...
5 months ago cs.LG cs.AI cs.CR
PDF
Benchmark MEDIUM
Punya Syon Pandey, Hai Son Le, Devansh Bhardwaj +2 more
Large language models (LLMs) are increasingly deployed in contexts where their failures can have direct sociopolitical consequences. Yet, existing...
5 months ago cs.CL cs.AI cs.LG
PDF
Benchmark MEDIUM
Jehyeok Yeon, Isha Chaudhary, Gagandeep Singh
Large language models (LLMs) are increasingly deployed in agentic systems where they map user intents to relevant external tools to fulfill a task. A...
5 months ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Chengxiao Wang, Isha Chaudhary, Qian Hu +3 more
Large Language Models (LLMs) can produce catastrophic responses in conversational settings that pose serious risks to public safety and security....
5 months ago cs.AI cs.CR cs.LG
PDF
Benchmark MEDIUM
Hangting Ye, Jinmeng Li, He Zhao +4 more
Existing anomaly detection (AD) methods for tabular data usually rely on some assumptions about anomaly patterns, leading to inconsistent performance...
Benchmark MEDIUM
Kartik Pandit, Sourav Ganguly, Arnesh Banerjee +2 more
Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an appropriate balance between enhancing the utility of...
5 months ago cs.LG cs.AI eess.SY
PDF
Benchmark MEDIUM
Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar +7 more
Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals; these pages often exceed tens...
Benchmark MEDIUM
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru +6 more
While finetuning AI agents on interaction data -- such as web browsing or tool use -- improves their capabilities, it also introduces critical...
5 months ago cs.CR cs.AI cs.LG
PDF
Benchmark MEDIUM
Nikoo Naghavian, Mostafa Tavassolipour
Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work,...
Benchmark MEDIUM
Chenpei Huang, Lingfeng Yao, Hui Zhong +5 more
Ear canal scanning/sensing (ECS) has emerged as a novel biometric authentication method for mobile devices paired with wireless earbuds. Existing...
5 months ago cs.CR cs.HC
PDF
Benchmark MEDIUM
Zhaoyan Wang, Zheng Gao, Arogya Kharel +1 more
Graph Neural Networks (GNNs) are widely adopted in Web-related applications, serving as a core technique for learning from graph-structured data,...
5 months ago cs.LG cs.AI
PDF
Benchmark MEDIUM
Luoxi Tang, Yuqiao Meng, Ankita Patra +3 more
Large Language Models (LLMs) are intensively used to assist security analysts in counteracting the rapid exploitation of cyber threats, wherein LLMs...
5 months ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Luca Cotti, Idilio Drago, Anisa Rula +2 more
System logs represent a valuable source of Cyber Threat Intelligence (CTI), capturing attacker behaviors, exploited vulnerabilities, and traces of...
Benchmark MEDIUM
Yicheng Lang, Yihua Zhang, Chongyu Fan +3 more
Large language model (LLM) unlearning aims to surgically remove the influence of undesired data or knowledge from an existing model while preserving...
Benchmark MEDIUM
Andrew Gan, Zahra Ghodsi
Machine learning systems increasingly rely on open-source artifacts such as datasets and models that are created or hosted by other parties. The...
Benchmark MEDIUM
Ehsan Aghaei, Sarthak Jain, Prashanth Arun +1 more
Effective analysis of cybersecurity and threat intelligence data demands language models that can interpret specialized terminology, complex document...
5 months ago cs.CR cs.AI cs.LG
PDF