Benchmark MEDIUM
André V. Duarte, Xuying Li, Bin Zeng +3 more
If we cannot inspect the training data of a large language model (LLM), how can we ever know what it has seen? We believe the most compelling...
Benchmark LOW
Emily Herron, Junqi Yin, Feiyi Wang
Large language models (LLMs) have demonstrated transformative potential in scientific research, yet their deployment in high-stakes contexts raises...
Benchmark MEDIUM
Simon Yu, Peilin Yu, Hongbo Zheng +3 more
We present VISAT, a novel open dataset and benchmarking suite for evaluating model robustness in the task of traffic sign recognition with the...
4 months ago cs.CR cs.AI cs.LG
PDF
Benchmark LOW
He Hu, Chiyuan Ma, Qianning Wang +5 more
The shortage of mental health professionals has driven the web to become a primary avenue for accessible psychological support. While Large Language...
Benchmark MEDIUM
Zheng Zhang, Guanlong Wu, Sen Deng +2 more
In the rapidly expanding landscape of Large Language Model (LLM) applications, real-time output streaming has become the dominant interaction...
Benchmark MEDIUM
Juan Ren, Mark Dras, Usman Naseem
Agentic methods have emerged as a powerful and autonomous paradigm that enhances reasoning, collaboration, and adaptive control, enabling systems to...
Benchmark MEDIUM
Yifan Wu, Xuewei Feng, Yuxiang Yang +1 more
As the core of the Internet infrastructure, the TCP/IP protocol stack undertakes the task of network data transmission. However, due to the...
4 months ago cs.CR cs.NI
PDF
Benchmark MEDIUM
María Sanz-Gómez, Víctor Mayoral-Vilches, Francesco Balassone +3 more
Cybersecurity spans multiple interconnected domains, complicating the development of meaningful, labor-relevant benchmarks. Existing benchmarks...
Benchmark MEDIUM
Vladyslav Larin, Ihor Naumenko, Aleksei Ivashov +2 more
As centralized AI hits compute ceilings and diminishing returns from ever-larger training runs, meeting demand requires an inference layer that...
4 months ago cs.LG cs.AI cs.CL
PDF
Benchmark MEDIUM
Hiromu Takahashi, Shotaro Ishihara
We propose Fast-MIA (https://github.com/Nikkei/fast-mia), a Python library for efficiently evaluating membership inference attacks (MIA) against...
4 months ago cs.CR cs.CL
PDF
Benchmark MEDIUM
Armin Gerami, Kazem Faghih, Ramani Duraiswami
Retrieval Augmented Generation (RAG) enhances Large Language Models (LLMs) by connecting them to external knowledge, improving accuracy and reducing...
5 months ago cs.IR cs.AI cs.CL
PDF
Benchmark MEDIUM
Julia Bazinska, Max Mathys, Francesco Casucci +4 more
AI agents powered by large language models (LLMs) are being deployed at scale, yet we lack a systematic understanding of how the choice of backbone...
5 months ago cs.CR cs.AI cs.LG
PDF
Benchmark MEDIUM
Hao Zheng, Zirui Pang, Ling Li +5 more
Advances in Multimodal Large Language Models (MLLMs) intensify concerns about data privacy, making Machine Unlearning (MU), the selective removal of...
5 months ago cs.AI cs.CL
PDF
Benchmark LOW
Wenxuan Bao, Ruxi Deng, Jingrui He
Pretrained vision-language models such as CLIP achieve strong zero-shot generalization but remain vulnerable to distribution shifts caused by input...
5 months ago cs.CV cs.LG
PDF
Benchmark MEDIUM
Mojtaba Eshghie, Gabriele Morello, Matteo Lauretano +2 more
Smart contract vulnerabilities cost billions of dollars annually, yet existing automated analysis tools fail to generate deployable defenses. We...
5 months ago cs.CR cs.SE
PDF
Benchmark MEDIUM
Christoph Bühler, Matteo Biagiola, Luca Di Grazia +1 more
Large Language Models (LLMs) have evolved into AI agents that interact with external tools and environments to perform complex tasks. The Model...
5 months ago cs.CR cs.AI cs.SE
PDF
Benchmark MEDIUM
Divyanshu Kumar, Nitin Aravind Birur, Tanay Baswa +2 more
Frontier Large Language Models (LLMs) pose unprecedented dual-use risks through the potential proliferation of chemical, biological, radiological,...
5 months ago cs.CR cs.AI
PDF
Benchmark LOW
Mohamed Seif, Malcolm Egan, Andrea J. Goldsmith +1 more
AI-based sensing at wireless edge devices has the potential to significantly enhance Artificial Intelligence (AI) applications, particularly for...
5 months ago cs.IT cs.CR cs.LG
PDF
Benchmark LOW
Zhenghao Xu, Qin Lu, Qingru Zhang +9 more
Reward model (RM) plays a pivotal role in reinforcement learning with human feedback (RLHF) for aligning large language models (LLMs). However,...
Benchmark HIGH
Euodia Dodd, Nataša Krčo, Igor Shilov +1 more
Membership inference attacks (MIAs) have emerged as the standard tool for evaluating the privacy risks of AI models. However, state-of-the-art...
5 months ago cs.LG cs.CR
PDF