Benchmark MEDIUM
Chengcan Wu, Zhixin Zhang, Mingqian Xu +2 more
Large Language Model (LLM)-based Multi-Agent Systems (MAS) have become a popular paradigm of AI applications. However, trustworthiness issues in MAS...
5 months ago cs.CR cs.AI cs.LG
PDF
Benchmark LOW
Joydeep Chandra, Satyam Kumar Navneet
Domestic AI agents face ethical, autonomy, and inclusion challenges, particularly for overlooked groups like children, elderly, and neurodivergent...
5 months ago cs.HC cs.AI cs.LG
PDF
Benchmark LOW
Sophia Xiao Pu, Sitao Cheng, Xin Eric Wang +1 more
Oversensitivity occurs when language models defensively reject prompts that are actually benign. This behavior not only disrupts user interactions...
Benchmark MEDIUM
Alexander Nemecek, Zebin Yun, Zahra Rahmani +4 more
As large language models (LLMs) become progressively more embedded in clinical decision-support, documentation, and patient-information systems,...
5 months ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Marco Alecci, Jordan Samhi, Tegawendé F. Bissyandé +1 more
Mobile apps often embed authentication secrets, such as API keys, tokens, and client IDs, to integrate with cloud services. However, developers often...
5 months ago cs.CR cs.SE
PDF
Benchmark MEDIUM
Giovanni De Muri, Mark Vero, Robin Staab +1 more
LLMs are often used by downstream users as teacher models for knowledge distillation, compressing their capabilities into memory-efficient models....
5 months ago cs.LG cs.AI cs.CR
PDF
Benchmark HIGH
Osama Al Haddad, Muhammad Ikram, Ejaz Ahmed +1 more
Security analysts face increasing pressure to triage large and complex vulnerability backlogs. Large Language Models (LLMs) offer a potential aid by...
Benchmark LOW
Yasser Hamidullah, Koel Dutta Chowdhury, Yusser Al Ghussin +4 more
Hallucination, where models generate fluent text unsupported by visual evidence, remains a major flaw in vision-language models and is particularly...
Benchmark MEDIUM
Yixuan Liu, Xinlei Li, Yi Li
Phishing attacks in Web3 ecosystems are increasingly sophisticated, exploiting deceptive contract logic, malicious frontend scripts, and token...
Benchmark LOW
Lei Li, Xiao Zhou, Yingying Zhang +1 more
Medical question answering (QA) requires extensive access to domain-specific knowledge. A promising direction is to enhance large language models...
5 months ago cs.CL cs.AI
PDF
Benchmark LOW
Jiahao Shi, Tianyi Zhang
Despite recent advances, Large Language Models (LLMs) still generate vulnerable code. Retrieval-Augmented Generation (RAG) has the potential to...
5 months ago cs.CR cs.LG cs.SE
PDF
Benchmark HIGH
Pranshav Gajjar, Molham Khoja, Abiodun Ganiyu +4 more
The impending adoption of Open Radio Access Network (O-RAN) is fueling innovation in the RAN towards data-driven operation. Unlike traditional RAN...
5 months ago cs.CR cs.NI
PDF
Benchmark HIGH
Chengquan Guo, Yuzhou Nie, Chulin Xie +3 more
As large language models (LLMs) are increasingly used for code generation, concerns over the security risks have grown substantially. Early research...
Benchmark LOW
Navreet Kaur, Hoda Ayad, Hayoung Jung +3 more
Language model users often embed personal and social context in their questions. The asker's role -- implicit in how the question is framed --...
5 months ago cs.CL cs.AI cs.CY
PDF
Benchmark MEDIUM
Shivam Ratnakar, Sanjay Raghavendra
Integration of Large Language Models with search/retrieval engines has become ubiquitous, yet these systems harbor a critical vulnerability that...
5 months ago cs.CL cs.AI
PDF
Benchmark MEDIUM
David Peer, Sebastian Stabinger
Large Language Models (LLMs) have demonstrated impressive capabilities, yet their deployment in high-stakes domains is hindered by inherent...
5 months ago cs.CL cs.AI
PDF
Benchmark MEDIUM
Shuai Li, Kejiang Chen, Jun Jiang +5 more
Large Language Models (LLMs) have demonstrated remarkable capabilities, but their training requires extensive data and computational resources,...
Benchmark LOW
Mohammad Abdul Rehman, Syed Imad Ali Shah, Abbas Anwar +2 more
The remarkable capabilities of Large Language Models (LLMs) in natural language understanding and generation have sparked interest in their potential...
5 months ago cs.CR cs.AI cs.LG
PDF
Benchmark LOW
Yao Huang, Yitong Sun, Yichi Zhang +3 more
Despite the remarkable advances of Large Language Models (LLMs) across diverse cognitive tasks, the rapid enhancement of these capabilities also...
5 months ago cs.CL cs.AI cs.LG
PDF
Benchmark LOW
Luca Belli, Kate Bentley, Will Alexander +5 more
We introduce VERA-MH (Validation of Ethical and Responsible AI in Mental Health), an automated evaluation of the safety of AI chatbots used in mental...
5 months ago cs.CY cs.AI cs.HC
PDF