Benchmark HIGH
Junhyeok Lee, Han Jang, Kyu Sung Choi
Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt...
1 month ago cs.CL cs.LG
Benchmark HIGH
Hao Li, Ruoyao Wen, Shanghao Shi +2 more
AI agents that autonomously interact with external tools and environments show great promise across real-world applications. However, the external...
Benchmark HIGH
Yunpeng Xiong, Ting Zhang
Static Application Security Testing (SAST) tools are essential for identifying software vulnerabilities, but they often produce a high volume of...
Benchmark HIGH
Ivan K. Tung, Yu Xiang Shi, Alex Chien +2 more
Creating attack paths for cyber defence exercises requires substantial expert effort. Existing automation requires vulnerability graphs or exploit...
1 month ago cs.CR cs.AI
Benchmark HIGH
Miao Lin, Feng Yu, Rui Ning +6 more
Deep neural networks are highly susceptible to backdoor attacks, yet most defense methods to date rely on balanced data, overlooking the pervasive...
1 month ago cs.CR cs.CV cs.LG
Benchmark HIGH
Thomas Heverin
Prompt injection evaluations typically treat refusal as a stable, binary indicator of safety. This study challenges that paradigm by modeling refusal...
Benchmark HIGH
Zelong Zheng, Jiayuan Zhou, Xing Hu +2 more
Software vulnerability management has become increasingly critical as modern systems scale in size and complexity. However, existing automated...
Benchmark HIGH
Fan Huang, Haewoon Kwak, Jisun An
Large Language Models (LLMs) are increasingly employed in various question-answering tasks. However, recent studies showcase that LLMs are...
2 months ago cs.CL cs.AI
Benchmark HIGH
Jiayi Yuan, Jonathan Nöther, Natasha Jaques +1 more
While recent automated red-teaming methods show promise for systematically exposing model vulnerabilities, most existing approaches rely on...
2 months ago cs.AI cs.NE
Benchmark HIGH
Yow-Fu Liou, Yu-Chien Tang, Yu-Hsiang Liu +1 more
Benchmarking large language models (LLMs) is critical for understanding their capabilities, limitations, and robustness. In addition to interface...
Benchmark HIGH
Chutian Huang, Dake Cao, Jiacheng Ji +3 more
Background: While Large Language Models (LLMs) have achieved widespread adoption, malicious prompt engineering, specifically "jailbreak attacks," poses...
Benchmark HIGH
Haoze Guo, Ziqi Wei
Retrieval-augmented generation (RAG) systems increasingly emphasize grounding their responses in user-generated content found on the Web,...
2 months ago cs.CR cs.HC
Benchmark HIGH
Shaznin Sultana, Sadia Afreen, Nasir U. Eisty
Context: Traditional software security analysis methods struggle to keep pace with the scale and complexity of modern codebases, requiring...
Benchmark HIGH
Quy-Anh Dang, Chris Ngo, Truong-Son Hy
As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount....
Benchmark HIGH
Zejian Chen, Chaozhuo Li, Chao Li +3 more
This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models (LLMs) and Vision-Language Models (VLMs),...
Benchmark HIGH
Xiangzhe Yuan, Zhenhao Zhang, Haoming Tang +1 more
As LLMs gain persuasive agentic capabilities through extended dialogues, they introduce novel risks in multi-turn conversational scams that...
Benchmark HIGH
Songyang Liu, Chaozhuo Li, Rui Pu +5 more
Jailbreak attacks present a significant challenge to the safety of Large Language Models (LLMs), yet current automated evaluation methods largely...
2 months ago cs.CR cs.CL
Benchmark HIGH
Md Hasan Saju, Maher Muhtadi, Akramul Azim
The rapid advancement of Large Language Models (LLMs) presents new opportunities for automated software vulnerability detection, a crucial task in...
2 months ago cs.SE cs.AI
Benchmark HIGH
Jingyu Zhang
Customer-service LLM agents increasingly make policy-bound decisions (refunds, rebooking, billing disputes), but the same "helpful" interaction...
2 months ago cs.CR cs.HC
Benchmark HIGH
Manu, Yi Guo, Kanchana Thilakarathna +5 more
Large Language Models (LLMs) can be driven into over-generation, emitting thousands of tokens before producing an end-of-sequence (EOS) token. This...
2 months ago cs.CR cs.AI cs.LG