Benchmark LOW
Zihang Wang, Xu Li, Benwu Wang +7 more
Explainability and transparent decision-making are essential for the safe deployment of autonomous driving systems. Scene captioning summarizes...
3 weeks ago cs.RO cs.AI
Benchmark MEDIUM
Haodong Zhao, Jinming Hu, Zhaomin Wu +7 more
Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a...
Benchmark MEDIUM
Om Tailor
Colluding language-model agents can hide coordination in messages that remain policy-compliant at the surface level. We present CLBC, a protocol...
3 weeks ago cs.CR cs.AI eess.SY
Benchmark LOW
Rahul Baxi
AI agents are increasingly granted economic agency (executing trades, managing budgets, negotiating contracts, and spawning sub-agents), yet current...
Benchmark LOW
Yashas Hariprasad, Subhash Gurappa, Sundararaj S. Iyengar +3 more
The Forensics Investigations Network in Digital Sciences (FINDS) Research Center of Excellence (CoE), funded by the U.S. Army Research Laboratory,...
3 weeks ago cs.CR cs.AI
Benchmark HIGH
Zhicheng Fang, Jingjie Zheng, Chenxu Fu +1 more
Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare...
3 weeks ago cs.CR cs.AI cs.CL
Benchmark HIGH
Xuhui Dou, Hayretdin Bahsi, Alejandro Guerra-Manzanares
Recent work applies Large Language Models (LLMs) to source-code vulnerability detection, but most evaluations still rely on random train-test splits...
3 weeks ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Chung-ju Huang, Huiqiang Zhao, Yuanpeng He +5 more
The increasing reliance on cloud-hosted Large Language Models (LLMs) exposes sensitive client data, such as prompts and responses, to potential...
3 weeks ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
David Condrey
The proliferation of AI-generated text has intensified the need for reliable authorship verification, yet current output-based methods are...
3 weeks ago cs.CR cs.HC cs.LG
Benchmark LOW
Zhengqing Yuan, Kaiwen Shi, Zheyuan Zhang +3 more
Scientific research relies on accurate citation for attribution and integrity, yet large language models (LLMs) introduce a new risk: fabricated...
3 weeks ago cs.CL cs.DL
Benchmark LOW
Yuan Liang, Ruobin Zhong, Haoming Xu +46 more
Current AI agents can flexibly invoke tools and execute complex tasks, yet their long-term advancement is hindered by the lack of systematic...
3 weeks ago cs.AI cs.CL cs.CV
Benchmark LOW
Jiazheng Quan, Xiaodong Li, Bin Wang +5 more
Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities....
3 weeks ago cs.CR cs.AI cs.SE
Benchmark LOW
Balazs Pejo
Federated learning offers a privacy-friendly collaborative learning framework, yet its success, like any joint venture, hinges on the contributions...
3 weeks ago cs.LG cs.CR
Benchmark LOW
Vladimer Khasia
The pursuit of world model based artificial intelligence has predominantly relied on projecting high-dimensional observations into parameterized...
Benchmark MEDIUM
Nazanin Mohammadi Sepahvand, Eleni Triantafillou, Hugo Larochelle +3 more
Large language models (LLMs) trained on web-scale data can produce toxic outputs, raising concerns for safe deployment. Prior defenses, based on...
Benchmark LOW
Mohammed Cherifi
Public EV charging infrastructure suffers from significant failure rates -- with field studies reporting up to 27.5% of DC fast chargers...
4 weeks ago cs.DC cs.AI cs.LG
Benchmark MEDIUM
Guangnian Wan, Qi Li, Gongfan Fang +2 more
Multimodal Diffusion Language Models (MDLMs) have recently emerged as a competitive alternative to their autoregressive counterparts. Yet their...
4 weeks ago cs.CR cs.LG
Benchmark MEDIUM
Longxiang Wang, Xiang Zheng, Xuhao Zhang +3 more
Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. However, this creates side-channel vulnerabilities...
4 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Lei Ba, Qinbin Li, Songze Li
LLM-based code interpreter agents are increasingly deployed in critical workflows, yet their robustness against risks introduced by their code...
Benchmark MEDIUM
Jingwei Shi, Xinxiang Yin, Jing Huang +2 more
The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing...
1 month ago cs.SE cs.AI cs.CR