FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures

Jiajun Xu, Jiageng Mao†, Ang Qi, Weiduo Yuan, Alexander Romanus, Helen Xia, Vitor Campagnolo Guizilini, Yue Wang†
Published: February 17, 2026
Updated: February 17, 2026

Abstract

Vision Language Models (VLMs) are prone to errors, and identifying where these errors occur is critical for ensuring the reliability and safety of AI systems. In this paper, we propose an approach that automatically generates questions designed to deliberately induce incorrect responses from VLMs, thereby revealing their vulnerabilities. The core of this approach lies in fuzz testing and reinforcement fine-tuning: we transform a single input query into a large set of diverse variants through vision and language fuzzing. Based on the fuzzing outcomes, the question generator is further trained with adversarial reinforcement fine-tuning to produce increasingly challenging queries that trigger model failures. With this approach, we can consistently drive down a target VLM's answer accuracy -- for example, the accuracy of Qwen2.5-VL-32B on our generated questions drops from 86.58% to 65.53% over four RL iterations. Moreover, a fuzzing policy trained against a single target VLM transfers to multiple other VLMs, producing challenging queries that degrade their performance as well.
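To make the pipeline concrete, below is a minimal Python sketch of the fuzz-then-reward loop the abstract describes. Every name in it (`fuzz_question`, `query_vlm`, the binary wrong-answer reward) is an illustrative assumption, not the paper's released code; the actual vision and language fuzzers and the RL update are more involved.

```python
import random

def fuzz_question(question: str, n_variants: int = 8) -> list[str]:
    """Language fuzzing: rewrite one seed question into diverse surface
    variants. Trivial template perturbations stand in here for the
    paper's fuzzers (hypothetical simplification)."""
    templates = [
        "{q}",
        "Looking closely at the image, {q}",
        "Answer briefly: {q}",
        "Think carefully before answering: {q}",
    ]
    return [random.choice(templates).format(q=question) for _ in range(n_variants)]

def query_vlm(image, question: str) -> str:
    """Placeholder for a call to the target VLM (e.g. Qwen2.5-VL-32B);
    wire up a real client here."""
    raise NotImplementedError

def adversarial_rewards(image, ground_truth: str,
                        variants: list[str]) -> list[tuple[str, float]]:
    """Score each variant: reward 1.0 when the target VLM answers
    incorrectly, pushing the question generator toward
    failure-inducing queries."""
    scored = []
    for q in variants:
        answer = query_vlm(image, q)
        wrong = answer.strip().lower() != ground_truth.strip().lower()
        scored.append((q, 1.0 if wrong else 0.0))
    return scored
```

In each RL iteration, the resulting (question, reward) pairs would feed a policy-gradient update on the question generator so that later rounds sample harder queries; the reported accuracy drop for Qwen2.5-VL-32B is the cumulative effect of four such rounds.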

Metadata

Comment
18 pages, 4 figures. † These authors jointly supervised this work: Jiageng Mao and Yue Wang
