CVE-2025-66448 — HIGH (CVSS 8.8) AI Security Vulnerability

CISO Take

In vLLM versions prior to 0.11.1, setting trust_remote_code=False does not prevent remote code from running when a model config reaches the vulnerable auto_map path. An attacker can publish a benign-looking model on a public hub (e.g., Hugging Face) that silently executes arbitrary Python on your inference server at load time. If you run vLLM in production, upgrade to 0.11.1 immediately and audit every model source your pipelines pull from. Until patched, treat every external model load as a potential RCE vector regardless of your trust settings.

Affected Systems

Package	Ecosystem	Vulnerable Range	Patched
vllm	pip	< 0.11.1	`0.11.1`
vllm	pip	—	No patch
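
To check a host against the vulnerable range above, something like the following sketch may help. It reads the installed vllm version through the standard library and compares it to 0.11.1. The helper names (release_tuple, is_patched, installed_vllm_status) are illustrative, and the parser is a simplification that assumes typical PEP 440-style version strings; a pre-release of 0.11.1 itself is conservatively reported as unpatched.

```python
import re
from importlib import metadata

PATCHED = (0, 11, 1)  # first fixed release according to the table above

# Suffixes that mark a pre-release or dev build (assumption: PEP 440-style strings).
_PRE_RELEASE = re.compile(r"^[._-]?(a|b|c|rc|alpha|beta|pre|preview|dev)", re.IGNORECASE)


def release_tuple(version):
    """Return (numeric release tuple, remaining suffix), e.g. '0.11.0.post1' -> ((0, 11, 0), '.post1')."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    release = tuple(int(part) for part in match.group(0).split("."))
    return release, version[match.end():]


def is_patched(version):
    """True if the version is at or above 0.11.1 (pre-releases of 0.11.1 count as unpatched)."""
    release, suffix = release_tuple(version)
    # Pad short versions so that '0.12' compares as (0, 12, 0).
    release = release + (0,) * (len(PATCHED) - len(release))
    if release == PATCHED and _PRE_RELEASE.match(suffix):
        return False
    return release >= PATCHED


def installed_vllm_status():
    """Describe the vllm install in the current Python environment."""
    try:
        version = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm is not installed in this environment"
    state = "patched" if is_patched(version) else "VULNERABLE (< 0.11.1)"
    return f"vllm {version}: {state}"
```

Calling print(installed_vllm_status()) inside each inference image or virtual environment gives a quick per-host answer; it does not replace a dependency scanner.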

Severity & Risk

CVSS 3.1: 8.8 / 10

EPSS: 0.2% chance of exploitation in 30 days

KEV Status: Not in KEV

Sophistication: Moderate

Recommended Action

1. PATCH: Upgrade vLLM to >= 0.11.1 immediately (pip install --upgrade vllm).
2. INTERIM WORKAROUND: Until patched, restrict model sources to an internal registry or a vetted allowlist. Do not load arbitrary community models.
3. AUDIT: Review all model sources currently in use; check config.json files for auto_map entries pointing to external repositories.
4. DO NOT TRUST THE FLAG: Explicitly passing trust_remote_code=False is NOT a compensating control in affected versions. Stop listing it in your runbooks as a mitigation, because it gives a false sense of safety.
5. SANDBOX: Run model loading in isolated containers with no network egress to reduce blast radius.
6. DETECT: Monitor for unexpected outbound connections from vLLM processes during model initialization; alert on connections to github.com or huggingface.co from inference hosts that are not part of approved model pull workflows.
7. VERIFY: After patching, confirm your vllm version with pip show vllm.
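
The AUDIT step above can be partly automated. The sketch below walks a directory of downloaded models, reads each config.json, and reports any auto_map entry. It assumes the "repo_id--module.ClassName" convention that Hugging Face transformers uses for references into another repository; the function names and the status labels are illustrative, not part of any vLLM or transformers API.

```python
import json
from pathlib import Path


def cross_repo_refs(auto_map):
    """Return auto_map values that name another repository ('repo_id--module.Class')."""
    refs = []
    for value in auto_map.values():
        # Some entries hold a list (for example slow and fast tokenizer classes).
        candidates = value if isinstance(value, (list, tuple)) else [value]
        refs.extend(v for v in candidates if isinstance(v, str) and "--" in v)
    return refs


def audit_configs(root):
    """Scan every config.json under root; return (path, status, cross_repo_refs) tuples."""
    findings = []
    for path in sorted(Path(root).rglob("config.json")):
        try:
            config = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            findings.append((str(path), "unreadable", []))
            continue
        auto_map = config.get("auto_map") if isinstance(config, dict) else None
        if not isinstance(auto_map, dict) or not auto_map:
            continue  # no dynamic-code mapping in this config
        external = cross_repo_refs(auto_map)
        status = "cross-repo" if external else "local-code"
        findings.append((str(path), status, external))
    return findings
```

Pointing audit_configs at a local model cache or registry mirror yields a list to review by hand. A "cross-repo" finding matches the pattern described in this advisory; a "local-code" finding still means the model ships custom Python and deserves a look.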

Classification

Supply Chain Code Execution Framework Inference Model

AML.T0002.001 - Models
AML.T0010.001 - AI Software
AML.T0010.003 - Model
AML.T0011.000 - Unsafe AI Artifacts
AML.T0021 - Establish Accounts
AML.T0058 - Publish Poisoned Models
AML.T0072 - Reverse Shell
AML.T0074 - Masquerading

Compliance Impact

This CVE is relevant to:

EU AI Act

Article 9 - Risk management system
Article 13 - Transparency and provision of information
Article 25 - Obligations of Deployers

ISO 42001

A.6.1.2 - AI Supply Chain Management
A.6.1.5 - Information security in project management
A.8.1.1 - Inventory of assets and supplier management
A.9.2 - AI Risk Treatment

NIST AI RMF

GOVERN 6.1 - AI Supply Chain Risk Management
GOVERN 6.1 - Policies and practices for AI supply chain risk
MANAGE 2.2 - Mechanisms for Managing AI Risks
MAP 5.2 - AI system risks propagated to downstream users

OWASP LLM Top 10

LLM03 - Supply Chain Vulnerabilities
LLM05 - Supply Chain Vulnerabilities

Technical Details

NVD Description

vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.11.1, vllm has a critical remote code execution vector in a config class named Nemotron_Nano_VL_Config. When vllm loads a model config that contains an auto_map entry, the config class resolves that mapping with get_class_from_dynamic_module(...) and immediately instantiates the returned class. This fetches and executes Python from the remote repository referenced in the auto_map string. Crucially, this happens even when the caller explicitly sets trust_remote_code=False in vllm.transformers_utils.config.get_config. In practice, an attacker can publish a benign-looking frontend repo whose config.json points via auto_map to a separate malicious backend repo; loading the frontend will silently run the backend’s code on the victim host. This vulnerability is fixed in 0.11.1.
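
To make the frontend/backend split concrete, here is a minimal sketch of how an auto_map reference can name a different repository than the one being loaded. It assumes the "repo_id--module.ClassName" string convention used by Hugging Face transformers; the repository names, the class name, and the split_auto_map_ref helper are all hypothetical and exist only for illustration. No code is fetched or executed.

```python
def split_auto_map_ref(ref, loading_repo):
    """Split an auto_map reference into (repo_id, module, class_name).

    A reference without '--' points at code inside the repo being loaded.
    """
    repo_id, sep, dotted = ref.partition("--")
    if not sep:
        repo_id, dotted = loading_repo, ref
    module, _, class_name = dotted.rpartition(".")
    return repo_id, module, class_name


# Hypothetical config.json content from a benign-looking "frontend" repository.
frontend_repo = "attacker/frontend-model"
config = {
    "auto_map": {
        "AutoConfig": "attacker/backend-code--configuration_evil.EvilConfig",
    }
}

code_repo, module, class_name = split_auto_map_ref(
    config["auto_map"]["AutoConfig"], frontend_repo
)
# True when the Python that would be imported lives outside the repo the user asked for.
points_elsewhere = code_repo != frontend_repo
```

Here points_elsewhere is True: the user asked for attacker/frontend-model, but the class would come from attacker/backend-code, a repository the user never named or reviewed.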

Exploitation Scenario

An adversary registers two accounts on GitHub or Hugging Face. The first hosts a legitimate-looking multimodal model repository (the frontend repo) with a well-crafted README, model card, and config.json. The config.json includes an auto_map field pointing to the adversary's second repository (the backend repo), which hosts a malicious Python class. The frontend repo is promoted in AI/ML communities, referenced in blog posts, or submitted to model leaderboards to build credibility. A target organization's MLOps pipeline or developer then runs vllm.LLM('attacker/benign-model', trust_remote_code=False). The False flag is ignored: vLLM resolves the auto_map reference with get_class_from_dynamic_module, fetches the malicious Python from the backend repo, and executes it on the inference host. The payload can drop a reverse shell, exfiltrate environment variables (AWS credentials, OpenAI keys, internal API tokens), or install a persistent backdoor. The entire compromise happens silently before any inference request is processed.

Weaknesses (CWE)

CWE-94 (Primary): Improper Control of Generation of Code ('Code Injection')

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
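
The 8.8 base score can be reproduced from the vector above. The sketch below follows the base-score equations and metric weights from the CVSS v3.1 specification; the function and variable names are my own, and it covers base metrics only (no temporal or environmental scoring).

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required depends on whether Scope is Unchanged (U) or Changed (C).
PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27}, "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score for a vector such as 'CVSS:3.1/AV:N/...'."""
    _prefix, *pairs = vector.split("/")
    m = dict(pair.split(":") for pair in pairs)
    scope = m["S"]

    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[scope][m["PR"]] * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


score = base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H")  # 8.8
```

For this CVE the impact sub-score is about 5.87 and exploitability about 2.84, so the sum of roughly 8.71 rounds up to 8.8. The UI:R term (a user or pipeline must load the model) is what keeps the score below the 9.8 that the same vector would get with UI:N.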

Timeline

Published: December 1, 2025

Last Modified: December 3, 2025

First Seen: December 1, 2025