
GHSA-mcmc-2m55-j8jj HIGH
Published January 8, 2026
CISO Take

The original patch for CVE-2025-62164 in vLLM was a workaround, not a fix — it just disabled prompt embeddings by default, leaving any deployment that re-enables the feature fully exposed to DoS via malformed sparse tensors. Upgrade to vLLM 0.13.0 immediately, which introduces actual sparse tensor validation. If you cannot patch today, verify the prompt embeddings feature flag is explicitly disabled in all your inference deployments.

Affected Systems

Package: vllm
Ecosystem: pip
Vulnerable Range: >= 0.10.2, < 0.11.1
Patched: 0.13.0

Do you use vllm? If your installed version falls within the vulnerable range, you're affected.

Severity & Risk

CVSS 3.1
8.8 / 10
EPSS
N/A
KEV Status
Not in KEV
Sophistication
Moderate

Recommended Action

  1. PATCH: Upgrade to vLLM 0.13.0, which includes the real fix (sparse tensor index validation via PR #30649).
  2. IMMEDIATE WORKAROUND: Confirm the `enable_prompt_embeds` flag is explicitly set to `False` in all vLLM serving configs — do not rely on the default.
  3. NETWORK CONTROLS: Restrict vLLM inference API endpoints to trusted internal callers; avoid direct public exposure without an authenticated gateway.
  4. DETECTION: Monitor for requests with unusually large or malformed embedding payloads; anomalous memory usage spikes or inference worker crashes should trigger alert review.
  5. AUDIT: Identify all internal services and pipelines that call vLLM directly and assess whether prompt embedding is in use.
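As part of the audit step, installed versions can be checked against the advisory's affected range. A minimal sketch, assuming plain `X.Y.Z` version strings (the function name and parsing approach are illustrative, not vLLM tooling):

```python
def is_vulnerable(version: str) -> bool:
    """Return True if a vLLM version string falls inside the
    advisory's vulnerable range: >= 0.10.2 and < 0.11.1."""
    # Compare the first three numeric components; assumes a plain
    # X.Y.Z string with no pre-release suffix.
    parts = tuple(int(p) for p in version.split(".")[:3])
    return (0, 10, 2) <= parts < (0, 11, 1)
```

For example, `is_vulnerable("0.10.2")` returns `True`, while the patched `is_vulnerable("0.13.0")` returns `False`.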

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
EU AI Act
Article 15 - Accuracy, robustness, and cybersecurity
Article 9 - Risk management system
ISO 42001
6.1.2 - AI risk assessment
A.6.2.5 - AI system inputs
A.8.3 - Risk Treatment for AI Systems
A.9.5 - AI System Availability and Resilience
NIST AI RMF
GOVERN-1.7 - Processes for Identifying Risks in AI Supply Chain
MANAGE-2.2 - Residual risk response
MANAGE-2.4 - Residual Risks After Treatment
MEASURE-2.5 - AI system robustness
OWASP LLM Top 10
LLM04:2023 - Model Denial of Service
LLM08:2025 - Vector and Embedding Weaknesses
LLM10:2025 - Unbounded Consumption

Technical Details

NVD Description

Summary

The fix (https://github.com/vllm-project/vllm/pull/27204) for CVE-2025-62164 is not sufficient. It only disables prompt embeds by default rather than addressing the root cause, so the DoS vulnerability remains when the feature is enabled.

Details

vLLM's pending change addresses the root cause: missing sparse tensor validation. PyTorch (~v2.0) disables sparse tensor invariant checks by default for performance reasons. vLLM is adding validation to ensure sparse tensor indices are valid, non-negative, and within bounds; these checks catch malformed tensors before they reach the compute path.

PoC

N/A

Impact

The current fix only added a flag to enable/disable prompt embeds. Because the feature is now disabled by default, DoS attacks through embeddings are stopped out of the box, but the underlying problem remains whenever the flag is enabled.

Changes

https://github.com/vllm-project/vllm/pull/30649
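The root-cause fix adds exactly this class of index check. A simplified, framework-free sketch of the invariants described above — non-negative, in-bounds indices of matching rank — with names and structure that are illustrative, not vLLM's actual code from PR #30649:

```python
def validate_coo_indices(indices, shape):
    """Validate COO-style sparse indices against a dense shape.

    Every index tuple must have the same rank as the shape, and each
    coordinate must be non-negative and within bounds -- the invariants
    PyTorch skips by default for performance since ~v2.0.
    """
    for idx in indices:
        if len(idx) != len(shape):
            raise ValueError(
                f"index rank {len(idx)} != tensor rank {len(shape)}")
        for axis, (i, bound) in enumerate(zip(idx, shape)):
            if not (0 <= i < bound):
                raise ValueError(
                    f"index {i} out of bounds on axis {axis} (size {bound})")
    return True
```

A well-formed payload like `validate_coo_indices([(0, 1), (1, 0)], (2, 2))` passes, while a negative or out-of-bounds coordinate raises before any memory is touched — the failure mode that, unchecked, becomes an out-of-bounds write in the compute path.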

Exploitation Scenario

An attacker with API credentials to a vLLM-backed inference endpoint — a compromised service account, a malicious insider, or a tenant in a multi-tenant deployment — constructs a request containing prompt embeddings with malformed PyTorch sparse tensors: indices that are negative, out-of-bounds, or violate sparse tensor invariants. Since PyTorch v2.0 disabled these invariant checks by default for performance, vLLM (without the real fix) passes the tensor to the compute path unchecked. This triggers an out-of-bounds memory write, causing the inference worker process to crash (DoS) or potentially corrupt adjacent memory. Because vLLM is typically deployed as a shared inference backend serving multiple users or services, a single malformed request takes down availability for all consumers.
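One containment option for the scenario above is screening declared embedding shapes at a gateway in front of the inference endpoint, before any tensor is deserialized. A hypothetical sketch — the limits, the expected `(num_tokens, hidden_dim)` shape, and the function name are assumptions for illustration, not part of vLLM:

```python
# Hypothetical deployment limits -- tune per model and hardware.
MAX_EMBED_TOKENS = 4096
MAX_EMBED_DIM = 8192

def screen_embedding_request(shape):
    """Reject prompt-embedding payloads whose declared shape is
    implausible, before any tensor is materialized downstream."""
    if len(shape) != 2:  # expect (num_tokens, hidden_dim)
        return False
    tokens, dim = shape
    return 0 < tokens <= MAX_EMBED_TOKENS and 0 < dim <= MAX_EMBED_DIM
```

This does not replace the real fix — it cannot see malformed sparse indices inside the payload — but it cheaply rejects the oversized or structurally absurd requests that should also feed the detection alerts described above.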

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
January 8, 2026
Last Modified
January 8, 2026
First Seen
March 24, 2026