CVE-2024-14021

HIGH
Published January 12, 2026
CISO Take

If your organization uses LlamaIndex with BGE-M3 embedding indices loaded from disk, you have a high-severity arbitrary code execution exposure. Any pipeline that calls BGEM3Index.load_from_disk() on an untrusted or shared persist_dir can be exploited with a crafted pickle file; the attacker needs no privileges on the target, only a way to place or substitute the directory that gets loaded. Upgrade to a release later than 0.11.6 once you have confirmed that it contains a fix, and audit all index-loading code paths for externally influenced directory inputs.

Affected Systems

Package      Ecosystem   Vulnerable Range   Patched
llamaindex   pip         (not listed)       No patch


Severity & Risk

CVSS 3.1: 7.8 / 10
EPSS: N/A
KEV Status: Not in KEV
Sophistication: Trivial

Recommended Action

  1. Patch: Upgrade llamaindex to a version above 0.11.6 that resolves this issue, and confirm the fix in the changelog or commit history.
  2. Audit: Grep all codebases for BGEM3Index.load_from_disk() calls and trace whether persist_dir originates from user input, environment variables, or external storage.
  3. Workaround (if patching is delayed): Restrict persist_dir to trusted, immutable, access-controlled local paths; never load from network paths or user-supplied directories.
  4. Detection: Alert on .pkl file writes to model/index directories from unexpected processes, and monitor for anomalous subprocess spawning from LlamaIndex worker processes.
  5. Defense-in-depth: Run LlamaIndex workloads in sandboxed containers with restricted syscalls (seccomp) and no outbound network egress to contain the blast radius if exploited.
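The workaround in step 3 can be sketched as a small guard that runs before any index is loaded. This is a minimal illustration, not part of LlamaIndex: the function name is_trusted_persist_dir and the /srv/indices root are hypothetical, and the check only constrains where the directory lives, not what the files inside it contain.

```python
from pathlib import Path

# Hypothetical allowlist of trusted index roots; adjust for your deployment.
TRUSTED_ROOTS = (Path("/srv/indices"),)


def is_trusted_persist_dir(persist_dir, trusted_roots=TRUSTED_ROOTS):
    """Return True only if persist_dir resolves to a location under a trusted root."""
    # resolve() collapses ".." segments and follows symlinks, so a path that
    # merely starts with a trusted prefix cannot escape the allowlist.
    resolved = Path(persist_dir).resolve()
    for root in trusted_roots:
        root = Path(root).resolve()
        if resolved == root or root in resolved.parents:
            return True
    return False
```

A caller would refuse to invoke BGEM3Index.load_from_disk() whenever this returns False. The guard is only as strong as the access controls on the trusted roots themselves: if an attacker can write into one of them, the path check passes and the pickle is still loaded.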

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.10.1 - Information security for AI systems; A.8.1 - AI system lifecycle; A.8.2 - AI system security
NIST AI RMF
GOVERN 6.1 - Policies and procedures for AI third-party risk; MANAGE 2.2 - Mechanisms are in place and applied to address AI risks
OWASP LLM Top 10
LLM03 (2025 list) - Supply Chain; LLM05 (2023 list) - Supply Chain Vulnerabilities; LLM08 (2025 list) - Vector and Embedding Weaknesses

Technical Details

NVD Description

LlamaIndex (run-llama/llama_index) versions up to and including 0.11.6 contain an unsafe deserialization vulnerability in BGEM3Index.load_from_disk() in llama_index/indices/managed/bge_m3/base.py. The function uses pickle.load() to deserialize multi_embed_store.pkl from a user-supplied persist_dir without validation. An attacker who can provide a crafted persist directory containing a malicious pickle file can trigger arbitrary code execution when the victim loads the index from disk.
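Because the flaw is that the file is deserialized without validation, one way to triage a suspect multi_embed_store.pkl is to read its opcode stream without loading it. The sketch below uses the standard-library pickletools module; the helper name suspicious_opcodes and the choice of opcodes to flag are assumptions made for illustration, not a LlamaIndex feature.

```python
import pickle
import pickletools

# Opcodes that let a pickle import names or call objects during loading.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST",
    "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD",
}


def suspicious_opcodes(data: bytes) -> set:
    """Return the code-capable opcode names present in a pickle byte string.

    pickletools.genops only parses the opcode stream; nothing is executed.
    """
    found = {op.name for op, _arg, _pos in pickletools.genops(data)}
    return found & SUSPICIOUS_OPCODES
```

Typical use would be suspicious_opcodes(path.read_bytes()) on the file before deciding whether to load it. This is triage rather than a verdict: legitimate pickles that contain class instances or NumPy arrays also use these opcodes, so a non-empty result means "inspect further", and an empty result only shows that the file holds plain built-in containers and scalars.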

Exploitation Scenario

An attacker targets a data science team that uses LlamaIndex with BGE-M3 for a RAG knowledge base and stores its embedding indices in a shared S3 bucket accessible to multiple developers. The attacker either gains write access to the bucket through misconfigured IAM permissions or tricks a developer into loading a "sample index" from a phishing link. The malicious persist_dir contains a crafted multi_embed_store.pkl that carries a Python reverse-shell payload built on pickle's __reduce__ protocol. When a developer runs the pipeline and calls BGEM3Index.load_from_disk(persist_dir='s3-mount/malicious-dir'), pickle.load() executes the payload. The result is code execution on the developer's machine, with immediate access to model weights, Anthropic/OpenAI API keys, and cloud credentials, and a foothold for lateral movement into internal systems.
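The mechanism behind this scenario is that __reduce__ lets the author of a pickle name any importable callable, and the loader calls it while reading the file. The following is a deliberately harmless sketch of that behavior: the class name Demo is hypothetical, and len stands in for whatever callable a real payload would name.

```python
import pickle


class Demo:
    """Stand-in for a crafted object; the callable here is harmless."""

    def __reduce__(self):
        # pickle records (callable, args) and calls callable(*args) at load time.
        return (len, ("payload",))


blob = pickle.dumps(Demo())

# No Demo instance comes back. The loader called len("payload") and
# returned its result, so the file's author chose what ran.
result = pickle.loads(blob)
print(result)  # prints 7
```

Nothing in the loading code had to reference Demo or len; the choice of callable travels inside the bytes. That is why controlling where the file comes from, rather than inspecting the object after loading, is the relevant defense.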

Weaknesses (CWE)

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
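As a cross-check, the 7.8 base score can be recomputed from this vector using the CVSS v3.1 base-score equations. The sketch below covers only the unchanged-scope case (S:U), which is what this vector uses; the function names are chosen for illustration.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(x):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for an unchanged-scope CVSS v3.1 vector string."""
    parts = dict(p.split(":") for p in vector.split("/")[1:])  # skip "CVSS:3.1"
    if parts["S"] != "U":
        raise ValueError("this sketch only handles unchanged scope (S:U)")
    w = {metric: WEIGHTS[metric][parts[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # prints 7.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum, roughly 7.71, rounds up to 7.8. The local attack vector (AV:L) and required user interaction (UI:R) are what keep the score below the critical range.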

Timeline

Published: January 12, 2026
Last Modified: January 15, 2026
First Seen: January 12, 2026