CVE-2024-14021

HIGH

Published January 12, 2026

CISO Take

If your organization uses LlamaIndex with BGE-M3 embedding indices loaded from disk, you have a high-severity arbitrary code execution exposure. Any pipeline that calls BGEM3Index.load_from_disk() on an untrusted or shared persist_dir can be exploited with a crafted pickle file. The attacker needs no privileges on the target system, but a user or pipeline must load the crafted directory. Upgrade to a release later than 0.11.6 that includes the fix (no patched version is listed in this advisory), and audit all index-loading code paths for directory inputs that external parties can influence.

Affected Systems

Package	Ecosystem	Vulnerable Range	Patched
llamaindex	pip	<= 0.11.6	No patch

Do you use llamaindex's BGEM3Index and load indices from disk? You're affected.

Severity & Risk

CVSS 3.1

7.8 / 10

EPSS

N/A

KEV Status

Not in KEV

Sophistication

Trivial

Recommended Action

1) Patch: Upgrade llamaindex to a version above 0.11.6 that resolves this issue; validate via changelog or commit history confirming the fix.
2) Audit: Grep all codebases for BGEM3Index.load_from_disk() calls and trace whether persist_dir originates from user input, environment variables, or external storage.
3) Workaround (if patching is delayed): Restrict persist_dir to trusted, immutable, access-controlled local paths; never load from network paths or user-supplied directories.
4) Detection: Alert on .pkl file writes to model/index directories from unexpected processes; monitor anomalous subprocess spawning from LlamaIndex worker processes.
5) Defense-in-depth: Run LlamaIndex workloads in sandboxed containers with restricted syscalls (seccomp) and no outbound network egress to contain the blast radius if exploited.
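Step 2 (Audit) can be partly automated. A minimal sketch, assuming call sites are written literally as BGEM3Index.load_from_disk(...): it parses each .py file with Python's standard ast module and reports file and line numbers without executing anything. find_load_calls is a hypothetical helper; it will miss aliased imports and calls made through a variable, so an empty result is a starting point, not proof of absence.

```python
import ast
from pathlib import Path


def find_load_calls(root: str) -> list[tuple[str, int]]:
    """Hypothetical audit helper: return sorted (file, line) pairs for each
    literal `BGEM3Index.load_from_disk(...)` call in .py files under root.
    Aliased imports and calls through variables are NOT detected."""
    hits: list[tuple[str, int]] = []
    for path in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (OSError, SyntaxError, ValueError):
            # Unreadable, non-UTF-8 (UnicodeDecodeError is a ValueError),
            # or unparsable file: skip it rather than abort the scan.
            continue
        for node in ast.walk(tree):
            # Match Call(func=Attribute(value=Name("BGEM3Index"), attr="load_from_disk"))
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "load_from_disk"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "BGEM3Index"
            ):
                hits.append((str(path), node.lineno))
    return sorted(hits)
```

For example, find_load_calls("path/to/repo") returns the call sites to review; each one still needs manual tracing of where its persist_dir value comes from.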

Classification

Code Execution, Supply Chain, Framework, RAG
AML.T0010.001 - AI Software
AML.T0011 - User Execution
AML.T0011.000 - Unsafe AI Artifacts
AML.T0018.002 - Embed Malware
AML.T0035 - AI Artifact Collection
AML.T0049 - Exploit Public-Facing Application

Compliance Impact

This CVE is relevant to:

EU AI Act

Art. 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.10.1 - Information security for AI systems
A.8.1 - AI system lifecycle
A.8.2 - AI system security

NIST AI RMF

GOVERN-6.1 - Policies and procedures for AI third-party risk
MANAGE-2.2 - Mechanisms are in place and applied to address AI risks

OWASP LLM Top 10

LLM03:2025 - Supply Chain
LLM05:2023 (v1.1) - Supply Chain Vulnerabilities
LLM08:2025 - Vector and Embedding Weaknesses

Technical Details

NVD Description

LlamaIndex (run-llama/llama_index) versions up to and including 0.11.6 contain an unsafe deserialization vulnerability in BGEM3Index.load_from_disk() in llama_index/indices/managed/bge_m3/base.py. The function uses pickle.load() to deserialize multi_embed_store.pkl from a user-supplied persist_dir without validation. An attacker who can provide a crafted persist directory containing a malicious pickle file can trigger arbitrary code execution when the victim loads the index from disk.
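The NVD description turns on a documented property of Python's pickle format: an object's __reduce__ method can name any importable callable, and pickle.load() calls it while deserializing. A minimal, deliberately harmless sketch of that behaviour follows. load_multi_embed_store and BenignPayload are hypothetical names that mirror the pattern described above; they are not LlamaIndex's actual source, and the stand-in "payload" only calls os.getpid().

```python
import os
import pickle
import tempfile


class BenignPayload:
    """Stands in for an attacker's object. __reduce__ tells pickle which
    callable to invoke (and with what arguments) when the data is loaded."""

    def __reduce__(self):
        # A real exploit would name something like os.system here; this
        # sketch uses os.getpid so the effect is harmless but observable.
        return (os.getpid, ())


def load_multi_embed_store(persist_dir: str):
    """Hypothetical helper mirroring the described pattern: pickle.load()
    on multi_embed_store.pkl from persist_dir with no validation."""
    path = os.path.join(persist_dir, "multi_embed_store.pkl")
    with open(path, "rb") as f:
        return pickle.load(f)  # invokes whatever callable the file names


with tempfile.TemporaryDirectory() as persist_dir:
    # Attacker side: plant a crafted pickle in the persist directory.
    with open(os.path.join(persist_dir, "multi_embed_store.pkl"), "wb") as f:
        pickle.dump(BenignPayload(), f)

    # Victim side: "loading the store" actually calls os.getpid().
    result = load_multi_embed_store(persist_dir)
    print(result == os.getpid())  # True
```

Swapping os.getpid for a function such as os.system is all it takes to turn this into the code execution the advisory describes. Inspecting the loaded object afterwards does not help, because the callable has already run inside pickle.load().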

Exploitation Scenario

An attacker targets a data science team using LlamaIndex with BGE-M3 for a RAG knowledge base whose embedding indices are stored in a shared S3 bucket accessible to multiple developers. The attacker either compromises the bucket through misconfigured IAM permissions or tricks a developer into loading a 'sample index' from a phishing link. The malicious persist_dir contains a crafted multi_embed_store.pkl that embeds a Python reverse-shell payload via pickle's __reduce__ protocol. When any developer runs their pipeline and calls BGEM3Index.load_from_disk(persist_dir='s3-mount/malicious-dir'), pickle.load() executes the payload, achieving RCE on the developer's machine with immediate access to model weights, Anthropic/OpenAI API keys, and cloud credentials, plus a foothold for lateral movement into internal systems.
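The workaround in the Recommended Action (restrict persist_dir to trusted, access-controlled local paths) would have refused the load from 's3-mount/malicious-dir' in this scenario. A minimal sketch, assuming a single trusted root at /srv/indices (a hypothetical path): checked_persist_dir is a hypothetical helper that resolves symlinks and '..' segments before comparing against the allowlist, and it needs Python 3.9+ for Path.is_relative_to. It does not help if an attacker can write inside the trusted root itself, so that directory still needs strict write permissions.

```python
from pathlib import Path

# Hypothetical allowlist: the only directory trees indices may be loaded from.
TRUSTED_INDEX_ROOTS = [Path("/srv/indices").resolve()]


def checked_persist_dir(persist_dir: str, trusted_roots=TRUSTED_INDEX_ROOTS) -> Path:
    """Resolve persist_dir (following symlinks and '..') and refuse it unless
    it is one of the trusted roots or lies beneath one."""
    resolved = Path(persist_dir).resolve()
    for root in trusted_roots:
        # is_relative_to() is also True when resolved == root.
        if resolved.is_relative_to(root):
            return resolved
    raise PermissionError(f"refusing to load index from untrusted path: {resolved}")
```

A caller would pass str(checked_persist_dir(value)) as persist_dir to BGEM3Index.load_from_disk() and let the PermissionError propagate, so an unexpected directory fails loudly instead of being deserialized.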

Weaknesses (CWE)

CWE-502 Primary
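CWE-502 is deserialization of untrusted data. Beyond controlling where files come from, Python's pickle documentation describes restricting which globals an Unpickler may resolve by overriding find_class. A minimal sketch of that pattern follows; SAFE_GLOBALS is a hypothetical allowlist, and a real store would need to admit whatever types multi_embed_store.pkl actually contains (not stated in this advisory). Allowing broad modules defeats the purpose, so moving stored data to a format that cannot encode callables is generally the more robust fix.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) globals the loader may resolve.
SAFE_GLOBALS = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("builtins", "set"),
    ("builtins", "frozenset"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any global not in SAFE_GLOBALS."""

    def find_class(self, module, name):
        # pickle calls find_class whenever the stream references a global,
        # i.e. before any callable named in the stream can be invoked.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Replacement for pickle.loads() with the allowlist applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

With this loader, a stream that names os.system (or any other callable outside the allowlist) raises UnpicklingError before the callable is even resolved, let alone called.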

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
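The 7.8 score follows from this vector through the CVSS v3.1 base-score equations. A minimal sketch of that arithmetic, using the metric weights and Roundup function from the FIRST CVSS v3.1 specification; base_score is a hypothetical helper that handles only complete base vectors (no temporal or environmental metrics).

```python
import math

# CVSS v3.1 base metric weights (FIRST specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, computed on
    integers to avoid floating-point artefacts."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Hypothetical helper: base score for a full CVSS:3.1 base vector."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(p.split(":") for p in parts[1:])
    changed = m["S"] == "C"
    # Impact Sub-Score from the three CIA impact metrics.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 7.8
```

For this vector the Impact subscore is 6.42 × (1 − 0.44³) ≈ 5.873 and Exploitability is 8.22 × 0.55 × 0.77 × 0.85 × 0.62 ≈ 1.835; their sum, about 7.708, rounds up to 7.8. The local attack vector (AV:L) and required user interaction (UI:R) are what keep it below the 9.8 that the same impact would score with AV:N and UI:N.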

References

github.com/run-llama/llama_index (Product)
huntr.com/bounties/ab4ceeb4-aa85-4d1c-aaca-4eda1b71fc12 (Exploit, 3rd Party)
llamaindex.ai (Product)
vulncheck.com/advisories/llamaindex-bgem3index-unsafe-deserialization (3rd Party)

Timeline

Published

January 12, 2026

Last Modified

January 15, 2026

First Seen

January 12, 2026
