Preventing Cross-site Scripting In PHP

Preventing Cross-site Scripting (XSS) vulnerabilities in any language requires two main considerations: the type of sanitization performed on input, and the location in which that input is inserted. It is important to remember that no matter how well input is filtered, there is no single sanitization method that can prevent all XSS. The filtering required depends heavily on the context in which the data is inserted. Preventing XSS with data inserted between HTML elements is very straightforward. On the other hand, preventing XSS with data inserted directly into JavaScript code is considerably more difficult and sometimes impossible.

Input Sanitization

For the majority of PHP applications, htmlspecialchars() will be your best friend. In PHP versions before 8.1, htmlspecialchars() called with only the string argument (no flags) converts the following special characters to HTML entities:

'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;'
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'
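To see these four conversions in isolation, the sketch below passes ENT_COMPAT explicitly. ENT_COMPAT was the default flag before PHP 8.1, so spelling it out reproduces the behaviour listed above on any PHP version. The sample string is an arbitrary illustration.

```php
<?php
// Arbitrary sample input containing &, ", < and >, plus a single quote.
$input = '<a href="x">Tom & \'Jerry\'</a>';

// ENT_COMPAT encodes &, ", < and > but leaves single quotes untouched.
$compat = htmlspecialchars($input, ENT_COMPAT | ENT_HTML401, 'UTF-8');

echo $compat, PHP_EOL;
// &lt;a href=&quot;x&quot;&gt;Tom &amp; 'Jerry'&lt;/a&gt;
```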

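Eagle-eyed readers may notice that this does not include single quotes. For this reason we recommend always passing the ENT_QUOTES flag to htmlspecialchars() so that single quotes are encoded as well. (Since PHP 8.1, ENT_QUOTES is part of the default flags, but passing it explicitly keeps the behaviour consistent across versions.) The single quote conversion is shown below: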

"'" (single quote) becomes '&#039;' (or '&apos;' when the ENT_XML1, ENT_XHTML or ENT_HTML5 flag is used)
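One way to make ENT_QUOTES the habit rather than the exception is to wrap htmlspecialchars() in a small helper. The sketch below assumes UTF-8 output; the name escape_html() is hypothetical, and ENT_SUBSTITUTE is an added choice that replaces invalid UTF-8 sequences instead of returning an empty string.

```php
<?php
/**
 * Hypothetical helper: escape a value for an HTML body or a quoted attribute.
 * ENT_QUOTES encodes both double and single quotes; ENT_SUBSTITUTE replaces
 * invalid UTF-8 with U+FFFD rather than returning an empty string.
 */
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');
}

// A single-quoted attribute can no longer be broken out of.
echo "<input value='" . escape_html("x' onfocus='alert(1)") . "'>", PHP_EOL;
// <input value='x&#039; onfocus=&#039;alert(1)'>
```

With ENT_HTML401 the single quote becomes &#039;; switching the document type flag to ENT_HTML5 would produce &apos; instead.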

htmlspecialchars() vs htmlentities()

Another function, htmlentities(), is almost identical to htmlspecialchars(). htmlentities() applies the same sanitization to dangerous characters, but it also converts every other character that has a named HTML entity (for example, é becomes &eacute;). This may lead to excessive encoding and cause some content to display incorrectly if character sets change.
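The difference is easiest to see on a string that mixes markup with a non-ASCII letter. This sketch assumes UTF-8 input and uses the "\u{00E9}" escape (PHP 7+) so the source itself stays ASCII.

```php
<?php
// "caf\u{00E9}" is "café"; the \u{} escape keeps this file ASCII-only.
$text = "caf\u{00E9} <b>";

// Only the HTML-special characters are encoded; é stays as UTF-8.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Every character with a named entity is encoded, including é.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, PHP_EOL;   // café &lt;b&gt;
echo $entities, PHP_EOL;  // caf&eacute; &lt;b&gt;
```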

strip_tags()

strip_tags() should NOT be relied on alone for sanitizing data. strip_tags() removes HTML and PHP tags from a string, so it cannot prevent XSS that needs no tags at all, such as input inserted inside an HTML attribute value. strip_tags() also does not filter or encode unpaired closing angle brackets (>). An attacker may be able to combine this with other weaknesses to inject fully functional JavaScript into the page. We recommend that strip_tags() only be used for its intended functional purpose: removing HTML tags from content. In these situations, input should still be passed through htmlspecialchars() after strip_tags() is applied.
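The sketch below illustrates both weaknesses and the recommended order of operations. The payload is a hypothetical attacker value aimed at a double-quoted attribute; it contains no tags, so strip_tags() has nothing to remove.

```php
<?php
// Hypothetical attacker input targeting value="...": no tags, only quotes.
$payload = '" onmouseover="alert(1)';
$stripped = strip_tags($payload);             // returned unchanged

// An unpaired closing angle bracket also survives strip_tags().
$bracket = strip_tags('<b>bold</b> a > b');   // "bold a > b"

// Strip tags for functional reasons, then encode for the output context.
$safe = htmlspecialchars($stripped, ENT_QUOTES, 'UTF-8');

echo '<input value="' . $safe . '">', PHP_EOL;
// <input value="&quot; onmouseover=&quot;alert(1)">
```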

addslashes()

addslashes() is often used to escape input that is inserted into JavaScript string variables. An example is shown below:

http://www.example.com/view.php?name=te"st
[...]
<script>
 var name = "te\"st";   // output of addslashes()
 displayname(name);
</script>

As we can see, addslashes() adds a backslash in an attempt to prevent an attacker from terminating the string and appending executable code. This works, sort of, but has a critical flaw. The browser's HTML parser decides where a <script> block ends, at the first </script> sequence it finds, before the JavaScript engine parses the code inside it. The parser does not care whether that sequence sits inside a quoted string. So to exploit this, we don't actually need to "bypass" addslashes(); we simply terminate the script tag:

<script>
 var name = "test1</script><script>alert(document.cookie);</script>";
 displayname(name);
</script>

As far as the browser is concerned, the injected code is an entirely new script block containing valid JavaScript. The first block is left with an unterminated string and fails, but the injected block runs.
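The sketch below confirms that addslashes() leaves this payload untouched. As a contrast it uses a swapped-in technique not discussed above: json_encode() with the JSON_HEX_* flags, which emits a complete, quoted JavaScript string literal in which <, >, &, ' and " are written as \u00XX escapes, so no literal </script> can appear. It assumes the input is valid UTF-8 (json_encode() returns false otherwise) and covers only string values inside a <script> block, not event attributes.

```php
<?php
$payload = 'test1</script><script>alert(document.cookie);</script>';

// addslashes() only escapes quotes, backslashes and NUL bytes, so the
// </script> sequence that closes the block in the HTML parser survives.
$slashed = addslashes($payload);

// Swapped-in alternative: a JSON string literal with HTML-sensitive
// characters hex-escaped (<, > become \u003C, \u003E, and so on).
$flags   = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
$literal = json_encode($payload, $flags);

// json_encode() supplies the surrounding quotes itself.
echo "<script>\n var name = $literal;\n displayname(name);\n</script>\n";
```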

Where Entity Encoding Fails

As noted earlier, the location of the data matters, and there are cases where entity encoding with htmlspecialchars() is not enough. One of the most common is when data is inserted inside an element's tag or into one of its attributes.

HTML Event Attributes: Many HTML elements accept event handler attributes (onload, onclick, onmouseover and so on) whose values are executed as JavaScript when the corresponding event fires. For example, the onload attribute of the body element executes JavaScript once the page has loaded.

<body onload=alert(document.cookie);>

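The payload above contains none of the characters htmlspecialchars() encodes, so it passes through unchanged. Even in a quoted event attribute, the browser decodes HTML entities in the attribute value before handing it to the JavaScript engine, so encoded quotes become real quotes again. This is just one of many somewhat rare situations where extremely strict filtering is required. For an in-depth look at many injection scenarios and their prevention methods, take a look at the OWASP XSS Prevention Cheat Sheet.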
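"Extremely strict filtering" in practice usually means a whitelist: accept only values that match a narrow, known-good pattern and reject everything else. The sketch below is a hypothetical helper, strict_token(); the allowed characters (ASCII letters, digits, underscore, hyphen) and the 64-character limit are assumptions to be tailored to each field.

```php
<?php
/**
 * Hypothetical whitelist helper: return the value only if it consists
 * entirely of ASCII letters, digits, underscores or hyphens (1 to 64
 * characters); otherwise return null. The pattern is an assumption.
 */
function strict_token(string $value): ?string
{
    return preg_match('/\A[A-Za-z0-9_-]{1,64}\z/', $value) === 1 ? $value : null;
}

$payload = 'alert(document.cookie);';

// htmlspecialchars() finds nothing to encode in this payload.
$encoded = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');
echo $encoded === $payload ? "unchanged by htmlspecialchars()\n" : "encoded\n";

// The whitelist rejects it outright.
$checked = strict_token($payload);
var_dump($checked); // NULL
```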

Third Party PHP Libraries

Virtue Security makes no recommendation and provides no warranty for third-party products or software; however, we are aware that several third-party PHP libraries are commonly used to assist in XSS prevention. The projects below may help developers build suitable whitelists:

HTML Purifier – http://htmlpurifier.org/
PHP Anti-XSS – https://code.google.com/p/php-antixss/
htmLawed – http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/

Other Things to Remember

A great rule of thumb is simply not to insert user-controlled data unless it's explicitly needed for the application to function. It is surprisingly common to find XSS vulnerabilities that exist only because parameters are inserted into HTML or JavaScript comments. This serves no functional purpose for the application, yet it can introduce serious security vulnerabilities.