A Pentester’s Guide to Input Validation

“Input Validation” is a broad term, but here we’ll specifically review those topics relevant to pentesting. This guide is created from the perspective of a pentester, but is geared towards developers and technical executives who wish to strengthen their applications.

Background

When dynamic web applications first evolved from their static predecessors, a new era of security emerged. Websites went from purely informational documents to powerful systems allowing users to manage their lives online.

But applications needed to handle user input safely and ensure that they could not be maliciously subverted.

A core part of a penetration test is to assess how well an application validates input. During a pentest, attempts are made with all available inputs to access unauthorized data, tamper with database queries, and inject JavaScript.

What is Input Validation?

Input validation is the practice of sanitizing data to ensure it cannot adversely affect functional components of the application. For example, when user data is used to construct database queries, the application must ensure that those queries cannot be maliciously modified.

Understanding Injection

‘Injection’ is a term used in pentesting for when a malicious input can cause a specific technology to do something unintended.

For example, ‘SQL Injection’ refers to when a user can craft an input to create their own SQL queries, often resulting in a compromise of data.

“Script Injection” (or Cross-site Scripting) refers to when an attacker can cause arbitrary JavaScript to run in another user’s browser.

While there are many types of injection, some are so common that they have become a fundamental skill set of penetration testing.

Let’s review some of these.

Cross-site Scripting (XSS)

Cross-site Scripting (XSS) is a vulnerability that allows an attacker to execute arbitrary JavaScript in a victim’s web browser. By doing so, the victim’s session can be controlled or stolen by the attacker.

Understanding the Attack

This condition occurs when a web application does not properly encode dangerous characters, allowing the attacker to craft JavaScript that executes when another user uses the application.

For this reason, applications must carefully sanitize and encode data submitted by users. As an example, the output below reflects two parameters sent by a user. The developer has forgotten to encode the lastname parameter, creating an XSS vulnerability:

<p>You searched for:</p>
First name: &lt;script&gt;alert(1);&lt;/script&gt;
Last name: <script>alert(2);</script>

But unfortunately preventing script injection is not always so straightforward. Complex applications accept many types of input, and use that input in different contexts. Depending on where this data is used, the type of encoding needed to prevent XSS may change.

This is why pentesters must often analyze long chains of data handling behavior. Vulnerabilities often occur as a result of multiple application components behaving in unexpected ways.

A great example of this is an XSS vulnerability we discovered in Twitter, allowing a tweet to self-propagate.
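To illustrate how the required encoding changes with context, here is a minimal Python sketch. The payload and function names are hypothetical, and in a real application a context-aware, auto-escaping template engine should normally do this work rather than hand-written helpers.

```python
import html
import json
from urllib.parse import quote

# Hypothetical attacker-controlled value
value = "</script><script>alert(1)</script>"

def encode_for_html_body(v: str) -> str:
    # HTML entity encoding: suitable between tags, e.g. <p>...</p>
    return html.escape(v)

def encode_for_url_param(v: str) -> str:
    # Percent-encoding: suitable inside a query-string value
    return quote(v, safe="")

def encode_for_js_string(v: str) -> str:
    # json.dumps yields a valid JavaScript string literal, but it does not
    # escape "<", so "</script>" could still close an inline <script> block.
    # Escaping "<" as \u003c closes that gap.
    return json.dumps(v).replace("<", "\\u003c")

for encoder in (encode_for_html_body, encode_for_url_param, encode_for_js_string):
    print(encoder.__name__, "->", encoder(value))
```

Each encoder is only safe for its own context: HTML entity encoding inside a JavaScript string, for instance, does not stop the string from being broken out of.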

SQL Injection

Storing data in a database is a ubiquitous trait of dynamic web applications. As a result, SQL Injection remains one of the most commonly found critical-risk vulnerabilities today.

When user data is directly combined with SQL queries, users may have the ability to alter application logic and take complete control over the database.

As an example, the following login request can be made to bypass authentication on a vulnerable login form:
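A minimal sketch of how such a bypass works, assuming a hypothetical `users` table and a login check built by string concatenation. It uses Python's built-in `sqlite3` module and contrasts the vulnerable query with a parameterized one:

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    # Hypothetical in-memory users table with one account.
    # Plain-text passwords are for illustration only; real apps store hashes.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin', 'correct-horse')")
    return conn

def login_vulnerable(conn: sqlite3.Connection, username: str, password: str) -> bool:
    # UNSAFE: user input is concatenated straight into the SQL text
    query = (
        "SELECT username FROM users WHERE username = '" + username
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchone() is not None

def login_parameterized(conn: sqlite3.Connection, username: str, password: str) -> bool:
    # Safe: "?" placeholders keep the values out of the SQL text entirely
    row = conn.execute(
        "SELECT username FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
    return row is not None

conn = make_db()
# "--" starts a SQL comment, so the password check is discarded
payload = "admin' --"
print(login_vulnerable(conn, payload, "wrong"))     # True: authentication bypassed
print(login_parameterized(conn, payload, "wrong"))  # False: payload treated as a literal name
```

With the vulnerable version, the database receives `... WHERE username = 'admin' --' AND password = 'wrong'`, and everything after `--` is ignored.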

Other Injection Types

The challenges of ‘injection’ extend well beyond the examples covered here. In fact, it extends to nearly every technology that may handle data generated by users.

While they cannot all be listed here, several other well-known injection attacks are worth calling out:

Command Injection – when applications run shell commands or backend batch jobs, they must prevent user input from breaking out of the intended command string.
XML Injection – XML parsers often have powerful features that can be invoked by a specially crafted XML document. Applications that accept XML from users must carefully evaluate how it is parsed.
Template Injection – when data can be injected directly into the template rendering layer, it may be possible to execute arbitrary code.
ORM Injection – similar to SQL Injection, except that queries are modified at the ORM layer.
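For command injection specifically, a short Python sketch shows the difference between building a shell string and passing an argument list. The filename payload is hypothetical, and `ls -l` is only used as an example command string; the safe path runs a child Python process so the sketch stays portable.

```python
import shlex
import subprocess
import sys

# Hypothetical user-supplied filename carrying a second command
payload = "report.txt; echo INJECTED"

def build_shell_command(filename: str) -> str:
    # UNSAFE if run with shell=True: ";" ends `ls` and starts a new command
    return "ls -l " + filename

def build_quoted_command(filename: str) -> str:
    # shlex.quote makes a POSIX shell treat the value as one literal word
    return "ls -l " + shlex.quote(filename)

def run_without_shell(filename: str) -> str:
    # Preferred: pass an argument list and never invoke a shell.
    # The child process here just prints the single argument it received.
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout.strip()

print(build_shell_command(payload))   # ls -l report.txt; echo INJECTED
print(build_quoted_command(payload))  # ls -l 'report.txt; echo INJECTED'
print(run_without_shell(payload))     # report.txt; echo INJECTED
```

Avoiding the shell entirely is generally the stronger choice, since quoting only helps if every value is quoted correctly for the specific shell in use.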

The list of injection types should be considered open-ended, and each one depends on the technologies a given application uses.

Other Input Validation Vulnerabilities

Server-side Request Forgery (SSRF)

SSRF is a vulnerability where an attacker can coerce an application to make unauthorized requests to other web resources. This can often be used to access internal resources or obtain sensitive information.

This commonly occurs in PDF rendering utilities, but can manifest in many other types of software.
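One common mitigation is to resolve the destination host and refuse internal addresses before fetching. The sketch below is a hedged illustration with a hypothetical function name. On its own it does not stop DNS rebinding (the host may resolve differently when the real request is made) or redirects to internal hosts, so a production version would also need to pin the connection to the validated address and re-check every redirect.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_fetch_target(url: str) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    try:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            return False
        # Resolve the host roughly the way an HTTP client would
        infos = socket.getaddrinfo(parsed.hostname, parsed.port)
    except (ValueError, socket.gaierror):
        return False
    for *_, sockaddr in infos:
        # Drop any IPv6 zone index (e.g. "fe80::1%eth0") before parsing
        ip = ipaddress.ip_address(sockaddr[0].split("%", 1)[0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified):
            return False
    return True

for url in ("http://127.0.0.1/admin", "http://169.254.169.254/latest/meta-data/",
            "file:///etc/passwd", "http://93.184.216.34/"):
    print(url, is_safe_fetch_target(url))
```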

URL Redirection

Web applications frequently redirect users to external resources by using user-controlled parameters. These features can be abused to redirect users to malicious websites as part of phishing campaigns.
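A common defense is to accept only same-site relative paths or hosts on an explicit allow list. This is a minimal sketch: `ALLOWED_REDIRECT_HOSTS` and the function name are hypothetical, and the checks target a few well-known bypasses (protocol-relative URLs, backslashes, embedded credentials, and whitespace that browsers silently strip).

```python
from urllib.parse import urlparse

# Hypothetical allow list of hosts this application may redirect to
ALLOWED_REDIRECT_HOSTS = {"example.com", "www.example.com"}

def safe_redirect_target(target: str, default: str = "/") -> str:
    # Browsers ignore some whitespace/control characters in URLs, so reject them
    if any(ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F for ch in target):
        return default
    try:
        parsed = urlparse(target)
    except ValueError:  # e.g. a malformed IPv6 literal
        return default
    if not parsed.scheme and not parsed.netloc:
        # Same-site path: a single leading "/" and no backslashes, because
        # browsers may treat "/\evil.example" like "//evil.example"
        if target.startswith("/") and not target.startswith("//") and "\\" not in target:
            return target
        return default
    if parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_REDIRECT_HOSTS:
        return target
    return default

print(safe_redirect_target("/account/settings"))         # /account/settings
print(safe_redirect_target("https://evil.example/login"))  # /
```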

Input Validation and Penetration Testing

As you might expect, different types of injection vulnerabilities require different testing techniques. Let’s look at what this means for pentesting.

Pentesting Goals for Input Validation

Identify and understand application components.
Enumerate and discover accessible inputs.
Identify filters and encoding schemes.
Inject payloads and validate successful injections.

Finding Inputs

Many inputs are easy to spot: GET and POST parameters are often the primary toggles used to control application behavior. But a careful review is always needed to evaluate less common inputs, which may include headers, cookies, and unused parameters.

Third-party integrations and external applications also create an array of unconventional inputs. These should not be overlooked, as they may be an important part of the application’s attack surface.

For this reason, simply using the application is rarely enough to understand all inputs.
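As a rough illustration of looking past the obvious parameters, the sketch below splits a captured request into every candidate input: query parameters, body parameters, individual cookies, and headers. The request is hypothetical, the body is assumed to be form-encoded, and intercepting proxies such as Burp already do this; the point is only to show how many distinct inputs a single request can carry.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlsplit

# Hypothetical captured request (e.g. copied from an intercepting proxy)
RAW_REQUEST = (
    "POST /search?q=shoes&page=2 HTTP/1.1\r\n"
    "Host: shop.example\r\n"
    "Cookie: session=abc123; theme=dark\r\n"
    "X-Forwarded-For: 203.0.113.7\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "sort=price&debug=0"
)

def enumerate_inputs(raw: str) -> list[tuple[str, str, str]]:
    """Return (location, name, value) for every candidate injection point."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    _method, target, _version = request_line.split(" ", 2)

    points = [("query", k, v) for k, v in parse_qsl(urlsplit(target).query)]
    # Assumes a form-encoded body; JSON or XML bodies would need their own parser
    points += [("body", k, v) for k, v in parse_qsl(body)]
    for line in header_lines:
        name, _, value = line.partition(":")
        value = value.strip()
        if name.lower() == "cookie":
            # Each cookie is a separate input, not just the header as a whole
            points += [("cookie", k, m.value) for k, m in SimpleCookie(value).items()]
        else:
            points.append(("header", name, value))
    return points

for location, name, value in enumerate_inputs(RAW_REQUEST):
    print(f"{location:7} {name:16} {value}")
```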

Bypassing Filters and Encoding Schemes

Application pentesting is often a cat-and-mouse game of bypassing input filters. A careful evaluation is always needed to understand:

What characters or strings are filtered?
What encoding schemes are used?
Can data be input via overlooked sources to bypass that encoding?

Example: consider an application where developers have decided to prevent XSS attacks by stripping <script> tags from input. This can be easily bypassed by supplying the string <sc<script>ript>: once the filter removes the inner <script> tag, the remaining characters join to form <script>.

As you can imagine, this process is so unique to each application that automated tests will almost always fall short in this area.
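The stripping example can be reproduced in a few lines of Python. `naive_strip_script` is a hypothetical stand-in for the flawed filter; the last line shows why output encoding is the more robust fix, since it neutralizes the markup no matter how the payload was assembled.

```python
import html

def naive_strip_script(value: str) -> str:
    # Flawed filter: a single, case-sensitive pass that deletes "<script>"
    return value.replace("<script>", "")

payload = "<sc<script>ript>alert(1)</script>"
filtered = naive_strip_script(payload)
print(filtered)  # <script>alert(1)</script>: removing the inner tag rebuilt the outer one

# Case variations also slip through a case-sensitive filter
print(naive_strip_script("<SCRIPT>alert(1)</SCRIPT>"))  # <SCRIPT>alert(1)</SCRIPT>

# Output encoding neutralizes the markup regardless of how it was assembled
print(html.escape(filtered))  # &lt;script&gt;alert(1)&lt;/script&gt;
```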

Pentesting File Uploads

A simple file upload function can have a wide range of potential attacks. During a pentest, the following come under close scrutiny:

Attempt to upload web shells or files that may be interpreted as server-side code.
Bypass restrictions of file names or file type.
Attempt to alter the upload destination with directory traversal attacks.
Determine if the application disallows malicious files.

Defensive Measures

To counter these attacks, applications should take the following measures:

Ensure file names and extensions cannot be controlled by users.
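Validate the type of uploaded files. The Content-Type header is supplied by the client, so the file's actual contents should be checked as well.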
Prevent traversal attacks by restricting “dot-dot-slash” sequences in file names.
Perform server-side scanning of files to detect potential malware.
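A sketch of the first three measures in Python, under stated assumptions: `UPLOAD_DIR`, the accepted types, and the function names are hypothetical, and the leading-byte signatures are only a heuristic (a crafted polyglot file can carry a valid header), so server-side scanning is still needed.

```python
import secrets
from pathlib import Path
from typing import Optional

# Hypothetical storage location and accepted file types
UPLOAD_DIR = Path("/srv/app/uploads")
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
    b"%PDF-": ".pdf",
}

def detect_extension(data: bytes) -> Optional[str]:
    # Inspect the file's leading bytes instead of trusting the client's Content-Type
    for signature, extension in SIGNATURES.items():
        if data.startswith(signature):
            return extension
    return None

def storage_path_for(data: bytes) -> Path:
    extension = detect_extension(data)
    if extension is None:
        raise ValueError("unsupported or unrecognized file type")
    # The server chooses the name, so the user's filename never reaches the filesystem
    return UPLOAD_DIR / (secrets.token_hex(16) + extension)

def is_inside_upload_dir(candidate: str) -> bool:
    # Defense in depth for any code path that must accept a user-supplied name:
    # resolve ".." segments and confirm the result stays inside UPLOAD_DIR
    base = UPLOAD_DIR.resolve()
    return (base / candidate).resolve().is_relative_to(base)

print(storage_path_for(b"%PDF-1.7 ..."))
print(is_inside_upload_dir("../../etc/passwd"))  # False
```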

Exploiting Localization and Internationalization

Most modern technologies support localization for international users. But different character sets are not always handled as intended.
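As a hedged illustration (the filter below is hypothetical), two Unicode behaviors that commonly trip up validation are compatibility normalization and look-alike characters. If an application filters input and only later normalizes it, fullwidth characters can pass the filter and then turn back into ASCII. The usual remedy is to normalize first and validate the normalized form.

```python
import unicodedata

# Fullwidth "＜" (U+FF1C) and "＞" (U+FF1E) are not the ASCII characters a filter expects
payload = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"

def passes_filter(value: str) -> bool:
    # Hypothetical naive filter that only looks for the ASCII sequence
    return "<script" not in value.lower()

print(passes_filter(payload))  # True: the filter lets it through

# NFKC normalization applied *after* filtering turns the payload back into ASCII
normalized = unicodedata.normalize("NFKC", payload)
print(normalized)                 # <script>alert(1)</script>
print(passes_filter(normalized))  # False: the same check catches it once normalized

# Look-alike characters: Cyrillic "а" (U+0430) instead of Latin "a"
spoofed = "ex\u0430mple.com"
print(spoofed == "example.com")   # False, although it renders almost identically
print(spoofed.encode("idna"))     # the punycode form, beginning with b"xn--"
```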

Extended character sets can be used to evade filters or craft data useful for phishing attacks. Our research exploiting these behaviors led to two CVEs affecting Chrome and Mozilla:

Fuzzing, Explained

Fuzzing is a strategy of testing where a large list of inputs is used to detect anomalies. Anomalies are often the first step to identify a vulnerability, as they indicate the application is mishandling a specific type of data.

Fuzz Lists

Maintaining quality fuzz lists is a valuable asset for pentesting. These lists are typically compiled for specific types of vulnerabilities and contain many common attack strings, or ‘payloads’.

Burp Intruder

Web application pentesters will typically use Burp’s ‘Intruder’ module to make repeated requests with each payload. This module includes default payload lists, a framework to define insertion points, and an array of optional conditional logic to apply.

The output can then be analyzed to identify outliers. Typically, sorting by response length or error code will reveal the payloads of interest.
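The Intruder workflow can be approximated offline in a few lines of Python. This is only a sketch: `toy_endpoint` is a hypothetical stand-in for a real HTTP request, and `FUZZ_LIST` is a tiny illustrative sample rather than a real payload list.

```python
import html
from dataclasses import dataclass

@dataclass
class Result:
    payload: str
    status: int
    length: int

def toy_endpoint(query: str) -> tuple[int, str]:
    # Hypothetical endpoint: it errors whenever the input leaves an unbalanced
    # single quote, mimicking a SQL syntax error leaking from the backend
    if query.count("'") % 2 == 1:
        return 500, "Database error: unterminated quoted string near '" + query + "'"
    return 200, "<p>Results for " + html.escape(query) + "</p>"

FUZZ_LIST = ["test", "'", '"', "<script>alert(1)</script>", "admin'--", "../../etc/passwd"]

def fuzz(payloads: list[str], baseline: str = "test") -> tuple[list[Result], list[str]]:
    base_status, _ = toy_endpoint(baseline)
    results = []
    for payload in payloads:
        status, body = toy_endpoint(payload)
        results.append(Result(payload, status, len(body)))
    # Like sorting Intruder's results table: unusual status codes and lengths rise to the top
    results.sort(key=lambda r: (r.status, r.length), reverse=True)
    anomalies = [r.payload for r in results if r.status != base_status]
    return results, anomalies

results, anomalies = fuzz(FUZZ_LIST)
for r in results:
    print(f"{r.status}  {r.length:4d}  {r.payload!r}")
print("Anomalies:", anomalies)
```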

Validating Input – Defensive Strategies

It’s important to understand some common methods used for input validation.

Allow list vs Block list
URL encoding
HTML entity encoding
Parameterized queries

Encoding Methods for Keeping Data Safe

Developers should evaluate encoding schemes to determine which may be most appropriate. However, the following schemes are commonly used to safely handle data:

URL Encoding

URL Encoding was originally designed to allow URIs to carry reserved characters and binary data. When constructing URIs from untrusted data, URL encoding should be used.
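In Python, the standard library's `urllib.parse` covers both common cases. The values below are hypothetical; note that `quote()` leaves "/" unencoded by default, so `safe=""` is needed when encoding a single path segment or value.

```python
from urllib.parse import quote, urlencode

# Hypothetical untrusted values destined for a URL
search_term = "Tom & Jerry/2"
params = {"q": "a&b=c", "lang": "en US"}

# quote() for a single path segment or value; safe="" also encodes "/"
print(quote(search_term, safe=""))  # Tom%20%26%20Jerry%2F2

# urlencode() for whole query strings; it uses quote_plus, so spaces become "+"
print("https://example.com/search?" + urlencode(params))
# https://example.com/search?q=a%26b%3Dc&lang=en+US
```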

HTML Entity Encoding

HTML entity encoding replaces reserved characters with the following values:

< (Less than): &lt;
> (Greater than): &gt;
& (Ampersand): &amp;
" (Double quotation mark): &quot;
' (Single quotation mark or apostrophe): &apos; or &#39; (The latter is often preferred for compatibility)

This encoding can be used to display literal characters, instead of rendering them within the HTML.
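Applied to the earlier search example, a fixed version encodes every user-supplied field. This sketch uses Python's `html.escape` with a hypothetical `render_search` function; note that `html.escape` writes the apostrophe as `&#x27;`, the hexadecimal form of `&#39;`.

```python
import html

def render_search(first_name: str, last_name: str) -> str:
    # Encode *every* user-supplied field, not just some of them
    return (
        "<p>You searched for:</p>\n"
        f"First name: {html.escape(first_name)}\n"
        f"Last name: {html.escape(last_name)}"
    )

print(render_search("<script>alert(1);</script>", "<script>alert(2);</script>"))
```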

Base64 Encoding

For use cases where data includes binary characters, Base64 is a common choice to safely encode and preserve these characters. It should be noted that base64 is not an encryption algorithm and does not offer any type of security.

Example:

dGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHN0cmluZyE=

Decoded value:

this is a base64 encoded string!
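The example above can be checked with Python's `base64` module. Decoding requires no key, which is exactly why Base64 offers no confidentiality.

```python
import base64

encoded = "dGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHN0cmluZyE="

# Anyone can reverse Base64 without a key: it is an encoding, not encryption
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # this is a base64 encoded string!

# Arbitrary bytes, including non-printable ones, survive a round trip intact
raw = bytes([0x00, 0xFF, 0x10, 0x80])
print(base64.b64encode(raw))
```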

Allow Lists vs Block Lists

When implementing character filters it’s important to be aware of two approaches:

Allow Lists
A list of known-good characters are allowed, and everything else is rejected.
Pros: Good for situations where the application only expects letters and numbers.
Cons: This list can be overly restrictive for some inputs, and may cause

Block Lists
A list of known dangerous characters are blocked, and all other characters are allowed.
Pros: Allows a broader range of input that may be needed for free-form text fields.
Cons: It’s easy to overlook characters that may be dangerous.
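The trade-off can be seen in a short Python sketch. Both the username pattern and the blocked character set are hypothetical examples: the allow list rejects anything unexpected, while the block list lets through payloads that simply avoid its blocked characters.

```python
import re

# Allow list: ASCII letters, digits, and underscore, 3 to 32 characters.
# An explicit class is used because \w would also match non-ASCII letters.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,32}")

# Block list: a hypothetical set of characters considered dangerous
BLOCKED_CHARACTERS = set("<>\"'&")

def allow_list_ok(value: str) -> bool:
    return USERNAME_PATTERN.fullmatch(value) is not None

def block_list_ok(value: str) -> bool:
    return not any(ch in BLOCKED_CHARACTERS for ch in value)

for candidate in ["alice_01", "<b>hi</b>", "javascript:alert(1)", "${7*7}"]:
    print(f"{candidate!r:24} allow={allow_list_ok(candidate)} block={block_list_ok(candidate)}")
```

Here `javascript:alert(1)` and `${7*7}` pass the block list because none of their characters were anticipated, which illustrates the "easy to overlook" weakness.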

Variable Casting and Conversion

Care should also be taken to ensure variables are cast to appropriate types. Unexpected formats or incorrect use of variable types can lead to the following vulnerabilities:

Integer overflow, underflow, or wraparound conditions
Type confusion vulnerabilities
Format string vulnerabilities
Application crashes / verbose errors
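A small Python sketch of two of these pitfalls and a stricter conversion. `parse_quantity` and its limits are hypothetical. Python integers never overflow themselves, but values handed to fixed-width types (here emulated with `ctypes`) wrap silently, and `int()` accepts more than ASCII digits.

```python
import ctypes
import re

# Wraparound: fixed-width types silently wrap out-of-range values
print(ctypes.c_int32(2**31).value)    # -2147483648
print(ctypes.c_uint8(256 + 5).value)  # 5

# int() accepts any Unicode decimal digit, e.g. Arabic-Indic three (U+0663)
print(int("\u0663"))                  # 3

DIGITS = re.compile(r"[0-9]{1,9}")  # ASCII digits only, at most 9 of them

def parse_quantity(raw: str, minimum: int = 1, maximum: int = 1000) -> int:
    """Hypothetical validator: check the format, convert explicitly, then enforce a range."""
    if not DIGITS.fullmatch(raw):
        raise ValueError("quantity must be 1-9 ASCII digits")
    value = int(raw)
    if not minimum <= value <= maximum:
        raise ValueError(f"quantity must be between {minimum} and {maximum}")
    return value

print(parse_quantity("42"))  # 42
```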

Conclusion

As you can now see, input validation is a critical subject for penetration testing. Not only does it cover a broad range of pentesting test cases, but it overlaps with many others as well.