Cruzzer – Combining Crawling and Coverage-guided Fuzzing for Web applications

Web applications are an essential part of the experience SAP offers its customers. However, as web applications become more complex, there are more opportunities for bugs to be introduced. Therefore, web developers and security researchers need effective tools and techniques to detect and fix software bugs before they manifest as unwanted or even dangerous behavior in production. In desktop application development, it is common to use fuzz testing to complement manually designed unit and integration tests. In this approach, random input data is generated and fed into an application to observe how it behaves, with the goal of improving its overall quality and reliability. In web development, by contrast, fuzzing is not as established. It can be used to test APIs or single input fields, but what if a security analyst wants to test an entire web application with the same set of capabilities a normal user has? In this blog post, we present Cruzzer, the first coverage-guided fuzzer that leverages web application discovery to holistically test an application.

Coverage-guided Fuzzing

A fuzzer [7] is an application that randomly generates input and feeds it to a program under test (PUT) to analyze its behavior. The fuzzer takes an initial set of “seeds” that are considered well-formed and randomly mutates them by, e.g., bit flips, string manipulation, or permutation. The rule of thumb for unit tests is 80% test coverage; however, the interesting part lies in the remaining 20% [7]. This dynamic testing approach helps developers automatically reveal edge and corner cases in large, complex applications.
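The mutation step described above can be sketched in a few lines of Python. This is a minimal illustration, not Cruzzer's implementation: the function names (`flip_bit`, `insert_byte`, `delete_byte`, `swap_bytes`, `mutate`) are hypothetical, and real fuzzers typically ship a much larger set of mutators.

```python
import random


def flip_bit(data: bytes, rng: random.Random) -> bytes:
    """Flip one randomly chosen bit of the input."""
    if not data:
        return data
    buf = bytearray(data)
    pos = rng.randrange(len(buf) * 8)
    buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)


def insert_byte(data: bytes, rng: random.Random) -> bytes:
    """Insert one random byte at a random position."""
    pos = rng.randrange(len(data) + 1)
    return data[:pos] + bytes([rng.randrange(256)]) + data[pos:]


def delete_byte(data: bytes, rng: random.Random) -> bytes:
    """Remove one byte at a random position."""
    if not data:
        return data
    pos = rng.randrange(len(data))
    return data[:pos] + data[pos + 1:]


def swap_bytes(data: bytes, rng: random.Random) -> bytes:
    """Permute the input by swapping two randomly chosen bytes."""
    if len(data) < 2:
        return data
    i, j = rng.sample(range(len(data)), 2)
    buf = bytearray(data)
    buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)


MUTATORS = [flip_bit, insert_byte, delete_byte, swap_bytes]


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Apply one randomly selected mutation to a seed."""
    return rng.choice(MUTATORS)(seed, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(5):
        print(mutate(b"15/3", rng))
```

Passing an explicit `random.Random` instance keeps a fuzzing run reproducible, which helps when a crashing input needs to be regenerated later.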

Consider a simple calculator application that is handed an expression as a string and returns the evaluated result. Given the string “15/3”, we expect the calculator to return “5”. Can you think of an edge case where we might observe strange behavior? Of course, triggering a division by zero is one edge case, but what about “aaaaaaaa”?

This may already help developers track down undefined behavior. But what if the calculator only accepts well-formed expressions? How likely is it that a fuzzer gets from the seed “15/3” to another well-formed expression that reveals new program behavior using only random mutations? Simply generating random inputs might take too long to get from one valid input to the next. Coverage-guided fuzzing extends classic fuzzing by using information obtained during program execution to rank the randomly generated inputs by the number of branches they reach (coverage). If we have two randomly generated input strings, where the first reaches no new statement during execution while the second achieves higher coverage, we would certainly prefer the second as the mutation candidate for the next fuzzing round.

Technically, the necessary information from the program execution is collected by instrumenting the binary so that, during the fuzzing process, the application can report reached branches or other coverage information. Inputs that reach higher coverage or a new execution trace are prioritized, while inputs that only reach already known traces are discarded.
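To make the feedback loop concrete, here is a small Python sketch built around the calculator example. It is a toy under several assumptions: the `calculator`, `run_with_coverage`, `mutate`, and `fuzz` functions are hypothetical, line coverage collected with `sys.settrace` stands in for binary instrumentation, and the mutator is restricted to digits and operators to keep the example short.

```python
import random
import sys

ALPHABET = "0123456789+-*/"


def calculator(expr: str) -> float:
    """Toy program under test: evaluates '<int><op><int>'."""
    for op in "+-*/":
        left, sep, right = expr.partition(op)
        if sep and left.isdigit() and right.isdigit():
            a, b = int(left), int(right)
            if op == "+":
                return a + b
            if op == "-":
                return a - b
            if op == "*":
                return a * b
            return a / b  # raises ZeroDivisionError for b == 0
    raise ValueError("malformed expression")


def run_with_coverage(func, arg):
    """Run func(arg); return (executed line numbers of func, exception or None)."""
    lines = set()
    code = func.__code__

    def tracer(frame, event, _arg):
        if frame.f_code is not code:
            return None  # do not trace frames of other functions
        if event == "line":
            lines.add(frame.f_lineno)
        return tracer

    error = None
    sys.settrace(tracer)
    try:
        func(arg)
    except Exception as exc:  # the fuzzer must survive failures of the target
        error = exc
    finally:
        sys.settrace(None)
    return lines, error


def mutate(text: str, rng: random.Random) -> str:
    """Delete, replace, or insert a single character."""
    choice = rng.randrange(3)
    if text and choice < 2:
        pos = rng.randrange(len(text))
        middle = "" if choice == 0 else rng.choice(ALPHABET)
        return text[:pos] + middle + text[pos + 1:]
    pos = rng.randrange(len(text) + 1)
    return text[:pos] + rng.choice(ALPHABET) + text[pos:]


def fuzz(seed: str, rounds: int, rng: random.Random):
    """Keep only inputs that execute lines no earlier input has reached."""
    corpus = [seed]
    seen, _ = run_with_coverage(calculator, seed)
    crashes = []
    for _ in range(rounds):
        candidate = mutate(rng.choice(corpus), rng)
        lines, error = run_with_coverage(calculator, candidate)
        if error is not None and not isinstance(error, ValueError):
            crashes.append(candidate)  # unexpected failure, e.g. division by zero
        if not lines <= seen:  # new coverage: promote to mutation candidate
            seen |= lines
            corpus.append(candidate)
    return corpus, crashes


if __name__ == "__main__":
    corpus, crashes = fuzz("15/3", 2000, random.Random(0))
    print("corpus:", corpus)
    print("crashing inputs:", crashes[:5])
```

The key design point is the `lines <= seen` check: a candidate that covers nothing new is dropped, so the corpus only grows with inputs that unlock additional code in the target.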

How can we apply a fuzzer to a complex web application?

Modern web applications typically consist of a frontend, which tracks user interaction, and a backend, which processes it. The frontend takes input, sends it to the backend, and adjusts the view presented to the user. For example, if the user has successfully logged in, we expect the website to move to the home view.

Web application fuzzers typically target the backend, since it holds complex logic, has side effects such as database or filesystem access, and is more critical because it is executed on a server rather than on a local machine. The backend API might be a RESTful, SOAP, WebSocket, or GraphQL service, a JSON-RPC API, or something entirely different. The point is that the backend service requires the input data in a very specific format, transmitted over a very specific protocol.

The disadvantages of this approach are obvious:

The developer must find a fuzzer specifically for the protocol their backend uses.
The developer needs to configure the fuzzer specifically for the data format used in transmission (JSON, XML, …).
The fuzzer must be applied to each endpoint separately.
The fuzzer does not exercise the functionality that users actually interact with in production.

A better way would be to let the fuzzer interact with the frontend itself. The most intuitive way to do this is to let the fuzzer navigate the application automatically by giving it the ability to crawl the web application and collect all possible input sources (e.g., HTML forms).

Cruzzer

This is exactly what Cruzzer does: given a URL and a set of seeds, Cruzzer crawls the URL to discover the application automatically and stores all forms it finds for the next step. During the fuzzing process, Cruzzer takes a single form and mutates the initial seeds for every input type, e.g., password and email fields, checkboxes, radio buttons, and text areas. It submits the form with the generated input and tracks the coverage. If the execution reaches a new, previously unseen trace, it marks the input as a potential candidate for the next mutation.
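As an illustration of the form-discovery step, the following Python sketch extracts forms and their input fields from a page and fills them with a seed per field type. How Cruzzer does this internally is not described here; the `FormCollector` class, the `extract_forms` and `build_submission` helpers, and the `SEEDS` values are all assumptions made for this example, and only the standard library's `html.parser` is used.

```python
from html.parser import HTMLParser

# Hypothetical initial seeds, keyed by input type.
SEEDS = {"email": "user@example.com", "password": "secret", "text": "hello"}


class FormCollector(HTMLParser):
    """Collects every <form> on a page together with its input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attributes.get("action") or "",
                "method": (attributes.get("method") or "get").lower(),
                "fields": [],
            }
        elif self._current is not None and tag in ("input", "textarea", "select"):
            # <input> carries its kind in the type attribute; others use the tag name.
            kind = (attributes.get("type") or "text") if tag == "input" else tag
            self._current["fields"].append(
                {"name": attributes.get("name"), "type": kind.lower()}
            )

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def extract_forms(html: str) -> list:
    """Return all forms found in an HTML document."""
    collector = FormCollector()
    collector.feed(html)
    collector.close()
    return collector.forms


def build_submission(form: dict, seeds: dict) -> dict:
    """Pick one seed per named field; unknown types fall back to the text seed."""
    return {
        field["name"]: seeds.get(field["type"], seeds["text"])
        for field in form["fields"]
        if field["name"]
    }


if __name__ == "__main__":
    page = (
        '<form action="/login" method="post">'
        '<input type="email" name="mail">'
        '<input type="password" name="pw">'
        "</form>"
    )
    for form in extract_forms(page):
        print(form["method"], form["action"], build_submission(form, SEEDS))
```

In a fuzzing loop, the values produced by `build_submission` would be the starting point that the mutators then modify before each form submission.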


How does Cruzzer get the coverage information?

In Java, we can simply hook a JaCoCo agent into the running Java Virtual Machine. JaCoCo inserts probes [3] into every branch within the compiled code. For PHP, Cruzzer uses the Xdebug debugger to record executed lines. The important part is that this happens completely automatically and that the approach can be extended to other programming languages.

How does Cruzzer perform?

We compare Cruzzer’s performance against OWASP ZAP, a web application security testing suite that relies solely on hand-crafted rules rather than fuzzing. On my local MacBook, OWASP ZAP needs 50 ms ± 10 ms per request on average, while Cruzzer needs only 46 ms ± 7 ms, even though Cruzzer also receives and parses the coverage information after each request.

Let’s consider a real vulnerable Java web application: we chose JavaVulnerableLab by CSPF [5]. It has about 20 forms and resembles a realistic application. Interestingly, Cruzzer seems to plateau at a better optimum with fewer requests than ZAP. We could even provide Cruzzer with a set of regular expressions that it matches against the response body to report whether a predefined state has been reached.


Java Application Testing
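The idea of matching regular expressions against the response body can be sketched as follows. The pattern names and expressions in `STATE_PATTERNS` and the `detect_states` helper are purely illustrative assumptions; they are not the rules used in the experiment.

```python
import re

# Hypothetical states of interest and the patterns that indicate them.
STATE_PATTERNS = {
    "sql_error": re.compile(r"SQL syntax|SQLException", re.IGNORECASE),
    "stack_trace": re.compile(r"at [\w.$]+\(.*\.java:\d+\)"),
    "logged_in": re.compile(r"\bLog ?out\b", re.IGNORECASE),
}


def detect_states(body: str) -> list:
    """Return the names of all predefined states whose pattern matches the body."""
    return [name for name, pattern in STATE_PATTERNS.items() if pattern.search(body)]


if __name__ == "__main__":
    response = "java.lang.NullPointerException\n\tat com.example.Foo.bar(Foo.java:42)"
    print(detect_states(response))
```

Such a check complements coverage feedback: coverage tells the fuzzer that new code was reached, while the patterns flag responses that look like an error or a state change worth reporting.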

What more can we do?

Since Cruzzer mimics a regular user of the web application, we can give it different permissions in the app. Some web applications distinguish user roles, such as “admin”, “registered user”, or “guest”. We can run Cruzzer with different roles by simply giving it the corresponding username and password as a seed. We then track every line of code that has been reached by the fuzzing runs with the different user roles and check whether we encounter an anomaly.
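The bookkeeping behind this comparison can be illustrated with a short Python sketch. The role names, the `file:line` location format, and the `minimum_privilege` helper are assumptions for this example; the sketch only labels each code location with the lowest role that reached it and does not perform any anomaly detection itself.

```python
# Roles in ascending order of privilege (hypothetical names).
ROLES = ["guest", "user", "admin"]


def minimum_privilege(coverage_by_role: dict) -> dict:
    """Map each covered code location to the least privileged role that reached it."""
    levels = {}
    for role in ROLES:  # ascending order, so the first role seen is the lowest
        for location in coverage_by_role.get(role, set()):
            levels.setdefault(location, role)
    return levels


if __name__ == "__main__":
    # Made-up coverage data from three fuzzing runs, one per role.
    coverage = {
        "guest": {"index.php:10", "login.php:5"},
        "user": {"index.php:10", "profile.php:7"},
        "admin": {"index.php:10", "profile.php:7", "admin.php:3"},
    }
    for location, role in sorted(minimum_privilege(coverage).items()):
        print(f"{location}: first reached as {role}")
```

A location that sits in privileged code but is labeled with a low-privilege role would be a candidate for closer inspection.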

For this experiment, we place two backdoors in the latest version of WordPress, a popular PHP-based content management system:

if($_COOKIE['input']=='admin') login();
if($_GET['password']=="123456") setcookie("admin_session","1");

The former lets us log in to the application if we craft a cookie, and the latter gives us administrator rights if we supply a GET parameter with a hardcoded password. Then, we use machine learning to create a numerical representation of each line of code based on its contextual token relationships. We hope that the learned numerical projections of the code lines, grouped by privilege level, reveal interesting properties.


Privilege Anomaly Experiment

In the above figure, the green dots denote code lines that are accessible at any privilege level, the blue dots denote code statements that were only executed while Cruzzer was logged in, and the purple dots are code locations reached in the admin section. We immediately observe three clearly separated clusters, one for each privilege level. Further, we can see that the backdoors, shown in red, lie outside the unprivileged cluster and are shifted towards the admin cluster, hinting at the presence of a privilege violation bug.

Conclusion & Outlook

I hope we have sparked your interest in our novel dynamic web application testing tool. Although the tool is still a prototype, it is actively used in our research. Soon, we want to enhance it with the ability to interact with dynamic, JavaScript-controlled HTML forms and to integrate a Node.js coverage tracer. Cruzzer is still under review and therefore not yet publicly available, but we plan to release it as an open-source project soon. Stay tuned.

Acknowledgements

Thanks to Anh Phan Duc and Martin Härterich for reviewing and special thanks to Hendrik Stumpf for providing the privilege bug results.
