Abstract
Web applications have become an inseparable part of our digital lives, and protecting online users and their information is therefore a critical task.
Over the past two decades, defensive measures such as secure software development practices, network protection devices, and intrusion detection systems have matured. Attack-surface reduction is another important defensive concept: it limits the entry points of an application that attackers can abuse. Software debloating is one concrete instantiation of this idea; its goal is to identify and remove unnecessary code so that it cannot be abused or exploited in future attacks.
In this dissertation, we present our work on identifying and characterizing code bloat in web applications, as well as techniques for removing this bloat while preserving the functionality that users need. We start by quantifying the security benefits of debloating web applications. We show that debloating can produce web applications that are 46% smaller than their original versions while removing tens of historical vulnerabilities.
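As a rough illustration of the measurement described above, the following Python sketch performs file-level debloating: it keeps only the files observed while users exercised the application and reports the resulting size reduction and which known vulnerabilities lived in removed code. It is a minimal sketch under stated assumptions, not the dissertation's tooling: usage is assumed to be recorded as a set of executed PHP files (for example, from code-coverage traces), and the functions `debloat`, `size_reduction_percent`, and `removed_vulnerabilities`, as well as all file names, sizes, and vulnerability IDs, are hypothetical.

```python
# Hypothetical file-level debloating sketch: keep only the files that were
# observed during usage (e.g., in code-coverage traces) and report what is removed.


def debloat(app_files: dict[str, int], covered: set[str]) -> tuple[dict[str, int], set[str]]:
    """Split an application's files into kept (observed) and removed (unused) files.

    app_files maps each file path to its size in bytes; covered holds the paths
    that were executed at least once while users exercised the application.
    """
    kept = {path: size for path, size in app_files.items() if path in covered}
    removed = set(app_files) - set(kept)
    return kept, removed


def size_reduction_percent(app_files: dict[str, int], kept: dict[str, int]) -> float:
    """Percentage of the original code size that debloating removed."""
    total = sum(app_files.values())
    return 100.0 * (total - sum(kept.values())) / total


def removed_vulnerabilities(vulns: dict[str, str], removed: set[str]) -> list[str]:
    """IDs of known vulnerabilities whose vulnerable file was removed.

    vulns maps a vulnerability ID to the file that contains the vulnerable code.
    """
    return sorted(vuln_id for vuln_id, path in vulns.items() if path in removed)


# Toy data: file names, sizes, and vulnerability IDs are made up for illustration.
app_files = {"index.php": 400, "login.php": 100, "admin/import.php": 300, "lib/xml_parser.php": 200}
covered = {"index.php", "login.php"}
vulns = {"VULN-A": "lib/xml_parser.php", "VULN-B": "index.php"}

kept, removed = debloat(app_files, covered)
print(size_reduction_percent(app_files, kept))  # 50.0
print(removed_vulnerabilities(vulns, removed))  # ['VULN-A']
```

Because anything not observed is removed, the quality of the usage data directly determines whether legitimately needed features survive; the same bookkeeping applies at finer granularity (such as individual functions) if coverage is recorded at that level.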
Next, we discuss the design and implementation of role-based debloating. Through a user study with experienced administrators and developers, we observed that different users require access to vastly different features. To identify users with similar behavior, we design and train an unsupervised clustering model that groups users and assigns roles to them. We then build a content-delivery system based on reverse proxies that transparently routes each user to their custom debloated web application, and we measure the effectiveness of this system.
Lastly, we identify the collection of representative real-world data on web application usage as one of the main challenges of debloating web applications. We build a PHP emulator capable of concolic execution and use it to perform an offline reachability analysis, driven by commonly available web server logs, that models users' behavior for the given entry points. We then use this information to debloat web applications and demonstrate that the performance of concolic execution for debloating is comparable to that of prior dynamic debloating schemes, without suffering from their limitations.
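The abstract does not specify the clustering model or the log format, so the following Python sketch is only a hedged illustration of how ordinary web server logs can be turned into per-user usage profiles and grouped into roles. It assumes Common Log Format lines, treats the authenticated-user field (or the client host when that field is `-`) as the user identity, defines an entry point as the HTTP method, script path, and sorted query-parameter names, and uses a simple greedy Jaccard-similarity grouping as a stand-in for the unsupervised clustering model. All function names, the threshold, and the sample log lines are hypothetical, and the sketch performs no concolic execution.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

# Common Log Format: host ident authuser [date] "request line" status bytes
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_entry(line):
    """Return (user, entry_point) for one log line, or None if it does not match.

    Assumptions: the authuser field identifies the user, falling back to the
    client host when it is '-'; an entry point is the HTTP method, the script
    path, and the sorted names of the query parameters.
    """
    m = LOG_RE.match(line)
    if m is None:
        return None
    user = m["user"] if m["user"] != "-" else m["host"]
    url = urlsplit(m["target"])
    params = tuple(sorted(parse_qs(url.query, keep_blank_values=True)))
    return user, (m["method"], url.path, params)


def usage_profiles(lines):
    """Map each user to the set of entry points they reached."""
    profiles = defaultdict(set)
    for line in lines:
        parsed = parse_entry(line)
        if parsed is not None:
            user, entry_point = parsed
            profiles[user].add(entry_point)
    return dict(profiles)


def jaccard(a, b):
    """Similarity of two sets: |a & b| / |a | b| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def assign_roles(profiles, threshold=0.5):
    """Greedy single-pass grouping of users into roles (hypothetical stand-in).

    Each user joins the first role whose combined entry points are at least
    `threshold` similar to theirs; otherwise the user starts a new role.
    Returns a list of (members, entry points that role's debloated copy keeps).
    """
    roles = []
    for user in sorted(profiles):
        features = profiles[user]
        for members, role_features in roles:
            if jaccard(features, role_features) >= threshold:
                members.append(user)
                role_features |= features  # role keeps the union of its members' entry points
                break
        else:
            roles.append(([user], set(features)))
    return roles


# Made-up log lines for illustration.
log = [
    '10.0.0.1 - alice [10/Oct/2023:13:55:36 +0000] "GET /admin/post.php?action=edit&id=7 HTTP/1.1" 200 512',
    '10.0.0.1 - alice [10/Oct/2023:13:56:01 +0000] "POST /admin/edit.php HTTP/1.1" 200 128',
    '10.0.0.2 - bob [10/Oct/2023:14:02:11 +0000] "GET /admin/post.php?id=9&action=edit HTTP/1.1" 200 512',
    '10.0.0.2 - bob [10/Oct/2023:14:03:40 +0000] "POST /admin/edit.php HTTP/1.1" 200 128',
    '10.0.0.2 - bob [10/Oct/2023:14:05:12 +0000] "POST /admin/upload.php HTTP/1.1" 200 64',
    '10.0.0.3 - - [10/Oct/2023:15:00:00 +0000] "GET /index.php?page=2 HTTP/1.1" 200 2048',
]

roles = assign_roles(usage_profiles(log))
print([members for members, _ in roles])  # [['10.0.0.3'], ['alice', 'bob']]
```

In a role-based deployment along these lines, each role's entry-point set would determine which code its debloated copy keeps, and a reverse proxy would route each user to the copy for their role. Sorting the query-parameter names makes `?action=edit&id=7` and `?id=9&action=edit` count as the same entry point; the 0.5 threshold is an arbitrary illustrative value.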