ABSTRACT:
Host-based virtualization has long been the technology on which cloud applications are built, and it remains a core part of the cloud-computing stack. However, modern clouds now have several layers of additional services built on top of the host-virtualization layer. These layers include services used for building applications, such as Function as a Service (FaaS) and serverless platforms, as well as entire applications managed on behalf of the user (e.g., Software as a Service, or SaaS).
The introduction of these layers is part of a larger trend toward cloud native applications and platforms. In cloud native environments, applications are expected to be more dynamic and ephemeral. FaaS and SaaS systems enable these characteristics by shifting the burden of application management from the user to the service provider, which performs operations that might be too cumbersome or difficult for users to do on their own. For example, Amazon Web Services (AWS) Lambda is a popular FaaS platform that automatically scales compute resources to closely match current demand, avoiding over-provisioning and reducing users' costs. AWS Elastic File System (EFS) is a popular SaaS product: an NFS server that automatically scales to fit a user's data requirements.
This thesis proposal has two thrusts. In the first thrust, we explore the unique challenges of measuring performance in these new cloud native environments. We developed CNSBench, which lets developers assess the performance of their applications and infrastructure using test workloads that are representative of real cloud native environments: they are dynamic and consist of a diverse set of individual workloads.
Once we can measure the performance of cloud native applications more accurately, we turn to improving it. In the second thrust, this proposal introduces new techniques for managing data that are tailored to the usage patterns common in cloud native environments, reducing costs to users while maintaining required levels of data durability. Data exchange between individual components has been identified as a frequent bottleneck in cloud native applications, especially those deployed on serverless platforms.
We propose F3, a file system designed to optimize data exchange in serverless platforms. F3 introduces new methods for handling ephemeral data, along with modifications to a serverless-scheduling algorithm so that data locality is considered when scheduling serverless actions. These changes help adapt existing file-based storage options to modern cloud native applications and use cases.
By introducing new methods for handling ephemeral data, F3 trades durability for performance: it stores the ephemeral data passed between application components in data stores that are faster but less durable. We plan to explore further how lower-durability storage can be used to trade durability for (dollar) cost. We propose to develop a model that determines the durability level most appropriate for an application and its data, and an application architecture that uses this model to place data in cheaper storage while still meeting the data's durability requirements, thereby reducing overall costs to users.
Cloud native environments offer significant benefits over more traditional cloud environments based on host virtualization. Many of these benefits are driven by the adoption of FaaS and SaaS platforms, which provide features such as on-demand computing, fine-grained resource allocation and billing, and quick and easy deployments. It is our thesis that fully realizing these benefits requires new performance measurement techniques and more efficient approaches to data handling.
Dates
Thursday, January 4, 2024, 1:00 pm to 3:00 pm
Location
NCS 220
Event Title
Ph.D. Proposal Defense, Meyer Alex Merenstein: 'Moving Beyond Host Based Virtualization: New Techniques for Performance Measurement and Data Management in Cloud Native Environments'