eCommons

 

EFFICIENT RESOURCE MANAGEMENT OF CLOUD NATIVE SYSTEMS

dc.contributor.author: Zhang, Yanqi
dc.contributor.chair: Delimitrou, Christina
dc.contributor.committeeMember: Alvisi, Lorenzo
dc.contributor.committeeMember: Suh, Gookwon Edward
dc.date.accessioned: 2024-01-31T21:20:17Z
dc.date.available: 2024-01-31T21:20:17Z
dc.date.issued: 2023-05
dc.description.abstract: Cloud native architecture has been a prevailing trend and is widely adopted by major online service providers, including Netflix, Uber, and WeChat. It enables applications to be structured as loosely coupled distributed systems that can be developed and managed independently, and it provides two programming models, microservices and serverless, to accommodate different user requirements. Specifically, microservices are a group of small services that collectively perform as a complete application. Each microservice implements a web server that handles specific business logic and is usually packaged in a container that encapsulates its own runtime and dependencies. Microservice containers typically live for a long time and scale up or down to cope with load fluctuations according to user-specified policies. Serverless provides a further simplified approach to application development and deployment: it allows users to upload their application code as functions through an event-driven interface, without explicitly provisioning or managing containers. Serverless containers are typically short-lived, "one-off" containers that handle a single request at a time. Serverless billing is fine-grained, and users pay only for the resources consumed by actual function execution.

Despite the popularity of cloud native systems, managing their resources efficiently is challenging. Cloud native applications consist of many component services with diverse resource requirements, posing a greater challenge than traditional monolithic applications. Furthermore, the backpressure effect caused by inter-service connections complicates resource management. Lastly, although cloud native architecture relieves users of the burden of infrastructure management, cloud providers still need to provision and pay for the infrastructure that hosts cloud native applications, which incurs high cost. This dissertation tackles the challenge of efficient resource management for cloud native systems and proposes three resource managers.

First, we present Sinan, a machine learning (ML)-driven and service level agreement (SLA)-aware resource manager for microservices. Sinan uses a set of validated ML models to learn per-service resource requirements, taking into account the effects of inter-service dependencies. Sinan's ML models predict the end-to-end latency of a given resource allocation, and the resource manager then chooses, based on these predictions, the optimal resource allocation that preserves the SLA. Sinan highlights the importance of a balanced training dataset that includes an equal share of SLA violations and satisfactions, and shows that the models become unreliable when the training dataset is dominated by either one. To obtain a balanced training dataset, Sinan explores different resource allocations with an algorithm inspired by multi-armed bandits (MAB). Although Sinan outperforms traditional approaches such as autoscaling, it requires a lengthy exploration process and triggers a large number of SLA violations, hindering its practicality. Furthermore, the ML models are on the critical path of resource management decisions, limiting the speed and scalability of the system.

To address these limitations, we further propose Ursa, a lightweight and scalable resource management framework for microservices. By investigating backpressure-free conditions, Ursa allocates resources within the space in which each service can be treated as independent for the purpose of resource allocation. Ursa then uses an analytical model that decomposes the end-to-end latency into per-service latencies and maps each per-service latency to an individually checkable resource allocation threshold. To speed up the exploration process, Ursa explores as many independent microservices as possible across different request paths and swiftly stops exploration when SLA violations occur.

Finally, to reduce the infrastructure provisioning cost of cloud native systems, we propose leveraging harvested resources in the datacenter, which cloud providers offer at a steep discount. Orthogonal to the first two parts of the thesis, which reduce operating cost by providing the minimum amount of resources that does not compromise performance, this part achieves cost reduction by using cheaper but less reliable resources. We use serverless as the target workload and propose running serverless platforms on low-priority Harvest VMs that grow and shrink to harvest all the unallocated CPU cores on their host servers. We quantify the challenges of running serverless on Harvest VMs by characterizing serverless workloads and Harvest VMs in production. We propose a series of policies that use a mix of Harvest and regular VMs with different tradeoffs between reliability and efficiency, and we design a serverless load balancer that is aware of VM evictions and resource variations in Harvest VMs. Our results show that adopting harvested resources improves efficiency and reduces cost significantly, while the request failure rate caused by Harvest VM evictions is marginal.
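The SLA-aware selection step described for Sinan can be made concrete with a small sketch. The code below is a hypothetical illustration under assumed interfaces, not Sinan's actual implementation: it assumes a trained model exposed as a predict_latency(allocation) callable and a simple core-count cost, and picks the cheapest candidate allocation whose predicted end-to-end latency stays within the SLA.

    # Hypothetical sketch of SLA-aware allocation selection in the spirit of Sinan.
    # The model interface and the cost function are assumptions, not the thesis's API.
    from typing import Callable, Dict, List, Optional

    Allocation = Dict[str, int]  # e.g. {"frontend": 4, "cart": 2} cores per microservice

    def choose_allocation(
        candidates: List[Allocation],
        predict_latency: Callable[[Allocation], float],  # ML model: allocation -> predicted tail latency (ms)
        sla_ms: float,
    ) -> Optional[Allocation]:
        """Return the cheapest candidate whose predicted latency meets the SLA."""
        feasible = [a for a in candidates if predict_latency(a) <= sla_ms]
        if not feasible:
            return None  # a real system would fall back to a conservative allocation
        # Cost here is simply the total core count; real systems may weight resources differently.
        return min(feasible, key=lambda a: sum(a.values()))

    # Toy usage with a stand-in predictor (more cores -> lower predicted latency).
    toy_predict = lambda a: 200.0 / max(sum(a.values()), 1)
    best = choose_allocation(
        [{"frontend": 2, "cart": 1}, {"frontend": 4, "cart": 2}], toy_predict, sla_ms=50.0
    )  # picks the 6-core allocation, the only one predicted to meet the 50 ms SLA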
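Ursa's decomposition of the end-to-end latency into per-service, individually checkable targets can likewise be sketched. The proportional split and the independent per-service check below are illustrative assumptions, not the thesis's exact analytical model.

    # Hypothetical sketch of Ursa-style latency decomposition: split the end-to-end SLA
    # into per-service budgets, then check each service against its own budget independently.
    from typing import Dict

    def split_sla(end_to_end_sla_ms: float, measured_latency_ms: Dict[str, float]) -> Dict[str, float]:
        """Split an end-to-end SLA into per-service budgets, proportional to measured latency."""
        total = sum(measured_latency_ms.values())
        return {svc: end_to_end_sla_ms * lat / total for svc, lat in measured_latency_ms.items()}

    def meets_budget(per_service_latency_ms: Dict[str, float], budgets_ms: Dict[str, float]) -> Dict[str, bool]:
        """Check each service independently; no end-to-end re-evaluation is needed."""
        return {svc: per_service_latency_ms[svc] <= budgets_ms[svc] for svc in budgets_ms}

Because each service has its own threshold, the exploration of one service's allocation does not require re-measuring the whole request path, which is what allows many independent services to be explored in parallel.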
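Finally, the eviction- and variation-aware dispatch idea for Harvest VMs can be sketched as follows; the VM fields and the capacity-weighted rule are assumptions made for illustration, not the dissertation's load balancer.

    # Hypothetical sketch of a Harvest-VM-aware dispatch rule: skip VMs with an imminent
    # eviction signal and weight the remaining VMs by their currently harvested capacity.
    import random
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class VM:
        name: str
        current_cores: int        # cores currently harvested (varies over time)
        eviction_imminent: bool   # set when the platform signals an upcoming eviction

    def pick_vm(vms: List[VM]) -> Optional[VM]:
        """Pick a VM for the next function invocation, weighted by available capacity."""
        eligible = [vm for vm in vms if not vm.eviction_imminent and vm.current_cores > 0]
        if not eligible:
            return None  # e.g. fall back to regular (non-Harvest) VMs
        weights = [vm.current_cores for vm in eligible]
        return random.choices(eligible, weights=weights, k=1)[0]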
dc.identifier.doi: https://doi.org/10.7298/f74c-8b55
dc.identifier.other: Zhang_cornellgrad_0058F_13532
dc.identifier.other: http://dissertations.umi.com/cornellgrad:13532
dc.identifier.uri: https://hdl.handle.net/1813/114181
dc.language.iso: en
dc.title: EFFICIENT RESOURCE MANAGEMENT OF CLOUD NATIVE SYSTEMS
dc.type: dissertation or thesis
dcterms.license: https://hdl.handle.net/1813/59810.2
thesis.degree.discipline: Electrical and Computer Engineering
thesis.degree.grantor: Cornell University
thesis.degree.level: Doctor of Philosophy
thesis.degree.name: Ph. D., Electrical and Computer Engineering

Files

Original bundle
Name: Zhang_cornellgrad_0058F_13532.pdf
Size: 10.59 MB
Format: Adobe Portable Document Format