Dates
Thursday, March 03, 2022 - 03:30pm to Thursday, March 03, 2022 - 04:30pm
Location
Zoom - contact events@cs.stonybrook.edu
Event Description

Title: Towards efficient and performant Machine Learning Systems

Abstract: Deep Neural Nets (DNNs), trained over large datasets, are extensively used in ML applications, including image classification, language translation, and language modeling. To train a DNN, the input dataset is first preprocessed according to the requirements of the model and the training framework, and the model is then trained on the processed dataset using a training framework such as TensorFlow. DNN training typically runs on machines equipped with expensive GPUs. Because of the high compute, memory, and storage requirements of DNN-training jobs, it is crucial to use expensive resources such as GPUs and DRAM as efficiently as possible. Because training jobs vary widely in nature and run on a wide variety of hardware, inefficiencies can arise from many different sources.
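
As context for where these resource demands arise, here is a minimal sketch of the two-stage pipeline described above, preprocessing followed by GPU training, using TensorFlow's tf.data and Keras APIs. The synthetic data, model shape, and hyperparameters are illustrative placeholders, not the systems discussed in this talk:

    import tensorflow as tf

    # Stage 1: preprocessing -- turn raw records into model-ready tensors.
    # The "raw data" here is synthetic; a real pipeline would read files
    # (e.g., TFRecords) produced by a framework such as Apache Beam.
    raw = tf.data.Dataset.from_tensor_slices((
        tf.random.uniform([1024, 32]),
        tf.random.uniform([1024], maxval=10, dtype=tf.int32),
    ))

    def preprocess(features, label):
        # Placeholder transformation standing in for decoding, augmentation, etc.
        return tf.nn.l2_normalize(features, axis=-1), label

    train_ds = (raw.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
                   .shuffle(1024)
                   .batch(64)
                   .prefetch(tf.data.AUTOTUNE))  # overlap preprocessing with compute

    # Stage 2: training -- runs on a GPU if one is visible to TensorFlow.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    model.fit(train_ds, epochs=1)
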
Provisioning GPU resources for DNN training is not straightforward. Commodity GPUs today have varying memory and compute capacities at a range of price points, and given an arbitrary DNN, it is not obvious which GPU instance should be employed to minimize training time and/or cost. Employing regression modeling and empirical analysis techniques, we design Ceer to optimally select the type and number of GPUs needed for given training-time and/or cost requirements. Ceer can decide, for example, whether using multiple instances of a cheaper GPU is more cost- and/or time-effective than using a single instance of an expensive GPU (a toy version of this tradeoff is sketched below).

Additionally, owing to their large memory footprint, modern DNNs often must be split across multiple GPUs, a practice referred to as model parallelism. The key challenge of model parallelism is to partition the DNN model across GPUs efficiently and effectively, avoiding communication overhead between GPUs while maximizing GPU utilization. We develop Pesto, which optimizes model placement and scheduling at the fine-grained operation level to minimize inter-GPU communication while maximizing the opportunity to parallelize model execution across multiple GPUs. By carefully formulating the problem as an integer program, Pesto provides fast and near-optimal model placement and scheduling.

GPU underutilization can occur even with state-of-the-art techniques (including Ceer and Pesto), because commodity GPUs have fixed memory and compute capacities while different DNN training jobs have very different resource requirements; time-sensitive DNN inference jobs exhibit similar behavior. We are developing G-Share, a work in progress, which allows multiple jobs to share a GPU to increase resource efficiency without violating the SLOs of time-sensitive jobs.
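
The abstract does not give Ceer's actual models, so the following toy enumeration only illustrates the kind of cheapest-configuration-under-a-deadline decision described above. The GPU names, prices, and the simple sub-linear scaling rule are invented assumptions standing in for Ceer's fitted regression models:

    # Toy stand-in for the Ceer decision: pick the GPU type and count that
    # meet a training-time deadline at minimum cost. All numbers and the
    # scaling model are illustrative assumptions, not Ceer's actual model.

    GPUS = {                      # hypothetical price/performance points
        "cheap-gpu":     {"hourly_cost": 0.90, "single_gpu_hours": 40.0},
        "expensive-gpu": {"hourly_cost": 3.10, "single_gpu_hours": 12.0},
    }

    def training_time(single_gpu_hours, n):
        # Assume sub-linear scaling: each added GPU is only 85% effective,
        # standing in for communication/synchronization overheads.
        speedup = sum(0.85 ** i for i in range(n))
        return single_gpu_hours / speedup

    def best_config(deadline_hours, max_gpus=8):
        candidates = []
        for name, spec in GPUS.items():
            for n in range(1, max_gpus + 1):
                t = training_time(spec["single_gpu_hours"], n)
                if t <= deadline_hours:
                    cost = n * spec["hourly_cost"] * t
                    candidates.append((cost, name, n, t))
        return min(candidates) if candidates else None

    cost, name, n, t = best_config(deadline_hours=10.0)
    print(f"cheapest config meeting deadline: {n}x {name}, {t:.1f} h, ${cost:.2f}")

Ceer replaces the hard-coded scaling rule above with regression models learned from empirical measurements of the given DNN.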

In addition to compute and memory, DNN jobs also require significant I/O resources, especially for preprocessing. Data-preprocessing frameworks such as Apache Beam are extensively used to preprocess datasets, which are then used as input for DNN training. Data preprocessing is a resource-intensive task, as it typically processes hundreds of GBs of data. As part of my ongoing collaboration with Google, we are designing SmartCache, an ML-based caching tool that aims to increase the throughput of data-preprocessing pipelines. We are devising techniques to learn various parameters, e.g., I/O rate, size, and lifetime, for all the files involved in the preprocessing pipeline. The goal of SmartCache is to use caching resources efficiently by incorporating these learned parameters into its caching policies (a minimal sketch of the idea follows).
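
As a hedged illustration only: assuming the learned per-file parameters (predicted I/O rate and remaining lifetime) are already available, a cache could rank eviction victims by predicted benefit per byte. The class interface and scoring rule below are hypothetical and are not SmartCache's actual design:

    # Hypothetical sketch of a cache whose eviction policy uses learned
    # per-file parameters (predicted I/O rate and lifetime) rather than recency.

    class LearnedCache:
        def __init__(self, capacity_bytes):
            self.capacity = capacity_bytes
            self.used = 0
            self.files = {}  # name -> (size, predicted_io_rate, predicted_lifetime)

        def _score(self, size, io_rate, lifetime):
            # Expected accesses the file will still serve, per byte of cache
            # it occupies. This benefit-per-byte rule is an assumption.
            return (io_rate * lifetime) / size

        def admit(self, name, size, io_rate, lifetime):
            if size > self.capacity:
                return False
            while self.used + size > self.capacity:
                # Evict the cached file with the lowest predicted benefit per byte.
                victim = min(self.files, key=lambda f: self._score(*self.files[f]))
                self.used -= self.files.pop(victim)[0]
            self.files[name] = (size, io_rate, lifetime)
            self.used += size
            return True

    cache = LearnedCache(capacity_bytes=100)
    cache.admit("shard-0", size=40, io_rate=5.0, lifetime=2.0)  # score 0.25/byte
    cache.admit("shard-1", size=40, io_rate=0.5, lifetime=1.0)  # score 0.0125/byte
    cache.admit("shard-2", size=40, io_rate=4.0, lifetime=3.0)  # evicts shard-1
    print(sorted(cache.files))  # ['shard-0', 'shard-2']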

It is our thesis that present-day DNN deployment pipelines employ expensive resources, such as GPUs and caches, inefficiently, and that the sources of these inefficiencies are highly varied. Analytical and empirical techniques can be employed to significantly improve the efficiency of these jobs, which ultimately translates into notable cost savings around the globe.

Event Title
Ph.D. Proposal Defense, Ubaid Ullah Hafeez: 'Towards efficient and performant Machine Learning Systems'