Dates
Wednesday, August 10, 2022 - 11:00am to Wednesday, August 10, 2022 - 01:00pm
Location
Zoom - contact events@cs.stonybrook.edu for more information.
Event Description

Abstract:
Due to their extensive parallelism and energy efficiency, GPU-based HPC clusters are attracting a growing number of users and developers. However, with the increasing variety of GPUs available, it is impractical to write separate code for each target system for a given HPC application. Besides portability, another challenge of GPUs is programmability. Programming in native languages such as CUDA is extremely difficult for application developers because it demands in-depth knowledge of the programming model and the underlying architecture. Furthermore, because the CPU and GPU have separate physical memories, developers must keep data synchronized between the two. Identifying suitable code regions in a legacy application to offload and explicitly managing data movement between host and device are further difficulties of GPU offloading. The best option for an application developer is to use a directive-based parallel programming model such as OpenMP. Nevertheless, even with OpenMP, the developer must choose among several strategies for offloading a kernel to a GPU. Therefore, once the decision to offload code to a GPU has been made, the application developer would benefit greatly from a tool that helps decide what data to offload and which approach to use when offloading the code.

In this thesis, I present a first-of-its-kind compiler tool that takes C code as input, recognizes an OpenMP parallel loop, and offers several kernel variants for GPU offloading. Using a compile-time cost model, it then statically identifies the variant best suited for GPU offloading. The tool is divided into three modules. We chose a modular design so that any module can be replaced as needed without affecting the others.
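
To make the idea of kernel variants concrete, the sketch below shows one possible input loop and two ways it might be offloaded with standard OpenMP directives. It is an illustration only: the saxpy kernel, the function names, and the particular pair of variants are assumptions chosen for this example, not output of the tool described here.

```c
/* Input form: a host-side OpenMP parallel loop (hypothetical saxpy kernel). */
void saxpy_host(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Assumed variant 1: offload the loop to the device, distributing iterations
 * across teams and threads. The map clauses copy x to the device and copy y
 * in both directions, since host and device memories are separate. */
void saxpy_target_teams(int n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Assumed variant 2: same data mapping, but additionally asks the compiler
 * to vectorize the loop body with the simd construct. */
void saxpy_target_simd(int n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for simd \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Compiled without OpenMP support, the pragmas are ignored and all three functions run sequentially with the same result. With an offloading-capable compiler, the two target versions differ only in how the loop is mapped onto the device, which is the kind of choice a cost model would have to rank.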

The first is the Kernel Analysis module, which detects and analyzes an OpenMP kernel and, by applying various code-level transformations, suggests several variants for offloading that kernel to a GPU. A main objective of the tool is to preserve the portability of OpenMP. To that end, I perform all analysis and data extraction on the AST itself rather than at lower levels of the compiler. This module also verifies that the OpenMP transformation is correct.

The second module is a Compiler Cost Model that estimates the cost of executing the original code and each of the offloading variants. Modern compilers typically use analytical cost models for this purpose. Unfortunately, creating a cost model for a compiler optimization, particularly GPU offloading, is an extremely challenging task that demands considerable effort from a compiler engineer. Moreover, developing an analytical cost model that is portable across GPU architectures appears infeasible. Recently, cost models built with machine learning techniques have been applied successfully to a variety of compiler optimization problems, but a cost model for GPU offloading is still lacking. In this module, I therefore define COMPOFF, a cost model that statically estimates the Cost of OpenMP OFFloading using neural networks. Our initial results show that COMPOFF can estimate the cost of offloading OpenMP kernels with an accuracy ranging from 95% to 99%. This accuracy holds across different compilers and GPU architectures.

In the third module, we modify the original source code using the analysis and predictions from the other two modules and return newly generated code that supports GPU offloading.
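
The way cost estimates could drive the final choice can be sketched as a simple minimum search. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name `select_variant` and the convention that index 0 holds the estimate for the original, non-offloaded code are hypothetical, and the estimates themselves are taken as given inputs (in the tool they would come from a cost model such as COMPOFF).

```c
#include <stddef.h>

/* Returns the index of the entry with the lowest predicted cost.
 * Assumed convention: predicted_cost[0] is the estimate for the original
 * (non-offloaded) code and entries 1..count-1 are offloading variants, so a
 * return value of 0 means "do not offload". Ties keep the earlier index.
 * The caller is expected to pass count >= 1; with count == 0 the array is
 * never read and 0 is returned. */
size_t select_variant(const double *predicted_cost, size_t count)
{
    size_t best = 0;
    for (size_t i = 1; i < count; ++i) {
        if (predicted_cost[i] < predicted_cost[best])
            best = i;
    }
    return best;
}
```

Because the original code is included among the candidates, the same comparison also covers the case where offloading is not predicted to pay off.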

According to our preliminary findings, this framework can help HPC researchers and compiler developers port legacy HPC codes to the upcoming heterogeneous computing environment.

Event Title
Ph.D. Research Proficiency Presentation: Alok Mishra, 'Program Transformation for Automatic GPU-Offloading using OpenMP'