NSM HPC Research Week

Schedule Summary

November 23

Clicking a link in this table takes you directly to the meeting.

09:00 R. Govindarajan Compiling for Decision Trees

09:30 Krishna Nandivada Challenges in Realizing Efficiency in Parallel Programs

10:00 Jyothi Vedurada Unlocking CPU-GPU Heterogeneous System Performance through Compiler Optimizations and Parallelization

10:30 Smruti Sarangi JASS: A Tunable Checkpointing System for NVM-based Systems

11:00 C Unnikrishnan Programming Models for Graph Analytics

11:30 Sharad Sinha HPC Research at IIT Goa

12:00 Kalyan T V An Overhaul of Computer System Stack for Graph Applications

12:30 John Jose High Performance Architectures- Need for Co-Design of Caches and Interconnects

13:00 Sunitha Manjari Role of HPC in Accelerating Biology

13:30 Lunch Break

14:00 Vishwesh Jatala High Performance Distributed Graph Neural Networks

14:30 Sathish Vadhiyar Divide-and-Conquer Paradigm for Asynchronous CPU-GPU Computations in Graph and AI/ML Applications

15:00 Soumyajit Dey GPGPU optimizations for HPC workloads

15:30 Preeti Malakar A Glimpse of ML for HPC

16:00 Kishore Kothapalli Recent Progress in Parallel Dynamic Graph Algorithms

16:30 Biswabandan Panda Plenty of rooms at the bottom: Microarchitecture for performance

Our Esteemed Speakers

Keynote Address on November 23 at 09:00
R. Govindarajan IISc Bangalore	Compiling for Decision Trees Author Bio: Prof. R. Govindarajan is a well-known senior faculty member in Indian academia. He is a Professor at IISc Bangalore in the CSA and SERC Departments. Prior to that, he was a faculty member at the Memorial University of Newfoundland, Canada and a researcher at McGill. He earned his PhD from IISc. His research interests are in CPU and GPU architecture as well as the associated software. He has graduated several scholars on these topics. He has also completed several research projects with companies such as AMD, Microsoft, and Intel, and has received unrestricted faculty grants from IBM and NVIDIA. He was a visiting faculty member at Arizona State University and the University of Delaware. He has served on numerous program committees, and as an associate editor for journals such as JPDC, ToPC, TACO, Micro, and CAL. He served as a Guest Editor for a Special Issue of JPDC on Cluster and Network-based Computing. He was also a General Co-Chair for e-Science 2007, PPoPP 2010, and IPDPS 2015.

November 23 at 09:30
Krishna Nandivada IIT Madras	Challenges in Realizing Efficiency in Parallel Programs Multicore systems have taken the computing world by storm, with the ever-increasing amount of parallelism in the hardware and the continuously changing landscape of parallel programming. Programmers are expected to think in parallel and express the program logic (ideal parallelism) using parallel languages of their choice. However, a parallel program is not guaranteed to be efficient just because it is parallel. To realize efficient parallel programs, one has to address challenges in multiple dimensions: (i) writing efficient programs, (ii) analyzing parallel programs for efficiency, and (iii) optimizing parallel programs. This problem becomes challenging as many of the traditional assumptions about serial programs do not hold in the context of parallel programs. In this talk, we will (i) discuss the importance of recognizing parallel programming as an important and distinct step, and (ii) go over the idea of ideal and useful parallelism and some of our experiences in bridging the gap between them. Author Bio: V. Krishna Nandivada is currently a Professor and Head of the Department of Computer Science and Engineering at IIT Madras. He is a senior member of ACM and IEEE. Before joining IIT Madras in 2011, he spent nearly 5.5 years at IBM India Research Lab (Programming Technologies and Software Engineering group). Prior to starting his PhD, he was associated with Hewlett Packard. He holds a BE degree from REC (now known as NIT) Rourkela, an ME degree from IISc Bangalore, and a PhD degree from UCLA. His research interests are Compilers, Program Analysis, Programming Languages, and Multicore systems.

November 23 at 10:00
Jyothi Vedurada IIT Hyderabad	Unlocking CPU-GPU Heterogeneous System Performance through Compiler Optimizations and Parallelization Efficiently harnessing the computational power of CPU and GPU in heterogeneous systems is an essential task for achieving high program performance. This talk delves into two key research areas from the Indian Institute of Technology Hyderabad (IITH) aimed at overcoming performance bottlenecks and optimizing the use of massively parallel GPUs. (1) Compiler optimization: By introducing clever static analysis techniques to efficiently manage global synchronization statements, we aim to overcome the hindrance caused by synchronous data transfer operations and synchronization barriers in CPU-GPU heterogeneous programs, maximizing program performance. (2) Parallelization: We provide an efficient Approximate Nearest Neighbour search implementation on GPUs that improves the throughput to more than 50,000 Queries Per Second without compromising on recall on high-dimensional, billion-scale data (coming from deep learning-based embeddings). Author Bio: Jyothi Vedurada is an assistant professor at the Dept. of Computer Science and Engineering, IIT Hyderabad. Her research interests are program analysis, program understanding, automated concurrency testing, and high-performance computing. Prior to joining IIT Hyderabad, she was a post-doctoral researcher at Microsoft Research Lab, Bangalore. She received her PhD (+M.Tech) from IIT Madras, supported by the TCS PhD fellowship. Before that, she worked as a Software Engineer at Hewlett Packard, Chennai.

November 23 at 10:30
Smruti Sarangi IIT Delhi	JASS: A Tunable Checkpointing System for NVM-based Systems Checkpointing (or snapshotting) a system’s state has always been a problem of great interest and has found a lot of use in ensuring system reliability, record-replay debugging, job migration and running high-throughput transaction systems. In the last few years, ultra-fast hardware-assisted NVM-based checkpointing schemes have come up that can collect incremental full-system checkpoints in milliseconds. Unfortunately, such systems have large overheads in terms of their write amplification (increased number of writes). This, in turn, seriously reduces the reliability and lifetime of NVM devices. We propose the first tunable scheme in this space, JASS, where given a checkpoint latency (CL), we near-optimally minimize the write amplification (WA). This allows us to run parallel programs in a disciplined fashion. To realize this goal, we propose several novel hardware mechanisms along the way, such as a rigorous method of flushing pre-checkpoint messages in the NoC, a novel DRAM scrubber and locality predictor, and a control-theoretic algorithm to guarantee a CL while minimizing the WA. We reduce WA by 35-96% as compared to the nearest state-of-the-art competing method and improve performance of PARSEC benchmarks by 19.4%. Author Bio: Prof. Smruti Ranjan Sarangi is a Professor in the Computer Science and Engineering Department at IIT Delhi with a joint appointment in the Department of Electrical Engineering. He primarily works on computer architecture, EDA and operating systems. His research areas specifically cover EDA, multicore processors, cyber-security, emerging technologies, networks on chip, operating systems for parallel computers and parallel algorithms. Dr. Sarangi obtained his Ph.D in computer architecture from the University of Illinois at Urbana-Champaign (UIUC), USA in 2006, and a B.Tech in computer science from IIT Kharagpur in 2002.
He has filed five US patents, seven Indian patents and has published 125 papers in reputed international conferences and journals. He is the author of two popular textbooks in computer architecture for UG and PG students, respectively. He is a member of the IEEE and ACM.

November 23 at 11:00
C Unnikrishnan IIT Palakkad	Programming Models for Graph Analytics In this talk, Dr. Unnikrishnan Cheramangalath will cover graph analysis and the deep learning techniques used for it in the recent past. He will give a good picture of the challenges in graph analysis, and a summary of work on graph analysis in Indian and foreign universities. He will also briefly mention work on graph analysis underway at IIT Palakkad. Author Bio: C. Unnikrishnan is an assistant professor at IIT Palakkad. He completed his PhD at IISc. Prior to joining IIT Palakkad, he worked as a Research Fellow at SUTD, Singapore. His research interests are Compilers, HPC, and Machine Learning. He is an author of a book on Distributed Graph Analytics and has served on multiple program committees.

November 23 at 11:30
Sharad Sinha IIT Goa	HPC Research at IIT Goa This talk will introduce the audience to some of the HPC-related research at IIT Goa. It will also cover a couple of research activities in more detail. These activities relate to application workload characterization, among other topics. Author Bio: Dr. Sharad Sinha is an Associate Professor of Computer Science and Engineering at IIT Goa. His research interests are in reconfigurable computing, computer architecture, FPGAs and embedded systems. He has received Best Paper Awards at ICCAD 2022 and ICCAD 2017 and Best Paper Award Nominations at CASES 2018 and FCCM 2019. He is also the PI of the NSM Nodal Center and the Drones Center at IIT Goa.

November 23 at 12:00
Kalyan T V IIT Ropar	An Overhaul of Computer System Stack for Graph Applications Author Bio: Venkata Kalyan Tavva is an assistant professor in the Department of Computer Science, IIT Ropar. Prior to this, he was a Hardware Performance Architect in the POWER systems performance team, part of India Systems Development Lab, IBM India Pvt. Ltd. Bangalore. His research interests lie in memory-hierarchy exploration, alternate computing techniques, and low-power/energy-efficient designs. He has several publications and patents to his name. Dr. Kalyan received his Ph.D. from the Indian Institute of Technology Madras. He is a member of IEEE and ACM.

November 23 at 12:30
John Jose IIT Guwahati	High Performance Architectures- Need for Co-Design of Caches and Interconnects Author Bio: Dr. John Jose is an Associate Professor in the Department of Computer Science & Engineering, Indian Institute of Technology Guwahati, where he joined as an Assistant Professor in 2015. He completed his Ph.D degree at the Indian Institute of Technology Madras in the field of computer architecture. He is the recipient of the prestigious Qualcomm Faculty Award 2021. He is also serving as the Vice-Chair of IEEE India Council. His research group in the Multicore Architecture and Systems Lab at IITG explores the domain of networks-on-chip, cache management techniques for large multicore systems, non-volatile memories, hardware security, domain-specific hardware accelerators and disaggregated storage systems. He is an associate editor for IEEE Embedded Systems Letters. He has over 35 IEEE & ACM peer-reviewed conference publications, over 15 ACM & IEEE transactions papers as well as Springer and Elsevier journal papers to his credit. He is a reviewer for many national and international peer-reviewed journals and a member of the technical program committee and organizing committee for many IEEE/ACM national and international conferences. He is the investigator for several R&D projects under DST and MeitY.

November 23 at 13:00
Sunitha Manjari CDAC	Role of HPC in Accelerating Biology Author Bio: Sunitha Manjari Kasibhatla is working with the High Performance Computing-Medical and Bioinformatics Applications Group of the Centre for Development of Advanced Computing (C-DAC), Pune, India as ‘Associate Director’. She has played an active role in the development of tools that exploit high performance compute clusters, like GenoVault, Anvaya, GAMUT and GenoPipe. Her research interests include population genomics and comparative genomics. She has been part of three Indo-UK collaborative projects pertaining to comparative genomics of Salmonella serovars, SNP analysis of Marek’s disease resistant and susceptible chicken lines and transposon sequence analysis of Mycobacterium bovis. She has 30 publications in peer-reviewed journals and was awarded the Bioclues Innovation in Research and Development award-2012 instituted by BioClues.org. She has a Master’s in Biochemistry with an Advanced Diploma in Bioinformatics and a PhD in Virus Bioinformatics from Savitribai Phule Pune University.

Lunch Break on November 23 at 13:30

November 23 at 14:00
Vishwesh Jatala IIT Bhilai	High Performance Distributed Graph Neural Networks Author Bio: Dr. Vishwesh Jatala is an Assistant Professor in the Department of CSE at Indian Institute of Technology Bhilai. He received his Ph.D. from the Department of CSE at the Indian Institute of Technology Kanpur (IIT Kanpur) and his B.Tech from the Department of CSE at Visvesvaraya National Institute of Technology Nagpur (VNIT Nagpur). Prior to joining IIT Bhilai, he was a Postdoctoral Fellow at the University of Texas at Austin, USA. He has 2 years of industrial experience after his B.Tech. Dr. Jatala's research lies in the areas of Graphics Processing Units (GPUs), High-Performance Computing, Parallelization, Graph Neural Networks, and Graph Analytics. His research spans a broad spectrum of the system stack (architecture, compiler, and runtime systems) and addresses two key challenges of GPU design: (1) improving the throughput and energy efficiency of GPUs, and (2) improving the performance of applications as well as easing the programming effort for developing applications targeted at both single-GPU and distributed multi-GPU platforms. His research has been published in several international conference proceedings and was awarded a Best Paper Nomination in PACT 2019, the Student Innovation Award in HPEC 2019, and SRS Best Poster in HiPC 2017. He was a recipient of the TCS fellowship during his Ph.D. and the Institute Medal for academic excellence in B.Tech, CSE, at VNIT Nagpur.

November 23 at 14:30
Sathish Vadhiyar IISc Bangalore	Divide-and-Conquer Paradigm for Asynchronous CPU-GPU Computations in Graph and AI/ML Applications As HPC has entered the Exascale computing era, large systems are built with many nodes of CPUs and GPUs. It is important to develop scalable algorithms that can efficiently harness multiple CPU and GPU devices of multiple nodes. Popular parallelism models for Graph and AI/ML applications are limited in their use of both the CPU and GPU cores and also involve heavy synchronizations at periodic intervals. However, developing asynchronous algorithms and minimizing or avoiding global synchronizations become very important to provide high performance and scalability for the very large number of computing cores in Exascale systems. In this talk, I will cover our work on developing asynchronous algorithms using the good-old divide-and-conquer paradigm in the domains of Graph and AI/ML applications for CPU-GPU hybrid executions. We have demonstrated our model on multiple graph applications including community detection, Borůvka's MST, graph coloring, and triangle counting, and on a large AI model, VGG-net, using the large CIFAR dataset. Our algorithms reduce, severalfold, the global synchronization of the existing Bulk Synchronous Parallel (BSP) model in graph applications and the costly allreduce/allgather operations of the data parallelism paradigm in deep learning applications. In both cases, we achieved 20-70% speedup over the existing widely-used models. Author Bio: Sathish Vadhiyar is a Professor in the Department of Computational and Data Sciences and Chair of the Supercomputer Education and Research Centre, Indian Institute of Science. He obtained his B.E. degree in the Department of Computer Science and Engineering at Thiagarajar College of Engineering, India in 1997 and received his Masters degree in Computer Science at Clemson University, USA in 1999. He graduated with a PhD in the Computer Science Department at University of Tennessee, USA in 2003.
His research areas are in HPC application frameworks including multi-node and multi-device programming models and runtime strategies for irregular applications including graph applications and AMR applications, performance characterization and scalability studies, processor allocation, mapping and remapping for large scale executions, middleware for production supercomputer systems, and fault tolerance for parallel applications. He has also worked with applications in climate science and visualization in collaboration with researchers working in these areas. Dr. Vadhiyar is a senior member of IEEE, a professional member of ACM, and has published papers in peer-reviewed journals and conferences. He was the program co-chair of the HPC area in HiPC 2022, chair of the senior member award committee of ACM, an associate editor of IEEE TPDS, and served on the program committees of conferences related to parallel and grid computing including IPDPS, IEEE Cluster, CCGrid, ICPP, eScience and HiPC. Dr. Vadhiyar is also an investigator in the National Supercomputing Mission (NSM), a flagship project to create an HPC ecosystem in India, where he manages the R&D projects related to HPC in the country.

November 23 at 15:00
Soumyajit Dey IIT Kharagpur	GPGPU optimizations for HPC workloads In the past decade, high performance compute capabilities exhibited by heterogeneous GPGPU platforms have led to the popularity of data parallel programming languages such as CUDA and OpenCL. Developing high performance parallel programming solutions using such languages involves a steep learning curve due to the complexity of the underlying heterogeneous compute devices and their impact on performance. In this talk we shall give an overview of 1) GPGPU optimizations described from a high level for such programming languages, 2) examples of High Performance Computing frameworks which provide high-level abstractions for easing the development of data-parallel applications on heterogeneous platforms. Author Bio: Dr. Soumyajit Dey is currently an associate professor in the Department of Computer Science and Engineering, IIT Kharagpur. He joined the department in 2013. He did his B.E. in Electronics and Telecommunication Engg. from Jadavpur University, Kolkata, India. He did his Masters and PhD degrees in Computer Science and Engg. at IIT Kharagpur, India. He has published 28 peer-reviewed journal papers and 56 international conference papers, some of them in top venues (http://cse.iitkgp.ac.in/~soumya/pub.html). He leads the High Performance Real-time Computing Laboratory (HiPRC) in the Computer Science and Engg. Dept, IIT Kharagpur, India. His research interests include 1) Autonomous Trustworthy Cyber Physical Systems (CPS) design, 2) Formal Methods, 3) Real time scheduling, 4) GPGPU optimizations. He has served as a reviewer for 10 IEEE/ACM transactions and as a PC member for the DAC, RTSS, DATE, ICCPS, VLSI, VDAT, DSD, and SPACE conferences. He has also organized special sessions on CPS in DSD 2020/21/22/23 and DATE 2021. He is the winner of the best design award in VLSI 2006 and has an honourable mention in VLSI 2019.

November 23 at 15:30
Preeti Malakar IIT Kanpur	A Glimpse of ML for HPC Machine learning (ML) is everywhere. What role does ML play in HPC? We will present two research problems where we used machine learning/deep learning. (1) Tuning parallel I/O parameters using active learning: we will delve into the challenges of tuning the performance knobs of parallel I/O. (2) Predicting cyclogenesis using deep learning (DL): we will discuss how we can detect a cyclone during its formation using DL. We will conclude with the limitations of machine learning in the context of HPC. Author Bio: Preeti Malakar is an Assistant Professor in the Department of Computer Science and Engineering, Indian Institute of Technology Kanpur. Prior to this, she worked at the Argonne National Laboratory, USA. She graduated (Ph.D.) from the Department of Computer Science and Automation, Indian Institute of Science Bangalore. Her research interests include scalable parallel communications, modeling and optimizing scientific workflows, parallel I/O, and application performance modeling/analysis. She regularly serves on the program committees of HPC conferences.

November 23 at 16:00
Kishore Kothapalli IIIT Hyderabad	Recent Progress in Parallel Dynamic Graph Algorithms With real-world graphs usually evolving over time, efficient parallel algorithms for dynamic graph computations are gaining research attention. There has been considerable progress in this direction in recent years. This talk will cover some of these developments and outline some of the important open problems in this domain. Author Bio: Kishore Kothapalli is a professor and Dean (Academics) at IIIT Hyderabad. He completed his PhD at Johns Hopkins University. His research interests are graph theory, network security, and distributed and parallel algorithms. Kishore has served as a PC member for numerous conferences and as a General Co-chair for HiPC 2020. He is also an Associate Editor for the TOPC journal. He recently wrote a book on Engineering Parallel Graph Algorithms.

November 23 at 16:30
Biswabandan Panda IIT Bombay	Plenty of rooms at the bottom: Microarchitecture for performance This talk will explain why microarchitecture plays an essential role in our computing world, keeping application developers, compiler writers, and OS designers in mind. Then I will talk about microarchitects (a.k.a. my mentees) and their recent journey on latency hiding techniques, keeping system performance and energy in mind. No, no, and no. A microarchitect does not build tiny houses that you can live in. Author Bio: Dr. Biswa is a faculty member at CSE, IIT Bombay. He completed his PhD at CSE, IIT Madras. Biswa's well-known microscopic contributions are the state-of-the-art high performing cache compressors and multi-level hardware data prefetchers. He is one of the recipients of the Qualcomm Faculty Award 2022 and the Google India Research Award 2022.