Schedule Summary
November 24
09:00  S Sundar  A shock-capturing meshless geometric conservation weighted least squares method for solving shallow water equations
09:30  Sashi Kumaar Ganesan  GPU-Accelerated Parallel Algebraic Multigrid Solver
10:00  Panchatcharam M  GPU Accelerated Computing for Cancer Treatment
10:30  Aditya Konduri  An overview of scalable asynchronous PDE solvers
11:00  Nagaiah Chamakuri  Challenges for large scale simulation of cardiac electrophysiology
11:30  Soumyendu Raha  Sparsification of Reaction-Diffusion Dynamical Systems in complex networks
12:00  Sathish Vadhiyar  Pipelined Preconditioned Conjugate Gradient Methods for Distributed Memory Architectures
13:00  Lunch Break
14:00  Deepak Subramani  Onboard routing of autonomous underwater vehicles: From PDEs to Deep Learning
14:30  Ratikanta Behera  Tensor Computations with Applications
15:00  Jim Thomas  Modeling fluid dynamics of the world's oceans
15:30  Phani Sudheer Motamarri  Towards fast and accurate quantum modeling of materials on extreme-scale architectures for accelerated materials discovery
16:00  Kandappan  Hierarchical low-rank structures on distributed systems
Our Esteemed Speakers
Keynote Address on November 24 at 09:00 
S Sundar (IIT Madras and NIT Mizoram): A shock-capturing meshless geometric conservation weighted least squares method for solving shallow water equations
The shallow water equations are solved numerically to simulate free-surface flows in two dimensions (2D). For certain flow situations, such as hydraulic jumps, dam-break wave propagation, or bore wave propagation, the convective flux term in the shallow water equations must be discretized with a Riemann solver to capture shocks and discontinuities. Approximate Riemann solvers can capture shocks and are popular for studying open-channel flow problems with traditional mesh-based methods. Meshless methods, however, can work on structured and unstructured grids and even on points distributed irregularly over a computational domain, and approximate Riemann solvers have not been reported within the framework of meshless methods for solving the shallow water equations. We have therefore proposed a shock-capturing meshless solver for the shallow water equations that simulates 2D flows over highly variable topography, even in the presence of shocks and discontinuities. In the proposed meshless method, the HLL (Harten-Lax-van Leer) Riemann solver is used to evaluate the convective flux. The spatial derivatives in the shallow water equations, and the reconstruction of the conservative variables used to calculate the flux terms, are computed with a geometric conservation weighted least squares (GCWLS) approximation. The proposed meshless method is tested on a range of numerically challenging problems and laboratory experiments.
Author Bio: S. Sundar is from the Indian Institute of Technology Madras and is currently the Director of NIT Mizoram. He is a Professor of Mathematics and was Head of the Department of Mathematics, IIT Madras, during 2017-2020. He has been a DAAD (German Academic Exchange Service) Research Ambassador since 2018.
He is a Distinguished Alumnus of TU Kaiserslautern, Germany, and an Alumni Ambassador of the City of Kaiserslautern, Germany. He was the Chairman of JEE (Advanced) 2015, IIT Madras, and before that the Chairman of HSEE 2014, IIT Madras. He was one of the leads in bringing about the Joint Seat Allocation portal for IITs and NITs in 2015. He is a Member of the Program Advisory Committee (Mathematical Sciences), DST-SERB. Prof. Sundar was a Member of the Faculty Council, IIT Madras Research Park, during 2012-2020. He is an Associate Editor of the International Journal of Advances in Engineering Sciences and Applied Mathematics, Springer, and an Editorial Member of the Journal of the Indian Mathematical Society and the Journal of the Indian Academy of Mathematics. He played a key role in establishing the Centre of Excellence in Computational Mathematics and Data Science, Department of Mathematics, IIT Madras, supported under the Institute of Eminence, GoI. His research areas include numerics for partial differential equations (PDEs), mathematical modeling and numerical simulation. He has over 70 peer-reviewed research publications to his credit. He has guided 17 PhDs, and 8 research scholars are currently working under his guidance. He has guided over 150 MTech research projects, and over 10 MTech students are currently pursuing their research projects with him. He contributes actively to the Indian Academy of Mathematical Modeling and Simulation as a Fellow. His experience as a visiting professor at universities across the globe reflects the reach of his research. His interest in recent technologies in mathematical modeling is reflected in his close collaboration with some of the top technical universities in Germany and, more generally, across the globe.
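The HLL flux at the core of the solver can be illustrated in isolation. The sketch below is a one-dimensional, flat-bottom version for the state (h, hu); the Davis-type wave-speed estimates, the constant `G`, and the function names are assumptions for illustration, and the meshless GCWLS reconstruction described in the talk is not reproduced.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed units)


def physical_flux(h, hu):
    """Exact 1D shallow water flux F(U) for U = (h, hu) on a flat bottom."""
    u = hu / h
    return np.array([hu, hu * u + 0.5 * G * h * h])


def hll_flux(hL, huL, hR, huR):
    """HLL numerical flux between a left and a right state."""
    uL, uR = huL / hL, huR / hR
    cL, cR = np.sqrt(G * hL), np.sqrt(G * hR)
    # Davis-type estimates of the slowest and fastest signal speeds (an assumption;
    # other estimates are also common).
    sL = min(uL - cL, uR - cR)
    sR = max(uL + cL, uR + cR)
    FL, FR = physical_flux(hL, huL), physical_flux(hR, huR)
    if sL >= 0.0:   # all waves move right: take the left flux
        return FL
    if sR <= 0.0:   # all waves move left: take the right flux
        return FR
    UL, UR = np.array([hL, huL]), np.array([hR, huR])
    # Flux of the single averaged intermediate state of the HLL approximation
    return (sR * FL - sL * FR + sL * sR * (UR - UL)) / (sR - sL)


# Dam-break interface: deeper water on the left, both states at rest.
flux = hll_flux(2.0, 0.0, 1.0, 0.0)
```

For identical left and right states the formula reduces to the physical flux (consistency), and at the dam-break interface the mass flux `flux[0]` is positive, i.e. water moves from the deep side toward the shallow side.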

November 24 at 09:30 
Sashi Kumaar Ganesan (IISc Bangalore): GPU-Accelerated Parallel Algebraic Multigrid Solver
Hybrid CPU-GPU algorithms for algebraic multigrid (AMG) methods are presented that use both CPU and GPU resources effectively. Specifically, a hybrid AMG framework is developed that minimizes GPU memory usage while maintaining performance on par with GPU-only implementations. The framework can be tuned to operate with significantly less GPU memory, enabling the solution of larger algebraic systems. Combining this hybrid AMG framework, as a preconditioner, with Krylov subspace solvers such as the Conjugate Gradient and BiCG methods yields a comprehensive solver stack for a wide range of problems. The performance of the proposed framework is analyzed across an array of matrices of varying properties and sizes, and the CPU-GPU algorithms are compared with GPU-only implementations, demonstrating considerably lower memory requirements.
Author Bio: Sashi Kumaar Ganesan is a Professor and Chair of the Department of Computational and Data Sciences (CDS), Indian Institute of Science (IISc), Bangalore. He joined IISc in 2011 as an Assistant Professor. Before joining the institute, he was a Research Associate at Imperial College London and an Alexander von Humboldt Fellow at WIAS Berlin. He received his Ph.D. from Otto von Guericke University, Germany. His research group focuses on finite element analysis, scientific computing and machine learning, and high-performance computing. He is also a founder of Zenteiq Edtech Pvt. Ltd., a deep-tech startup incubated at FSID, IISc.
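To make "AMG as a preconditioner for a Krylov solver" concrete, here is a small dense NumPy sketch. It is not the hybrid CPU-GPU framework from the talk: it assumes a 1D Poisson matrix, a single coarse level built by pairwise aggregation (a simple stand-in for a full AMG setup), damped Jacobi smoothing, and an exact coarse solve. All function names are hypothetical.

```python
import numpy as np


def poisson_1d(n):
    """SPD tridiagonal matrix tridiag(-1, 2, -1), dense for illustration."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def pairwise_prolongation(n):
    """Piecewise-constant prolongation aggregating unknowns (0,1), (2,3), ..."""
    P = np.zeros((n, (n + 1) // 2))
    P[np.arange(n), np.arange(n) // 2] = 1.0
    return P


def two_grid_preconditioner(A, omega=2.0 / 3.0):
    """Return r -> M r for one symmetric two-grid cycle."""
    P = pairwise_prolongation(A.shape[0])
    Ac = P.T @ A @ P                      # Galerkin coarse-grid operator
    d_inv = 1.0 / np.diag(A)

    def apply(r):
        x = omega * d_inv * r                              # damped Jacobi pre-smoothing from x = 0
        x += P @ np.linalg.solve(Ac, P.T @ (r - A @ x))    # coarse-grid correction
        x += omega * d_inv * (r - A @ x)                   # damped Jacobi post-smoothing
        return x

    return apply


def pcg(A, b, apply_m, tol=1e-10, maxiter=500):
    """Standard preconditioned conjugate gradient; returns (x, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_m(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_m(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter


A = poisson_1d(64)
b = np.ones(64)
x, iters = pcg(A, b, two_grid_preconditioner(A))
```

Passing `lambda r: r` instead of `two_grid_preconditioner(A)` gives unpreconditioned CG, which makes it easy to compare iteration counts. Using the same smoother before and after the Galerkin coarse correction keeps the preconditioner symmetric, as CG requires.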

November 24 at 10:00 
Panchatcharam M (IIT Tirupati): GPU Accelerated Computing for Cancer Treatment
This presentation focuses on a pivotal aspect of cancer treatment protocols: streamlining the computations used to predict treatment outcomes so that these predictions can be made efficiently. State-of-the-art Graphics Processing Units (GPUs) offer a transformative solution, reducing the prediction time for a one-hour treatment protocol from the conventional 6 hours to 2 minutes. The primary objective is to develop a fast finite element solver and a Joule heating solver that can accurately forecast both the number and the volume of dead cells in the vicinity of the tumor. This advance not only accelerates predictive analytics but also holds great promise for optimizing cancer treatment strategies.
Author Bio: Panchatcharam Mariappan is a faculty member at IIT Tirupati. He completed his PhD at IIT Madras and TU Kaiserslautern, Germany. His research interests include numerics of PDEs, CFD, GPU computing and heat transfer.
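As a rough illustration of the finite element side only, the sketch below solves a steady 1D heat equation, -k T'' = σE², with a uniform Joule heat source and both ends held at the reference temperature. The 1D geometry, the uniform field, the constant material parameters, their values, and the function name are all assumptions; the actual treatment protocol is 3D and time-dependent, and cell-death prediction is not modeled here.

```python
import numpy as np


def joule_heating_fem_1d(n_el, length, k, sigma, E):
    """Linear finite elements for -k T'' = sigma*E^2 on [0, length], T = 0 at both ends.

    T is the temperature rise above the boundary temperature.
    """
    q = sigma * E ** 2                                     # volumetric Joule heating (uniform field assumed)
    h = length / n_el
    n = n_el + 1
    K = np.zeros((n, n))
    f = np.zeros(n)
    ke = (k / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])    # element conductivity matrix
    fe = 0.5 * q * h * np.ones(2)                          # element load for a constant source
    for e in range(n_el):                                  # assembly loop
        K[e:e + 2, e:e + 2] += ke
        f[e:e + 2] += fe
    T = np.zeros(n)
    T[1:-1] = np.linalg.solve(K[1:-1, 1:-1], f[1:-1])      # Dirichlet nodes eliminated
    return np.linspace(0.0, length, n), T


# Hypothetical parameters: 2 cm domain, k in W/(m K), sigma in S/m, E in V/m.
x, T = joule_heating_fem_1d(n_el=20, length=0.02, k=0.5, sigma=0.3, E=1000.0)
```

For this constant-source 1D problem, linear elements reproduce the exact solution q x (L - x) / (2k) at the nodes, which makes the sketch easy to check before moving to richer models.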

November 24 at 10:30 
Aditya Konduri (IISc Bangalore): An overview of scalable asynchronous PDE solvers
Numerical simulations of physical phenomena and engineering systems governed by nonlinear partial differential equations demand massive computations with extreme levels of parallelism. Current state-of-the-art simulations are routinely performed on hundreds of thousands of processing elements (PEs). At extreme scales, data movement and its synchronization are observed to be a bottleneck for solver scalability. Recently, an asynchronous computing method that relaxes communication synchronization at a mathematical level has shown significant promise for improving the scalability of PDE solvers. In this method, the synchronization of halo exchanges between PEs is relaxed, and computations proceed regardless of the communication status. It was shown that the numerical accuracy of standard schemes, such as finite differences, is significantly degraded when they are implemented with relaxed communication synchronization. Subsequently, new asynchrony-tolerant schemes were developed that compute accurate solutions and scale well. This talk presents an overview of the status of the asynchronous computing method for PDE solvers and its applicability to exascale simulations. Because data synchronization is relaxed at a mathematical level, the method can further leverage asynchronous parallel communication and runtime models; the coupling of asynchrony-tolerant schemes with such models will be discussed.
Author Bio: Aditya Konduri is an Assistant Professor in the Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru. Prior to this, he was a Postdoctoral Researcher at the Combustion Research Facility, Sandia National Laboratories, USA.
His current research includes large-scale simulations of turbulent combustion relevant to gas turbine and scramjet engines, design of machine learning methods for detecting anomalous and extreme events in scientific phenomena, and development of scalable asynchronous numerical methods and simulation algorithms for solving partial differential equations on massively parallel computing systems. Aditya completed his PhD at Texas A&M University.
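The accuracy issue described above can be reproduced in a few lines. The sketch below simulates two processing elements that update a periodic 1D heat equation with the standard explicit (FTCS) finite-difference scheme. Each PE reads its neighbor's boundary values from `delay` time steps in the past. The fixed delay, the problem setup and all names are assumptions for illustration (real asynchrony produces random delays), and no asynchrony-tolerant scheme is implemented.

```python
import numpy as np


def ftcs_block(left_halo, u, right_halo, r):
    """One explicit FTCS step of u_t = u_xx on a local block; r = dt/dx^2."""
    ext = np.concatenate(([left_halo], u, [right_halo]))
    return u + r * (ext[2:] - 2.0 * u + ext[:-2])


def initial_condition(n):
    return np.sin(2.0 * np.pi * np.arange(n) / n)


def run_two_pe(n=40, steps=200, r=0.25, delay=0):
    """Two simulated PEs; halo values come from `delay` steps ago (0 = synchronous)."""
    u0 = initial_condition(n)
    blocks = [u0[: n // 2].copy(), u0[n // 2:].copy()]
    history = [blocks]                                      # past states, oldest first
    for _ in range(steps):
        stale = history[max(0, len(history) - 1 - delay)]   # possibly outdated neighbor data
        a, b = blocks
        # Periodic domain: PE 0 is bordered by PE 1 on both sides, and vice versa.
        blocks = [ftcs_block(stale[1][-1], a, stale[1][0], r),
                  ftcs_block(stale[0][-1], b, stale[0][0], r)]
        history.append(blocks)
    return np.concatenate(blocks)


def run_serial(n=40, steps=200, r=0.25):
    """Reference: the same scheme on the whole periodic grid, fully synchronized."""
    u = initial_condition(n)
    for _ in range(steps):
        u = u + r * (np.roll(u, -1) - 2.0 * u + np.roll(u, 1))
    return u


sync_err = np.max(np.abs(run_two_pe(delay=0) - run_serial()))
async_err = np.max(np.abs(run_two_pe(delay=2) - run_serial()))
```

With `delay=0` the split computation matches the serial one exactly, while a delay of only two steps changes the solution. This is the kind of error that asynchrony-tolerant schemes are designed to control.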

November 24 at 11:00 
Nagaiah Chamakuri (IISER Trivandrum): Challenges for large scale simulation of cardiac electrophysiology
The bidomain equations form the state-of-the-art model of cardiac electrophysiology and describe normal or pathological propagation of the excitation wave through cardiac tissue. The bidomain model consists of a system of elliptic partial differential equations coupled with a nonlinear parabolic equation of reaction-diffusion type, where the reaction term, which models ionic transport, is described by a set of ordinary differential equations. Since the ionic currents are described by ODEs in the tissue, the PDE part dominates the solution effort; it is therefore not clear whether commonly used splitting methods can outperform a coupled approach while maintaining good accuracy. In the first part, results will be presented comparing the coupled solver approach with commonly used splitting methods for more sophisticated physiological models. In this context, a novel memory-efficient computational technique for solving the coupled systems of equations will be demonstrated. In the second part, we address these challenges by combining space-time adaptive discretization with dynamic load balancing for parallel computing.
Author Bio: Nagaiah Chamakuri is a faculty member at IISER Trivandrum. He completed his PhD at the University of Magdeburg, Germany. He was a Senior Research Scientist at the Max Delbrück Center, Berlin, Germany.
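A minimal sketch of the operator-splitting idea discussed above follows. It is a 1D monodomain reaction-diffusion model (a simplification of the bidomain system) with a FitzHugh-Nagumo-type ionic model, advanced by first-order (Godunov) splitting. Each step takes an explicit Euler step for the pointwise ionic ODEs and then an implicit Euler step for diffusion. The monodomain reduction, the ionic model, its parameters and all function names are assumptions. A coupled approach would instead solve both parts together in one nonlinear system per time step.

```python
import numpy as np


def neumann_laplacian(n, dx):
    """1D second-difference matrix with zero-flux (Neumann) boundaries."""
    L = np.zeros((n, n))
    for i in range(n):
        if i > 0:
            L[i, i - 1] += 1.0
            L[i, i] -= 1.0
        if i < n - 1:
            L[i, i + 1] += 1.0
            L[i, i] -= 1.0
    return L / dx ** 2


def ionic_rhs(v, w, a=0.1, eps=0.01, gamma=0.5):
    """FitzHugh-Nagumo-type ionic model (assumed): v = potential, w = recovery."""
    dv = v * (1.0 - v) * (v - a) - w
    dw = eps * (v - gamma * w)
    return dv, dw


def godunov_split(v, w, dt, steps, D=1.0, dx=0.5):
    """First-order splitting: explicit Euler ODE substep, then implicit Euler diffusion."""
    n = v.size
    A = np.eye(n) - dt * D * neumann_laplacian(n, dx)   # (I - dt*D*Lap) for backward Euler
    for _ in range(steps):
        dv, dw = ionic_rhs(v, w)
        v, w = v + dt * dv, w + dt * dw                 # reaction (ionic ODE) substep
        v = np.linalg.solve(A, v)                       # diffusion (PDE) substep
    return v, w


n = 80
v0, w0 = np.zeros(n), np.zeros(n)
v0[:5] = 1.0                                            # stimulus at the left end
v, w = godunov_split(v0, w0, dt=0.05, steps=200)        # integrate to t = 10
```

The appeal of splitting is that the ODE substep is embarrassingly parallel and the diffusion substep is a single linear solve. The price is an additional splitting error, which is exactly the accuracy trade-off the comparison with a coupled solver examines.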

November 24 at 11:30 
Soumyendu Raha (IISc Bangalore): Sparsification of Reaction-Diffusion Dynamical Systems in complex networks
A data-driven approach to the sparsification of reaction-diffusion dynamical systems in complex networks (more precisely, in graphs induced by complex networks) is studied as an inverse problem guided by data representing the flows in the network. Model reduction techniques such as proper orthogonal decomposition (POD) are used to make the problem computationally feasible. The network sparsification problem is mapped to a data assimilation problem on a reduced order model (ROM) space, with constraints aimed at preserving the eigenmodes of the Laplacian matrix under perturbations. For computational feasibility, approximations to the eigenvalues and eigenvectors of the perturbed Laplacian matrix are proposed, and a custom function based on these approximations is included as a constraint in the data assimilation framework. The resulting eigenvalues are analyzed as pseudo-eigenvalues of the perturbed Laplacian matrix. As a use case, this framework has been used to sparsify neural ODENets.
Author Bio: Soumyendu Raha is a professor at CDS, IISc. He completed his PhD at the University of Minnesota. Prior to joining IISc, he taught at UCSB and NDSU. He has also worked at Cray and IBM.
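One standard way to approximate how Laplacian eigenvalues move under a small perturbation is first-order perturbation theory for symmetric matrices: λᵢ(L + ΔL) ≈ λᵢ + vᵢᵀ ΔL vᵢ. The sketch below applies it to a slightly weakened edge of a path graph. The approximations proposed in the talk may differ; the graph, the perturbation size and the function names are assumptions for illustration.

```python
import numpy as np


def graph_laplacian(W):
    """Combinatorial Laplacian L = D - W of a symmetric weighted adjacency matrix."""
    return np.diag(W.sum(axis=1)) - W


def first_order_eigenvalues(L, dL):
    """Approximate eigenvalues of L + dL by lambda_i + v_i^T dL v_i.

    Assumes L is symmetric with simple (non-repeated) eigenvalues.
    """
    lam, V = np.linalg.eigh(L)
    return lam + np.einsum("ji,jk,ki->i", V, dL, V)


# Path graph on 6 nodes with unit weights; weaken edge (2, 3) slightly,
# as a sparsification step might.
n = 6
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
dW = np.zeros((n, n))
dW[2, 3] = dW[3, 2] = -1e-3
L, dL = graph_laplacian(W), graph_laplacian(dW)

approx = first_order_eigenvalues(L, dL)
exact = np.linalg.eigvalsh(L + dL)
```

Only one eigendecomposition of the unperturbed Laplacian is needed, after which each candidate perturbation costs a few matrix-vector products. This is what makes such approximations attractive inside an optimization loop.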

November 24 at 12:00 
Sathish Vadhiyar (IISc Bangalore): Pipelined Preconditioned Conjugate Gradient Methods for Distributed Memory Architectures
As HPC enters the exascale computing era, asynchronous algorithms, techniques for overlapping computation with communication, and ways of minimizing or avoiding global synchronization become very important for achieving high performance and scalability on very large numbers of computing cores. In this talk, I will cover our recent work on developing pipelined preconditioned Conjugate Gradient (CG) methods for distributed memory systems based on these principles. The traditional CG algorithm has costly allreduce operations for dot products that involve global synchronization, and subsequent operations, including SpMV, wait for the results of these dot products. We have developed PIPECG-OATI (PIPECG-One Allreduce per Two Iterations), which reduces the number of allreduces from three per iteration to one per two iterations and overlaps it with two preconditioner applications (PCs) and two SpMVs. For better scalability with more overlapping, we also developed the pipelined s-step CG method, which reduces the number of allreduces to one per s iterations and overlaps it with s PCs and s SpMVs. We compared our methods with state-of-the-art CG variants on a variety of platforms and demonstrated speedups of 2.15x-3x over the existing methods. We also developed communication-overlapping CG variants for GPU-accelerated nodes, proposing and implementing three hybrid CPU-GPU execution strategies for the PIPECG method. Our experiments on GPUs showed that these methods give 1.45x-3x average speedup over existing CPU- and GPU-based implementations.
Author Bio: Sathish Vadhiyar is a Professor in the Department of Computational and Data Sciences and Chair of the Supercomputer Education and Research Centre, Indian Institute of Science. He obtained his B.E. degree in Computer Science and Engineering from Thiagarajar College of Engineering, India, in 1997 and his Master's degree in Computer Science from Clemson University, USA, in 1999. He received his PhD from the Computer Science Department at the University of Tennessee, USA, in 2003. His research areas include HPC application frameworks, including multi-node and multi-device programming models and runtime strategies for irregular applications such as graph and AMR applications; performance characterization and scalability studies; processor allocation, mapping and remapping for large-scale executions; middleware for production supercomputer systems; and fault tolerance for parallel applications. He has also worked on applications in climate science and visualization in collaboration with researchers in these areas. Dr. Vadhiyar is a senior member of IEEE and a professional member of ACM, and has published papers in peer-reviewed journals and conferences. He was the program co-chair of the HPC area in HiPC 2022 and chair of the ACM senior member award committee, was an associate editor of IEEE TPDS, and has served on the program committees of conferences related to parallel and grid computing, including IPDPS, IEEE Cluster, CCGrid, ICPP, eScience and HiPC. Dr. Vadhiyar is also an investigator in the National Supercomputing Mission (NSM), a flagship project to create an HPC ecosystem in India, where he manages the HPC-related R&D projects in the country.
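For orientation, the sketch below follows the pipelined PCG (PIPECG) recurrence in the form published by Ghysels and Vanroose. It is not PIPECG-OATI or the pipelined s-step method from the talk. In a distributed code the two dot products would be fused into one non-blocking allreduce that overlaps with the preconditioner application and the SpMV. Here everything runs serially in NumPy, and the Jacobi preconditioner, the test matrix and the function name are assumptions.

```python
import numpy as np


def pipelined_pcg(A, b, m_inv_diag, tol=1e-10, maxiter=1000):
    """Pipelined PCG (Ghysels-Vanroose form) with a diagonal preconditioner M^{-1}."""
    x = np.zeros_like(b)
    r = b - A @ x
    u = m_inv_diag * r            # u = M^{-1} r
    w = A @ u                     # w = A u
    z = q = s = p = np.zeros_like(b)
    gamma_old = alpha_old = 1.0   # placeholders, unused in the first iteration
    b_norm = np.linalg.norm(b)
    for i in range(maxiter):
        # Both dot products could share ONE non-blocking allreduce in a distributed code...
        gamma = r @ u
        delta = w @ u
        # ...whose latency is hidden behind the preconditioner and the SpMV below.
        m_vec = m_inv_diag * w
        n_vec = A @ m_vec
        if i > 0:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        else:
            beta = 0.0
            alpha = gamma / delta
        z = n_vec + beta * z      # z = A q
        q = m_vec + beta * q      # q = M^{-1} s
        s = w + beta * s          # s = A p
        p = u + beta * p          # search direction
        x = x + alpha * p
        r = r - alpha * s
        u = u - alpha * q
        w = w - alpha * z
        gamma_old, alpha_old = gamma, alpha
        # Extra reduction for the stopping test (kept separate here for clarity)
        if np.linalg.norm(r) <= tol * b_norm:
            return x, i + 1
    return x, maxiter


n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = pipelined_pcg(A, b, 1.0 / np.diag(A))
```

The extra vectors `z`, `q`, `s` and `w` are what allow the dot products to be started before the next SpMV finishes. The trade-off is extra memory traffic and somewhat weaker numerical stability than classical CG.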

Lunch Break on November 24 at 13:00 

November 24 at 14:00 
Deepak Subramani (IISc Bangalore): Onboard routing of autonomous underwater vehicles: From PDEs to Deep Learning
Intelligent onboard optimal routing is essential for the efficient use of autonomous marine platforms in a variety of scientific, security and humanitarian applications. In this context, we first develop a CPU-based dynamically orthogonal PDE approach for optimal routing. Next, we develop a GPU-accelerated dynamic programming solver for stochastic environments. Finally, we show how the expert trajectories generated by these two exact solution methods can be used to develop a transformer-based foundational neural model for onboard routing.
Author Bio: Dr. Deepak Subramani is an Assistant Professor in the Department of Computational and Data Sciences at the Indian Institute of Science (IISc), Bangalore. He obtained his Ph.D. in Mechanical Engineering and Computation from the Massachusetts Institute of Technology (MIT), USA, and his B.Tech in Mechanical Engineering from IIT Madras. He works on developing AI/ML solutions for geoscience applications, uncertainty quantification, and optimal routing of autonomous vehicles. He is an expert in data-driven modeling, deep learning, scientific machine learning and scientific computing. He has won several awards throughout his career, including the IISc Award for Excellence in Teaching, the Arcot Ramachandran Young Investigator Award, the INSPIRE Faculty Award, the de Florez research award at MIT, the SNAME award, the GE Foundation Leader Scholar Award, and the National Talent Search. He has more than 40 peer-reviewed publications in top journals and international conferences.
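The dynamic programming component can be illustrated with a toy example: value iteration for the minimum expected travel time to a goal on a small grid. With probability `p_drift`, a hypothetical current pushes the vehicle east instead of in the commanded direction. The grid world, unit step cost, drift model and names are assumptions, not the solver from the talk. Each state's Bellman update in a sweep reads only the previous iterate, which is why such sweeps map naturally onto GPU threads.

```python
import numpy as np

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # (row, col) offsets: east, west, south, north


def step(cell, move, shape):
    """Apply a move, staying inside the grid (moving into a wall leaves the cell unchanged)."""
    r = min(max(cell[0] + move[0], 0), shape[0] - 1)
    c = min(max(cell[1] + move[1], 0), shape[1] - 1)
    return r, c


def value_iteration(shape, goal, p_drift=0.2, drift=(0, 1), tol=1e-9, max_sweeps=10_000):
    """Expected number of steps to reach `goal` under the best policy."""
    V = np.zeros(shape)
    for _ in range(max_sweeps):
        V_new = np.zeros(shape)          # Jacobi-style sweep: reads only the old V
        for r in range(shape[0]):
            for c in range(shape[1]):
                if (r, c) == goal:
                    continue             # terminal state: zero cost-to-go
                drifted = step((r, c), drift, shape)
                V_new[r, c] = min(
                    1.0 + (1.0 - p_drift) * V[step((r, c), m, shape)] + p_drift * V[drifted]
                    for m in MOVES
                )
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


V_calm = value_iteration((5, 5), goal=(4, 4), p_drift=0.0)
V_current = value_iteration((5, 5), goal=(4, 4), p_drift=0.2)
```

Without drift the result is simply the Manhattan distance to the goal, which is a convenient sanity check. With drift, the values become expected travel times that account for the uncertain environment.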

November 24 at 14:30 
Ratikanta Behera (IISc Bangalore): Tensor Computations with Applications
In the era of big data, artificial intelligence, and machine learning, there is a need to process multiway (tensor-shaped) data. Such data are mainly of order three or higher, and their sizes can reach billions of entries. Large volumes of multidimensional data are a great challenge to process and analyze, and matrix representations are not sufficient to capture all the information content of multiway data arising in different fields. In this talk, we discuss a closed multiplication operation between tensors, together with the associated concepts of the transpose, inverse, and identity of a tensor. We then discuss the application of tensor factorization to color imaging problems.
Author Bio: Ratikanta Behera is a faculty member at CDS, IISc. He completed his PhD at IIT Delhi. Prior to joining IISc, he was a faculty member at IISER Kolkata. His research interests are tensor decompositions, neural networks, numerical linear algebra, generalized inverses of tensors, wavelets in scientific computing, and high-performance computing.
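One well-known closed product of this kind is the Einstein product. It contracts the last N modes of one tensor with the first N modes of another; whether it is the specific product used in the talk is an assumption here. The sketch below works with fourth-order tensors of shape (m, n, m, n) and shows the identity, transpose and inverse. The inverse is computed through the matrix unfolding that turns the Einstein product into an ordinary matrix product. The function names are hypothetical.

```python
import numpy as np


def einstein_product(A, B):
    """(A *_2 B)_{ijmn} = sum_{k,l} A_{ijkl} B_{klmn} for 4th-order tensors."""
    return np.einsum("ijkl,klmn->ijmn", A, B)


def identity_tensor(m, n):
    """I_{ijkl} = delta_{ik} delta_{jl}, the identity for the Einstein product."""
    return np.einsum("ik,jl->ijkl", np.eye(m), np.eye(n))


def tensor_transpose(A):
    """T_{klij} = A_{ijkl}."""
    return A.transpose(2, 3, 0, 1)


def tensor_inverse(A):
    """Inverse via unfolding: rows indexed by (i, j), columns by (k, l)."""
    m, n = A.shape[:2]
    return np.linalg.inv(A.reshape(m * n, m * n)).reshape(m, n, m, n)


rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 2, 3))
B = rng.standard_normal((2, 3, 2, 3))
I = identity_tensor(2, 3)
```

Because the unfolding is an isomorphism, familiar matrix identities carry over; for example, the transpose of a product is the product of the transposes in reverse order.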

November 24 at 15:00 
Jim Thomas (TIFR): Modeling fluid dynamics of the world's oceans
Author Bio: Jim Thomas is a faculty member at the International Centre for Theoretical Sciences, Tata Institute of Fundamental Research, and the Centre for Applicable Mathematics, Tata Institute of Fundamental Research. He received his PhD in Mathematics and Atmosphere Ocean Science from the Courant Institute of Mathematical Sciences, New York University, where Oliver Bühler and Shafer Smith were his PhD advisors. His research focuses on understanding the fluid dynamics of the world's oceans. He uses a combination of applied mathematical techniques, idealized mathematical models, and scale-specific numerical integration of governing equations to understand intricate details of oceanic flows.

November 24 at 15:30 
Phani Sudheer Motamarri (IISc Bangalore): Towards fast and accurate quantum modeling of materials on extreme-scale architectures for accelerated materials discovery
Quantum-mechanical modeling of materials has played a significant role in determining a wide variety of material properties over the past few decades. In particular, Kohn-Sham density functional theory (DFT) calculations, which involve computing the self-consistent solution of a nonlinear eigenvalue problem, have provided many crucial insights into materials behavior and occupy a sizable fraction of the world's computational resources today. However, the stringent accuracy needed to compute meaningful material properties, together with the asymptotically cubic computational complexity of the underlying eigenvalue problem, means that accurate DFT calculations demand enormous computational resources. As a result, these calculations are routinely limited to material systems with at most a few thousand electrons. This talk discusses recent advances in the state of the art that enable fast and accurate large-scale DFT calculations through the development of DFT-FE, a massively parallel open-source finite-element (FE) based DFT code for hybrid CPU-GPU architectures. DFT-FE employs adaptive FE discretization alongside novel HPC-centric numerical strategies based on mixed-precision arithmetic that significantly reduce data movement costs and increase arithmetic intensity on evolving hybrid CPU-GPU architectures. The talk will also highlight ongoing work at IISc on further accelerating DFT-FE calculations using the projector-augmented wave formalism, and on methods to incorporate noncollinear magnetism with spin-orbit coupling effects in DFT-FE calculations. Subsequently, ongoing work on matrix-free approaches to accelerate the FE sparse matrix-multivector multiplications arising in iterative eigensolvers on multi-node CPU-GPU architectures will be discussed.
Finally, the loop will be closed by discussing recent developments in AI/ML frameworks that can help accelerate materials discovery. These advances have wide-ranging implications for tackling critical scientific and technological problems, including designing new catalytic materials for clean fuel production, better materials for energy storage, materials and mechanisms for carbon dioxide sequestration, and novel qubit materials for quantum computers, to name a few.
Author Bio: Phani Motamarri has been an Assistant Professor in the Department of Computational and Data Sciences (CDS), IISc Bangalore, since December 2019. Prior to this, he was a research faculty member at the University of Michigan, Ann Arbor, USA, where he received his PhD from the Department of Mechanical Engineering, working at the intersection of computational materials physics and scientific computing. His PhD work received the Robert J. Melosh Medal for the best PhD student paper in finite-element methods, awarded by the International Association for Computational Mechanics (IACM). His primary research interests include developing mathematical techniques and HPC-centric computational algorithms that can leverage extreme-scale architectures for quantum modeling of materials, and harnessing these capabilities to address challenging material modeling problems in key scientific areas. His current interests also include exploring lightweight graph neural network-based ML approaches for accelerated materials discovery. He is also one of the lead developers of DFT-FE, an open-source code for massively parallel DFT calculations that was nominated as a finalist for the 2019 ACM Gordon Bell Prize, a prestigious prize in scientific computing. He also received an NSM exascale R&D grant award.
Phani Motamarri is also part of the international team that won the 2023 ACM Gordon Bell Prize, marking the first time that a research group from India has been part of a team receiving this prestigious accolade.
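The matrix-free idea mentioned above can be illustrated with a 1D toy problem. The sketch computes the product of a linear finite element stiffness matrix with a block of vectors (a "multivector") element by element, without assembling the sparse matrix. It then checks the result against the assembled version. The 1D Laplacian, linear elements, mesh size and function names are assumptions. They only stand in for the 3D discretizations a production code would use.

```python
import numpy as np


def element_matrix(h):
    """Linear-element stiffness matrix for -u'' on an element of length h."""
    return np.array([[1.0, -1.0], [-1.0, 1.0]]) / h


def assembled_stiffness(n_el, h):
    """Reference: explicitly assembled global matrix (dense here for simplicity)."""
    K = np.zeros((n_el + 1, n_el + 1))
    for e in range(n_el):
        K[e:e + 2, e:e + 2] += element_matrix(h)
    return K


def matrix_free_apply(X, n_el, h):
    """Compute K @ X for a block of vectors X (shape n_nodes x n_vecs) without forming K."""
    ke = element_matrix(h)
    Xe = np.stack([X[:-1], X[1:]], axis=1)        # gather: (n_el, 2, n_vecs) local values
    Ye = np.einsum("ab,ebv->eav", ke, Xe)         # same small dense kernel on every element
    Y = np.zeros_like(X)
    Y[:-1] += Ye[:, 0, :]                         # scatter-add contributions of local node 0
    Y[1:] += Ye[:, 1, :]                          # scatter-add contributions of local node 1
    return Y


rng = np.random.default_rng(1)
n_el, h = 10, 0.1
X = rng.standard_normal((n_el + 1, 4))            # a block ("multivector") of 4 vectors
Y = matrix_free_apply(X, n_el, h)
```

Working on a whole block of vectors at once turns many small element operations into batched dense kernels with higher arithmetic intensity. This is the property that makes matrix-free products attractive on GPUs.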

November 24 at 16:00 
Kandappan (Shiv Nadar University Chennai): Hierarchical low-rank structures on distributed systems
Large dense matrices are frequently encountered in a wide range of scientific and engineering problems. For a dense N × N matrix, both the storage cost and the cost of a matrix-vector product scale quadratically with N, so naive matrix-vector products become impractical for large N. However, specific classes of dense matrices that arise in N-body problems, such as radial basis function interpolation and scattering problems, possess a desirable property: their off-diagonal sub-matrices are numerically low-rank. This property reduces both the space complexity of storing these matrices and the computational complexity of matrix-vector products. Such matrices are termed hierarchical matrices, or H-matrices. In this talk, we will discuss a class of H-matrices known as HODLR3D and how well it scales on distributed-memory systems.
Author Bio: Dr. V.A. Kandappan is an Assistant Professor in the Department of Computer Science at Shiv Nadar University Chennai. He has a well-rounded academic background, having completed his Bachelor's degree in Electrical and Electronics Engineering, a Master's degree in Power Systems Engineering from Anna University, and a PhD in Computational Science from IIT Madras. Before his PhD, he worked as a Graduate Engineer at IBM India Pvt. Ltd., specializing in software testing automation. His current research focuses on developing fast matrix algorithms for scientific computing applications, with a particular interest in high-performance computing and machine learning.
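The low-rank property can be seen on a small example. The sketch below builds a one-level, HODLR-style approximation of a kernel matrix on points in [0, 1]. The two diagonal blocks stay dense, and each off-diagonal block is replaced by a truncated SVD. A full HODLR matrix recurses on the diagonal blocks, HODLR3D targets 3D point sets, and practical codes avoid the O(N³) SVD. The kernel 1/(1 + (x - y)²), the tolerance and the function names are assumptions for illustration.

```python
import numpy as np


def kernel_matrix(x, y):
    """Dense kernel matrix K_ij = 1 / (1 + (x_i - y_j)^2) (an assumed smooth kernel)."""
    return 1.0 / (1.0 + (x[:, None] - y[None, :]) ** 2)


def low_rank(block, tol):
    """Truncated SVD: block ~= U @ Vt, with rank k chosen by a relative tolerance."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))
    return U[:, :k] * s[:k], Vt[:k]


def one_level_hodlr(x, tol=1e-10):
    """Dense diagonal blocks plus low-rank off-diagonal blocks (one level only)."""
    m = x.size // 2
    x1, x2 = x[:m], x[m:]
    return {
        "m": m,
        "D1": kernel_matrix(x1, x1),
        "D2": kernel_matrix(x2, x2),
        "B12": low_rank(kernel_matrix(x1, x2), tol),
        "B21": low_rank(kernel_matrix(x2, x1), tol),
    }


def hodlr_matvec(H, v):
    """y = K v using the compressed representation."""
    m = H["m"]
    v1, v2 = v[:m], v[m:]
    U12, V12 = H["B12"]
    U21, V21 = H["B21"]
    y1 = H["D1"] @ v1 + U12 @ (V12 @ v2)
    y2 = H["D2"] @ v2 + U21 @ (V21 @ v1)
    return np.concatenate([y1, y2])


x = np.linspace(0.0, 1.0, 200)
H = one_level_hodlr(x)
v = np.random.default_rng(2).standard_normal(200)
approx = hodlr_matvec(H, v)
exact = kernel_matrix(x, x) @ v
rank = H["B12"][0].shape[1]
```

Each off-diagonal block is stored as two thin factors of rank `rank` instead of a full 100 × 100 block. Recursing on the diagonal blocks is what brings the storage and matrix-vector cost down from quadratic toward nearly linear in N.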

