Title | : | Novel Approaches for Heterogeneity-Aware Topology Design in Decentralized Federated Learning |
Speaker | : | Kondapalli Jayavardhan (IITM) |
Details | : | Tue, 23 Apr 2024, 3:30 PM @ SSB-233 |
Abstract | : | In federated learning, models are trained across nodes while preserving data privacy. Centralized federated learning relies on a central server to coordinate model updates across nodes, whereas decentralized federated learning operates in a peer-to-peer manner, with models trained by distributed stochastic gradient descent (DSGD). It is known that the success of DSGD depends critically on the underlying topology, specified by a mixing matrix. Another factor that affects DSGD is data heterogeneity, which refers to the local data distributions of individual nodes differing from the overall population distribution. Designing a mixing matrix that balances such data heterogeneity is therefore key to the success of DSGD. Recently, an algorithm called Sparse Topology Learning - Frank Wolfe (STL-FW) was proposed to tackle data heterogeneity: at the start of training, STL-FW computes a sparse mixing matrix by optimizing an objective that combines the spectral gap and data heterogeneity. In this talk, we address two important shortcomings of STL-FW. First, the topology returned by STL-FW may not be realizable in practice due to physical constraints. We propose a novel variant called Constrained Topology Learning - Frank Wolfe (CTL-FW) that computes an optimal mixing matrix respecting the underlying topological constraints. Second, the local models themselves exhibit heterogeneity during training, and the mixing matrix computed by STL-FW at the start may not be sufficient to mitigate this model heterogeneity. We propose a second variant, Dynamic Constrained Topology Learning - Frank Wolfe (DCTL-FW), which uses the test accuracies of the local models to compute a new mixing matrix in every round. Through experiments on synthetic and benchmark datasets, we show that DSGD performs better with mixing matrices computed by CTL-FW and DCTL-FW than with random mixing matrices of the same topology. |
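For readers unfamiliar with the DSGD-with-mixing-matrix setup that the abstract builds on, the sketch below is a minimal toy illustration (our own, not code or parameters from the talk): four hypothetical nodes with heterogeneous local least-squares data run DSGD, each round taking a local gradient step and then averaging models with neighbours through a fixed doubly stochastic mixing matrix on an assumed ring topology. The data generation, step size, and mixing weights are all illustrative assumptions; the STL-FW, CTL-FW, and DCTL-FW procedures themselves are not shown.

```python
# Toy DSGD with a fixed mixing matrix (illustrative sketch only;
# not the STL-FW/CTL-FW/DCTL-FW algorithms discussed in the talk).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 nodes, each with a shifted (heterogeneous) local dataset.
n_nodes, n_samples, dim = 4, 50, 5
true_w = rng.normal(size=dim)
local_data = []
for i in range(n_nodes):
    X = rng.normal(loc=i, scale=1.0, size=(n_samples, dim))  # node-specific shift -> data heterogeneity
    y = X @ true_w + 0.1 * rng.normal(size=n_samples)
    local_data.append((X, y))

# Assumed doubly stochastic mixing matrix for a 4-node ring topology.
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])

def local_gradient(w, X, y):
    """Gradient of the least-squares loss on one node's local data."""
    return X.T @ (X @ w - y) / len(y)

# DSGD: every round, each node takes a local gradient step,
# then averages its model with neighbours according to W.
models = np.zeros((n_nodes, dim))
lr = 0.01
for t in range(200):
    grads = np.stack([local_gradient(models[i], *local_data[i]) for i in range(n_nodes)])
    models = W @ (models - lr * grads)  # gossip averaging after the local step

print("disagreement across nodes:", np.linalg.norm(models - models.mean(axis=0)))
print("distance to true model:   ", np.linalg.norm(models.mean(axis=0) - true_w))
```

In this sketch the mixing matrix is fixed and chosen by hand; the talk's contribution is precisely how to choose (CTL-FW) or re-choose every round (DCTL-FW) such a matrix under topological constraints and heterogeneity.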