Alum @ Alma, CSE IIT Madras
We learn from our alumni in this interaction series, often technically, sometimes semi-technically.
Dr. Dileep A. D. is an Associate Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology Dharwad (IIT Dharwad), Karnataka, where he also serves as head of the CSE department. From 2013 to 2023 he was at the School of Computing and Electrical Engineering, IIT Mandi, first as Assistant Professor and later as Associate Professor. He holds PhD and MTech degrees from the Department of Computer Science and Engineering at IIT Madras, awarded in 2013 and 2006 respectively. He completed his BE in Computer Science and Engineering at Rural Engineering College, Bhalki, affiliated to Gulbarga University, in 2000. From 2001 to 2004, he worked as a lecturer in the Department of Information Science at NMAM Institute of Technology, Nitte, Karnataka.
His research interests include applied machine learning and deep learning, speech technology, spoken language identification and diarization, computer vision, and machine learning for telecom and cloud networks. Dr. Dileep has extensive project management experience and has completed funded projects worth approximately 12 crores, supported by funding agencies such as DRDO, the Ministry of Environment, the Department of Biotechnology (DBT), the Ministry of Education (MoE), and Hitachi Pvt Ltd. He is currently handling a project titled "Speech Technologies in Indian Languages - Spoken Language Recognition and Diarization", worth 1 crore and sponsored by the Ministry of Electronics and Information Technology (MeitY). This project is part of the consortium project "Bhashini: Speech Technologies for Indian Languages", led by IIT Madras (Prof. Hema Murthy and Prof. Umesh). He also guides research: 3 PhD and 4 MS scholars have completed their thesis work under his guidance, and many more are pursuing theirs. He has published more than 75 research articles in reputed journals, conferences, and edited volumes. An accomplished teacher, he has twice received the best teaching award from IIT Mandi, and he is well known among students for his commitment to teaching and for unique teaching methodologies that help students understand complex concepts.
Spoken Language Identification Systems in Real-World Conditions
Spoken language identification (LID) is the process of automatically identifying the language spoken in a given speech sample. Despite recent advances in machine learning and deep learning, the performance of state-of-the-art LID systems is still unsatisfactory in real-world conditions. The main reasons are domain mismatch, high interclass similarity, and intraclass variation in real-world test samples. To perform satisfactorily in such conditions, an LID system needs to be more robust. Although robustness can be greatly improved by training on a large dataset covering a wide variety of background conditions, such training datasets are not available in low-resource conditions. Motivated by this, our recent work focuses on techniques that improve the robustness of LID systems to real-world conditions when resources are limited.
The talk first discusses a method to represent a speech sample using an utterance-level embedding called the u-vector, which is derived from intermediate-level LID-specific features called LID-sequential-senones (LID-seq-senones). These LID-seq-senones are obtained using a bidirectional long short-term memory (BLSTM) based network, which processes the input speech by dividing it into multiple fixed-length chunks. Because each LID-seq-senone is designed to efficiently encode the LID-specific content of its chunk of speech, this representation improves the network's ability to handle real-world challenges. The talk then discusses a bi-resolution processing approach that further enhances the robustness of the u-vector. In this approach, unlike state-of-the-art approaches, the LID network contains two utterance-level embedding extractors at the front end that analyze the input at two different temporal resolutions. This arrangement allows the u-vector to gather complementary information about the input from two different views, which reduces the network's vulnerability to real-world conditions. Finally, the talk discusses a within-sample similarity loss (WSSL) that further enhances the robustness of the u-vector in the bi-resolution approach. Specifically, by suppressing the similarities between the two branches of the bi-resolution network, WSSL explicitly encourages the two branches to encode dissimilar LID-specific content in the input and to ignore certain domain-specific content that is common to both.
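The chunking and branch-similarity ideas described above can be illustrated with a minimal Python sketch. This is not the speaker's implementation: the function names, the chunk lengths and hops, and the choice of squared cosine similarity as the similarity measure are all assumptions made for illustration, since the abstract does not specify them. The BLSTM embedding extractors themselves are not modeled here.

```python
import math


def split_into_chunks(frames, chunk_len, hop):
    """Split a frame sequence into fixed-length chunks (hypothetical helper).

    A trailing partial chunk is dropped. chunk_len and hop are illustrative
    parameters; the abstract does not give the values used in the talk.
    """
    last_start = len(frames) - chunk_len
    return [frames[i:i + chunk_len] for i in range(0, last_start + 1, hop)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def within_sample_similarity_penalty(emb_a, emb_b):
    """Assumed form of a within-sample similarity penalty.

    Returns the squared cosine similarity between the two branch embeddings
    of the same utterance, so that minimizing it pushes the branches toward
    dissimilar representations. The actual WSSL definition may differ.
    """
    return cosine_similarity(emb_a, emb_b) ** 2


# Two temporal resolutions over the same 10-frame toy input.
frames = list(range(10))
fine_chunks = split_into_chunks(frames, chunk_len=4, hop=2)    # 4 chunks
coarse_chunks = split_into_chunks(frames, chunk_len=8, hop=4)  # 1 chunk
```

In a training setup along these lines, such a penalty would typically be added, with a weighting factor, to the language classification loss; the weighting and the exact combination are likewise assumptions here.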