Structured models for information extraction on Dec 7, 2009 @ MLSS, CSE IIT Madras

Structured models for information extraction

Prof. Sunita Sarawagi (Dept. of CSE, IIT Bombay)
Dec 7, 2009 @ 3:00 pm
BSB 361, Dept. of CSE, IIT Madras

Abstract

Feature-based structured models provide a flexible and elegant framework for various information extraction (IE) tasks. These include label sequences for traditional IE, segmentation models for entity-level extractions, and skip chain models for collective labeling. I will present efficient inference algorithms for finding the highest scoring (MAP) prediction for two interesting types of structured models in IE.
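
As background for the simplest case named above, a label sequence, MAP inference can be done with the standard Viterbi recursion. The sketch below is illustrative only: the function name `viterbi` and the layout of `node_scores` and `edge_scores` are assumptions made here, and it is not one of the inference algorithms presented in the talk.

```python
def viterbi(node_scores, edge_scores):
    """Return the highest scoring label sequence for a linear chain.

    node_scores[t][y]  : score of label y at position t (assumed layout)
    edge_scores[a][b]  : score of moving from label a to label b (assumed layout)
    """
    n = len(node_scores)
    if n == 0:
        return [], 0.0
    k = len(node_scores[0])

    # best[y] = score of the best partial labeling that ends in label y
    best = list(node_scores[0])
    back = []  # back[t-1][y] = best previous label when position t takes label y

    for t in range(1, n):
        new_best, ptr = [], []
        for y in range(k):
            prev = max(range(k), key=lambda a: best[a] + edge_scores[a][y])
            new_best.append(best[prev] + edge_scores[prev][y] + node_scores[t][y])
            ptr.append(prev)
        best = new_best
        back.append(ptr)

    # Trace the back-pointers from the best final label.
    last = max(range(k), key=lambda y: best[y])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    nodes = [[1, 0], [0, 1], [1, 0]]
    sticky = [[2, -2], [-2, 2]]  # rewards staying on the same label
    print(viterbi(nodes, sticky))
```

The recursion takes time linear in the sequence length and quadratic in the number of labels. Segmentation and skip chain models break the simple chain structure assumed here, which is why they call for different inference algorithms.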

There are two popular formulations for maximum margin training over structured output spaces: margin scaling and slack scaling. Margin scaling is extremely popular because it requires the same kind of MAP inference as prediction, while slack scaling is believed to be more accurate and better-behaved. I will describe an efficient variational approximation to the slack scaling method that solves its inference bottleneck while retaining its accuracy advantage over margin scaling. Further, I argue that existing scaling approaches do not separate the true labeling comprehensively while generating violating constraints. I will propose a new max-margin trainer, PosLearn, that generates violators to ensure separation at each position of a decomposable loss function.
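
For reference, the two formulations are commonly written as follows. The notation is an assumption chosen here, not taken from the talk: $\mathbf{w}$ is the weight vector, $\mathbf{f}(x, y)$ the feature vector, $L(y_i, y)$ the loss of predicting $y$ when $y_i$ is correct, $\xi_i$ the slack for example $i$, and $C$ the regularization constant.

```latex
% Shared objective over N training examples (x_i, y_i)
\min_{\mathbf{w},\, \xi \ge 0} \;\; \tfrac{1}{2}\|\mathbf{w}\|^2 + \frac{C}{N}\sum_{i=1}^{N} \xi_i

% Margin scaling: the required margin grows with the loss
\mathbf{w}\cdot\mathbf{f}(x_i, y_i) - \mathbf{w}\cdot\mathbf{f}(x_i, y) \;\ge\; L(y_i, y) - \xi_i
\qquad \forall i,\; \forall y \ne y_i

% Slack scaling: the slack is scaled down by the loss
\mathbf{w}\cdot\mathbf{f}(x_i, y_i) - \mathbf{w}\cdot\mathbf{f}(x_i, y) \;\ge\; 1 - \frac{\xi_i}{L(y_i, y)}
\qquad \forall i,\; \forall y \ne y_i

% Most violated constraint under each formulation
\text{margin:}\quad \arg\max_{y}\; \mathbf{w}\cdot\mathbf{f}(x_i, y) + L(y_i, y)

\text{slack:}\quad \arg\max_{y}\; L(y_i, y)\,\bigl(1 + \mathbf{w}\cdot\mathbf{f}(x_i, y) - \mathbf{w}\cdot\mathbf{f}(x_i, y_i)\bigr)
```

When the loss decomposes over positions, the margin scaling search adds the loss to the score and can reuse the MAP inference procedure. The slack scaling search multiplies the loss by the score, so it does not decompose in the same way, which is the inference bottleneck referred to above.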

Bio
Sunita Sarawagi conducts research in databases, data mining, and machine learning. Her current research interests are information integration, graphical and structured models, and probabilistic databases. She is an associate professor at IIT Bombay. Prior to that, she was a research staff member at IBM Almaden Research Center. She received her PhD in databases from the University of California at Berkeley and a bachelor's degree from IIT Kharagpur. She has several publications in databases and data mining and several patents. She serves on the boards of directors of ACM SIGKDD and the VLDB Foundation. She was program chair for the ACM SIGKDD 2008 conference and has served as a program committee member for the SIGMOD, VLDB, SIGKDD, ICDE, and ICML conferences. She is on the editorial boards of the ACM TODS, ACM TKDD, and FnT for Machine Learning journals.