Paper Abstracts

Balaji, L., and Ravindran, B. (2010) "Transfer Learning Across Heterogeneous Robots with Action Sequence Mapping". In the Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010), pp. 3251-3256. IEEE Press.

ABSTRACT

Transfer learning refers to reusing the knowledge gained while solving a task, to solve a related task more efficiently. Much of the prior work on transfer learning, assumes that identical robots were involved in both the tasks. In this work we focus on transfer learning across heterogeneous robots while solving the same task. The action capabilities of the robots are different and are unknown to each other. The actions of one robot cannot be mimicked by another even if known. Such situations arise in multi-robot systems. The objective then is to speed-up the learning of one robot, i.e., reduce its initial exploration, using very minimal knowledge from a different robot. We propose a framework in which the knowledge transfer is effected through a pseudo reward function generated from the trajectories followed by a different robot while solving the same task. The framework can effectively be used even with a single trajectory. We extend the framework to enable the robot to learn an equivalence between certain sequences of its actions and certain sequences of actions of the other robot. These are then used to learn faster on subsequent tasks. We empirically validate the framework in a rooms world domain.

Kate, K. and Ravindran, B. (2009) "Epsilon Equitable Partition: A Positional Analysis Method for Large Social Networks". To appear in the Proceedings of 15th International Conference on Management of Data (COMAD 2009)

ABSTRACT

Positional analysis is considered an important tool in the analysis of social networks. It involves partitioning of the set of actors into subsets such that actors in a subset are similar in their structural relationships with other actors. Traditional methods of positional analysis such as structural equivalence, regular equivalence, and equitable partitions are either too strict or too generic. For real world large graphs, most of these methods result into almost trivial partitions. We propose a useful relaxation to the concept of equitable partition called an epsilon equitable partition. A variant of epsilon equitable partition called maximal epsilon equitable partition is also proposed and formulated as an optimization problem. A fast algorithm for computing epsilon equitable partitions is proposed. We also present the results of performing positional analysis on a number of networks and demonstrate empirically that positional analysis with our notion gives raise to non-trivial and meaningful partitions. Along with the static network analysis with respect to positions, we study the impact of positional analysis on the evolution of networks. Our results show that positional analysis indeed plays an important role in the evolution of networks.

Bahuguna, J., Ravindran, B., and Krishna, K. M. (2009) "MDP based Active Localization for Multiple Robots". In the Proceedings of the Fifth Annual IEEE Conference on Automation Science and Engineering (CASE 2009), pp. 635-640. IEEE Press.

ABSTRACT

In environments with identical features, the global localization of a robot, might result in multiple hypotheses of its location. If the situation is extrapolated to multiple robots, it results in multiple hypotheses for multiple robots. The localization is facilitated if the robots are actively guided towards locations where it can use other robots as well as learning technique for the above process of active localization of multiple robots by co-operation. An MDP framework is used for learning the task, over a semi-decentralized team of robots hereby maintaining a bounded complexity as opposed to various multi-agent learning techniques, which scale exponentially with the increase in the number of robots.

Swapna Raj, P. and Ravindran, B. (2008) "Personalized Web-page Rendering System". In the Proceedings of the Fourteenth International Conference on Management of Data (COMAD), pp. 30-39.

ABSTRACT

Personalized rendering of web pages gives the users greater control to view only what they prefer. The goal of this work is to provide a tool that will let users customize the content on the pages. Our proposed model architecture learns user preferences through interaction and eventually learns to block content that is not of interest, or is offensive, to the user. This learning from interaction is achieved through a combination of reinforcement learning and data mining techniques. In this paper we look at customizing the rendering of advertisements. We provide the user a tool that customizes itself to their preferences, and blocks irrelevant advertisements and allow only those that are of interest to the user. We also demonstrate empirically that our tool customizes itself to hypothetical hand crafted users.

Arora, R. and Ravindran, B. (2008) "Latent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization". In the Proceedings of the Eighth IEEE International Conference on Data Mining (ICDM 2008), pp. 713-718. IEEE Press.

ABSTRACT

Multi-Document Summarization deals with computing a summary for a set of related articles such that they give the user a general view about the events. One of the objectives is that the sentences should cover the different events in the documents with the information covered in as few sentences as possible. Latent Dirichlet Allocation can break down these documents into different topics or events. However to reduce the common information content the sentences of the summary need to be orthogonal to each other since orthogonal vectors have the lowest possible similarity and correlation between them. Singular Value Decomposition is used to get the orthogonal representations of vectors and representing sentences as vectors, we can get the sentences that are orthogonal to each other in the LDA mixture model weighted term domain. Thus using LDA we find the different topics in the documents and using SVD we find the sentences that best represent these topics. Finally we present the evaluation of the algorithms on the DUC 2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. Compared to DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.

Arora, R. and Ravindran, B. (2008) "Latent Dirichlet Allocation Based Multi-Document Summarization". In the Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data (AND 2008), pp. 91-97. ACM Press.

ABSTRACT

Extraction based Multi-Document Summarization Algorithms consist of choosing sentences from the documents using some weighting mechanism and combining them into a summary. In this article we use Latent Dirichlet Allocation to capture the events being covered by the documents and form the summary with sentences representing these different events. Our approach is distinguished from existing approaches in that we use mixture models to capture the topics and pick up the sentences without paying attention to the details of grammar and structure of the documents. Finally we present the evaluation of the algorithms on the DUC 2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. Compared to DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.

Cheboli, D. and Ravindran, B. (2008) "Detection of keratoconus by semi-supervised learning". In the ICML/UAI workshop on Machine Learning in health care applications.

ABSTRACT

Keratoconus, is a non-inflammatory disorder of the eye in which structural changes within the cornea cause it to thin and change to a more conical shape which leads to substantial distortion of vision. Current methods for the detection of the disease mainly are Supervised learning approaches. Our goal is to label suspect patients who have not been clinically diagnosed(unlabeled data). Hence, we propose a new model, which integrates Manifold learning techniques with Semi-Supervised learning, to maximize the utility of the unlabeled data.

Narayanamurthy, S. M. and Ravindran, B. (2008) "On the Hardness of Finding Symmetries in Markov Decision Processes". In the Proceedings of the Twenty Fifth International Conference on Machine Learning (ICML 2008), pp. 688-696. AAAI Press.

ABSTRACT

In this work we address the question of finding symmetries of a given MDP. We show that the problem is Isomorphism Complete, that is, the problem is polynomially equivalent to verifying whether two graphs are isomorphic. Apart from the theoretical importance of this result it has an important practical application. The reduction presented can be used together with any off-the-shelf Graph Isomorphism solver, which performs well in the average case, to find symmetries of an MDP. In fact, we present results of using NAutY (the best Graph Isomorphism solver currently available), to find symmetries of MDPs.

Ray, A., Kumar, V., Ravindran, B., Dr. Lingam Gopal, and Dr. Verma, A. (2008) "Machine Learning to predict the incidence of Retinopathy of Prematurity". In the Proceedings of the Twenty First Florida AI Research Society Conference (FLAIRS 2008), pp. 300-305. AAAI Press.

ABSTRACT

Retinopathy of Prematurity (ROP) is a disorder afflicting prematurely born infants. ROP can be positively diagnosed a few weeks after birth. The goal of this study is to build an automatic tool for prediction of the incidence of ROP from standard clinical factors recorded at birth for premature babies. The data presents various challenges including mixing of categorical and numeric attributes and noisy data.In this article we present an ensemble classifier-hierarchical committee of random trees—that uses risk factors recorded at birth in order to predict the risk of developing ROP. We empirically demonstrate that our classifier outperforms other state of the art classification approaches.

Swaminathan, P. and Ravindran, B. (2008) "Co-SOFT-Clustering: An Information Theoretic approach to obtain overlapping clusters from co-occurrence data". In the Proceedings of the Twenty First Florida AI Research Society Conference (FLAIRS 2008), pp. 320-321. AAAI Press.

ABSTRACT

Co-clustering exploits co-occurrence information, from contingency tables to cluster both rows and columns simultaneously. It has been established that co-clustering produces a better clustering structure as compared to conventional methods of clustering. So far, co-clustering has only been used as a technique for producing hard clusters, which might be inadequate for applications such as document clustering. In this paper, we present an algorithm using the information theoretic approach [1] to generate overlapping (soft) clusters. The algorithm maintains probability membership for every instance to each of the possible clusters and iteratively tunes these membership values. The theoretical formulation of the criterion function is presented first, followed by the actual algorithm. We evaluate the algorithm over document/word co-occurrence information and present experimental results.

Jayarajan, D., Deodhare, D., and Ravindran, B. (2008) "Lexical Chains as Document Features". In the Proceedings of the Third International Joint Confernce on Natural Language Processing, IJCNLP 2008, pp. 111-117

ABSTRACT

Document clustering and classification is usually done by representing the documents using a bag of words scheme. This scheme ignores many of the linguistic and semantic features contained in text documents. We propose here an alternative representation for documents using Lexical Chains. We compare the performance of the new representation against the old one on a clustering task. We show that Lexical Chain based features give better results than the Bag of Words based features, while achieving almost 30% reduction in the dimensionality of the feature vectors resulting in faster execution of the algorithms.

Saravanan, M., Ravindran, B., and Raman, S. (2008) "Automatic Identification of Rhetorical Roles using Conditional Random Fields for Legal Document Summarization ". In the Proceedings of the Third International Joint Confernce on Natural Language Processing, IJCNLP 2008, pp. 481-488.

ABSTRACT

In this paper, we propose a machine learning approach to rhetorical role identification from legal documents. In our approach, we annotate roles in sample documents with the help of legal experts and take them as training data. Conditional random field model has been trained with the data to perform rhetorical role identification with reinforcement of rich feature sets. The understanding of structure of a legal document and the application of mathematical model can brings out an effective summary in the final stage. Other important new findings in this work include that the training of a model for one sub-domain can be extended to another sub-domains with very limited augmentation of feature sets. Moreover, we can significantly improve extraction-based summarization results by modifying the ranking of sentences with the importance of specific roles.

Saravanan, M., Ravindran, B., and Raman, S. (2007) "Legal Ontology for Query Enhancement". In the Proceedings of the Twentieth Annual Conference on Legal Knowledge and Information Systems(JURIX 2007), pp.171-172. IOS Press.

ABSTRACT

In this paper, we have proposed a novel comprehensive structural framework for the construction of ontology from a given legal corpus for the purpose of query enhancement to pick relevant documents for generating a summary. We evaluated our system with queries given by legal experts and non-experts and found that ontology-based query enhancement system results are significantly better than standard Microsoft Windows search queries.

Sarma, B.H.S. and Ravindran, B. (2007) "Intelligent Tutoring System using Reinforcement Learning to Teach Autistic Students". In the Proceedings of the Conference on Home/Community Oriented ICT for the Next Billion (HOIT 2007), pp. 65-78. Springer.

ABSTRACT

Many Intelligent Tutoring Systems have been developed using different Artificial Intelligence techniques. In this paper we propose to use Reinforcement Learning for building an intelligent tutoring system to teach autistic students, who can't communicate well with others. In reinforcement learning, a policy is updated for taking appropriate action to teach the student. The main advantage of using reinforcement learning is that, it eliminates the need for encoding pedagogical rules. Various issues in using reinforcement learning for intelligent tutoring systems are discussed in this paper.

Sriram, R. and Ravindran, B. (2007) "Homogeneous Hierarchical Composition of Areas in Multi-Robot Area Coverage". In the Proceedings of the Seventh Symposium on Abstraction, Approximation, and Reformulation (SARA 2007), pp. 300-313, LNAI 4612. Springer.

ABSTRACT

Multi-robot area coverage poses several research challenges. The challenge of coordinating multiple robots’ actions coupled with the challenge of minimizing the overlap in coverage across robots becomes even more complex and critical when large teams and large areas are involved. In fact, the effi- ciency critically hinges on the coordination algorithms used and the robot capabilities. Multi-robot coverage of such large areas can be tackled by the divide-and-conquer policy; decomposing the coverage area into several small coverage grids. It is fairly simple to devise algorithms to minimize the overlap in small grids by making simple assumptions. If the overlap ratio of these small grids can be controlled, one may be able to integrate them appropriately to cover the large grid. In this paper, we introduce homogeneous hierarchical composition grids to decompose a coverage area into several small coverage primitives with appropriately sized robot teams. These coverage grids are viewed as cells at a Meta level and composed hierarchically with such teams functioning as a single unit. We state and prove an associated theorem that provides very good scaling properties to large grids. We have performed simulated studies to validate the claims and study performance.

Awasthi, P., Gagrani, A., and Ravindran, B. (2007) "Image Modeling using Tree Structured Conditional Random Fields". In the Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI 2007), pp. 2060-2065. AAAI Press.

ABSTRACT

In this paper we present a discriminative framework based on conditional random fields for stochastic modeling of images in a hierarchical fashion. The main advantage of the proposed framework is its ability to incorporate a rich set of interactions among the image sites. We achieve this by inducing a hierarchy of hidden variables over the given label field. The proposed tree like structure of our model eliminates the need for a huge parameter space and at the same time permits the use of exact and efficient inference procedures based on belief propagation. We demonstrate the generality of our approach by applying it to two important computer vision tasks, namely image labeling and object detection. The model parameters are trained using the contrastive divergence algorithm. We report the performance on real world images and compare it with the existing approaches.

Narayanamurthy, S. M. and Ravindran, B. (2007) "Efficiently Exploiting Symmetries in Real Time Dynamic Programming". In the Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI 2007), pp. 2556-2561. AAAI Press.

ABSTRACT

Current approaches to solving Markov Decision Processes (MDPs) are sensitive to the size of the MDP. When applied to real world problems though, MDPs exhibit considerable implicit redundancy especially in the form of symmetries. Existing model minimization methods do not exploit the redundancy due to symmetries well. In this work, given such symmetries, we present a time-efficient algorithm to construct a functionally equivalent reduced model of the MDP. Further we present a Real Time Dynamic Programming (RTDP) algorithm which obviates an explicit construction of the reduced model by integrating the given symmetries into it. The RTDP algorithm solves the reduced model, while working with parameters of the original model and the given symmetries. As RTDP uses its experience to determine which states to backup, it focuses on parts of the reduced state set that are most relevant. This results in significantly faster learning and a reduced overall execution time. The algorithms proposed are particularly effective in the case of structured automorphisms even when the reduced model does not have fewer features. We demonstrate the results empirically on several domains.

Ravindran, B., Barto, A. G., and Mathew, V. (2007) "Deictic Option Schemas". In the Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI 2007), pp. 1023-1028. AAAI Press.

ABSTRACT

Deictic representation is a representational paradigm, based on selective attention and pointers, that allows an agent to learn and reason about rich complex environments. In this article we present a hierarchical reinforcement learning framework that employs aspects of deictic representation. We also present a Bayesian algorithm for learning the correct representation for a given sub-problem and empirically validate it on a complex game environment.

Jayarajan, D., Deodhare, D., Ravindran, B., and Sarkar, S. (2007) "Document Clustering using Lexical Chains". In the Proceedings of the Workshop on Text-Mining & Link-Analysis (TextLink 2007).

ABSTRACT

Document Clustering is an unsupervised categorisation of documents based on the contents of the documents. Numerous algorithms have been proposed for this and invariably, all of these algorithms use some veriation of the vector space model to represent the documents on which the clustering algorithm is run. Lexical chains are groups of words which wxhibit a cohesion among them.It is based on the idea that semantically related wods co-occur more than "just by chance". By identifying and using lexical chains, it would be possible to represent the documents in terms of its concept. We propose here the use of lexical chains as an alternative representation for the documents and further leverage them to cluster the documents.

Sriram, R. and Ravindran, B. (2007) "Profiling Pseudonet Architecture for Coordinating Mobile Robots". In the Proceedings of the Second IEEE International Conference on COMmunication System softWAre and MiddlewaRE (COMSWARE 2007). IEEE Press.

ABSTRACT

Area coverage and Navigation are two fundamental requirements for robot applications. When multiple robots are fielded in a scene, coordination through communication becomes a natural prerequisite. This paper focuses on the area coverage problem and proposes periodic exchange of state information related to location and coverage as a solution. The solution is based on Pseudonet communication architecture that enables exchange of messages between the robots by setting up a Bluetooth piconet or scatternet and maintaining the same throughout the process of covering the area.

Saravanan, M., Ravindran, B., and Raman, S. (2006) " Improving Legal Document Summarization using Graphical Models " In the Proceedings of the Nineteenth Annual Conference on Legal Knowledge and Information Systems(JURIX 2006), pp. 51-60, IOS Press.

ABSTRACT

In this paper, we propose a novel idea for applying probabilistic graphical models for automatic text summarization task related to a legal domain. Identification of rhetorical roles present in the sentences of a legal document is the important text mining process involved in this task. A Conditional Random Field (CRF) is applied to segment a given legal document into seven labeled components and each label represents the appropriate rhetorical roles. Feature sets with varying characteristics are employed in order to provide significant improvements in CRFs performance. Our system is then enriched by the application of a term distribution model with structured domain knowledge to extract key sentences related to rhetorical categories. The final structured summary has been observed to be closest to 80% accuracy level to the ideal summary generated by experts in the area.

Saravanan, M., Raman, S., and Ravindran, B. (2006) "A Probabilistic Approach to Multi-Document Summarization for Generating a Tiled Summary". In International Journal of Computational Intelligence and Applications, Vol. 6, No. 2, June 2006, pp. 231-244. Imperial College Press. (This is an expanded version of the conference paper below.)

ABSTRACT

Data availability is not a major issue at present times in view of the widespread use of Internet; however, information and knowledge availability are the issues. Due to data overload and time-critical nature of information need, automatic summarization of documents plays a significant role in information retrieval and text data mining. This paper discusses the design of a multi-document summarizer that uses Katz’s K-mixture model for term distribution. The model helps in ranking the sentences by a modified term weight assignment. Highly ranked sentences are selected for the final summary. The sentences that are repetitive in nature are eliminated, and a tiled summary is produced. Our method avoids redundancy and produces a readable (even browsable) summary, which we refer to as an event-specific tiled summary. The system has been evaluated against the frequently occurring sentences in the summaries generated by a set of human subjects. Our system outperforms other auto-summarizers at different extraction levels of summarization with respect to the ideal summary, and is close to the ideal summary at 40% extraction level.

Siva Soumya, E., Manimegalai, R., Muralidharan V., Ravindran, B., Kamaoti, V., and Bhatia, D. (2005) "Placement and Routing for 3D-FPGAs using Reinforcement Learning and Support Vector Machines". In the Proceedings of the Eighteenth International Conference on VLSI Design.

ABSTRACT

The primary advantage of using 3D-FPGA over 2D-FPGA is that the vertical stacking of active layers reduce the Manhattan distance between the components in 3D-FPGA than when placed on 2D-FPGA. This results in a considerable reduction in total interconnect length. Reduced wire length eventually leads to reduction in delay and hence improved performance and speed. Design of an efficient placement and routing algorithm for 3D-FPGA that fully exploits the above mentioned advantage is a problem of deep research and commercial interest. In this paper, an efficient placement and routing algorithm is proposed for 3D-FPGAs which yields better results in terms of total interconnect length and channel-width. The proposed algorithm employs two important techniques, namely, Reinforcement Learning (RL) and Support Vector Machines (SVMs), to perform the placement. The proposed algorithm is implemented and tested on standard benchmark circuits and the results obtained are encouraging. This is one of the very few instances where reinforcement learning is used for solving a problem in the area of VLSI.

Ravindran, B. and Barto, A. G. (2004) "Approximate Homomorphisms: A Framework for Non-exact Minimization in Markov Decision Processes". In the Proceedings of the Fifth International Conference on Knowledge Based Computer Systems (KBCS 04).

ABSTRACT

To operate effectively in complex environments learning agents require the ability to selectively ignore irrelevant details and form useful abstractions. In earlier work we explored in detail what constitutes a useful abstraction in a stochastic sequential decision problem modeled as a Markov Decision Process (MDP). We based our approach on the notion of an MDP homomorphism. In this article we look at relaxing the strict conditions imposed earlier and introduce approximate homomorphisms that allow us to construct useful abstract models even when the homomorphism conditions are not met exactly. We also present a result on bounding the loss resulting from this approximation.

Ravindran, B. (2004) "An Algebraic Approach to Abstraction in Reinforcement Learning". Doctoral Dissertation, Department of Computer Science, University of Massachusetts, Amherst MA.

ABSTRACT

To operate effectively in complex environments learning agents require the ability to form useful abstractions, that is, the ability to selectively ignore irrelevant details. Stated in general terms this is a very difficult problem. Much of the work in this field is specialized to specific modeling paradigms or classes of problems. In this thesis we introduce an abstraction framework for Markov decision processes (MDPs) based on homomorphisms relating MDPs. We build on classical finite-state automata literature and develop a minimization framework for MDPs that can exploit structure and symmetries to derive smaller equivalent models of the problem. Since employing homomorphisms for minimization requires that the resulting abstractions be exact, we introduce approximate and partial homomorphisms and develop bounds for the loss that results from employing relaxed abstraction criteria.

Our MDP minimization results can be readily employed by reinforcement learning (RL) methods for forming abstractions. We extend our abstraction approach to hierarchical RL, specifically using the options framework. We introduce relativized options, a generalization of Markov sub-goal options, that allow us to define options without an absolute frame of reference. We introduce an extension to the options framework, based on relativized options, that allows us to learn simultaneously at multiple levels of the hierarchy and also employ hierarchy-specific abstractions. We provide certain theoretical guarantees regarding the performance of hierarchical systems that employ approximate abstraction. We empirically demonstrate the utility of relativized options in several test-beds.

Relativized options can also be interpreted as behavioral schemas. We demonstrate that such schemas can be profitably employed in a hierarchical RL setting. We also develop algorithms that learn the appropriate parameter binding to a given schema. We empirically demonstrate the validity and utility of these algorithms. Relativized options allow us to model certain aspects of deictic or indexical representations. We develop a modification of our parameter binding algorithm suited to hierarchical RL architectures that employ deictic representations.

Ravindran, B. and Barto, A. G. (2003) " Relativized Options: Choosing the Right Transformation". In the Proceedings of the Twentieth International Conference on Machine Learning(ICML 2003), pp. 608-615. AAAI Press.

ABSTRACT

Relativized options combine model minimization methods and a hierarchical reinforcement learning framework to derive compact reduced representations of a related family of tasks. Relativized options are defined without an absolute frame of reference, and an option’s policy is transformed suitably based on the circumstances under which the option is invoked. In earlier work we addressed the issue of learning the option policy online. In this article we develop an algorithm for choosing, from among a set of candidate transformations, the right transformation for each member of the family of tasks.

Ravindran, B. and Barto, A. G. (2003) " SMDP Homomorphisms: An Algebraic Approach to Abstraction in Semi Markov Decision Processes". In the Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI 03), pp. 1011-1016. AAAI Press.

ABSTRACT

To operate effectively in complex environments learning agents require the ability to selectively ignore irrelevant details and form useful abstractions. In this article we consider the question of what constitutes a useful abstraction in a stochastic sequential decision problem modeled as a semi-Markov Decision Process (SMDPs). We introduce the notion of SMDP homomorphism and argue that it provides a useful tool for a rigorous study of abstraction for SMDPs. We present an SMDP minimization framework and an abstraction framework for factored MDPs based on SMDP homomorphisms. We also model different classes of abstractions that arise in hierarchical systems. Although we use the options framework for purposes of illustration, the ideas are more generally applicable. We also show that the conditions for abstraction we employ are a generalization of earlier work by Dietterich as applied to the options framework.

Ravindran, B. and Barto, A. G. (2003) " An Algebraic Approach to Abstraction in Reinforcement Learning". In the Proceedings of the Twelfth Yale Workshop on Adaptive and Learning Systems, pp. 109-114. Yale University.

ABSTRACT

To operate effectively in complex environments learning agents have to selectively ignore irrelevant details by forming useful abstractions. In this article we outline a formulation of abstraction for reinforcement learning approaches to stochastic sequential decision problems modeled as semiMarkov Decision Processes (SMDPs). Building on existing algebraic approaches, we propose the concept of SMDP homomorphism and argue that it provides a useful tool for a rigorous study of abstraction for SMDPs. We apply this framework to different classes of abstractions that arise in hierarchical systems and discuss relativized options, a framework for compactly specifying a related family of temporally-extended actions.

Ravindran, B. and Barto, A. G. (2002) " Model Minimization in Hierarchical Reinforcement Learning". In the Proceedings of the Fifth Symposium on Abstraction, Reformulation and Approximation (SARA 2002), pp.196-211, LNCS, Springer Verlag. (Slides from presentation: PPT)

ABSTRACT

When applied to real world problems Markov Decision Processes(MDPs) often exhibit considerable implicit redundancy, especially when there are symmetries in the problem. In this article we present an MDP minimization framework based on homomorphims. The framework exploits redundacy and symmetry to derive smaller equivalent models of the problem. We then apply our minimization ideas to the options framework to derive relativized options- options defined without an absolute frame of reference. We demonstrate their utility empirically even in cases where the minimization criteria are not met exactly.

Ravindran, B. and Barto, A. G. (2001) " Symmetries and Model Minimization of Markov Decision Processes". Computer Science Technical Report 01-43, University of Massachusetts, Amherst, MA.)

ABSTRACT

Current solution and modelling approaches to Markov Decision Processes (MDPs) scale poorly with the size of the MDP. Model minimization methods address this issue by exploiting redundancy in problem specification to reduce the size of the MDP model. Symmetries in a problem specification can give rise to special forms of redundancy that are not exploited by existing minimization methods. In this work we extend Dean and Givan's model minimization framework to include symmetries. We base our framework on concepts derived from finite state automata and group theory.

Sutton, R. S., Singh, S., Precup, D. and Ravindran, B. (1999) " Improved Switching among Temporally Abstract Actions". In Advances in Neural Information Processing Systems 11 (Proceedings of NIPS'98), pp.1066-1072. MIT Press.

ABSTRACT

In robotics and other control applications it is commonplace to have a preexisting set of controllers for solving subtasks, perhaps hand-crafted or previously learned or planned, and still face a difficult problem of how to choose and switch among the controllers to solve an overall task as well as possible. In this paper we present a framework based on Markov decision processes and semi-Markov decision processes for phrasing this problem, a basic theorem regarding the improvement in performance that can be obtained by switching flexibly between given controllers, and example applications of the theorem. In particular, we show how an agent can plan with these high-level controllers and then use the results of such planning to find an even better plan, by modifying the existing controllers, with negligible additional cost and no re-planning. In one of our examples, the complexity of the problem is reduced from 24 billion state-action pairs to less than a million state-controller pairs.

McGovern, Amy , Precup, Doina, Ravindran, B., Singh, Satinder and Sutton, Richard S. (1998) " Hierarchical Optimal Control of MDPs", Proceedings of the Tenth Yale Workshop on Adaptive and Learning Systems, pp.186-191

ABSTRACT

Fundamental to reinforcement learning, as well as to the theory of systems and control, is the problem of representing knowledge about the environment and about possible courses of action hierarchically, at a multiplicity of interrelated temporal scales. For example, a human traveler must decide which cities to go to, whether to fly, drive, or walk, and the individual muscle contractions involved in each step. In this paper we survey a new approach to reinforcement learning in which each of these decisions is treated uniformly. Each low-level action and high-level course of action is represented as an option, a (sub)controller and a termination condition. The theory of options is based on the theories of Markov and semi-Markov decision processes, but extends these in significant ways. Options can be used in place of actions in all the planning and learning methods conventionally used in reinforcement learning. Options and models of options can be learned for a wide variety of different subtasks, and then rapidly combined to solve new tasks. Options enable planning and learning simultaneously at a wide variety of times scales, and toward a wide variety of subtasks, substantially increasing the efficiency and abilities of reinforcement learning systems.

Ravindran, B. (1996) "Solution of Delayed Reinforcement Learning Problems having Continuous Action Spaces", Master's Thesis, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India

ABSTRACT

This work concerns the solutions of delayed Reinforcement Learning problem having continous action spaces. The problems associated with continous action spaces are discussed and various existing algorithms for solving the problem are presented. A extension of Q-learning for solving delayed RL problems having continous action spaces is proposed which overcomes drawbacks associated with existing methods. Simulation results are presented to demonstrate the working of the proposed algorithm.

Keerthi, S. S. and Ravindran, B. (1996) " C3: Reinforcement Learning". In Handbook Of Neural Computation, E. Fiesler and R. Beale, Editors, Oxford University Press, U. K.

ABSTRACT

This chapter gives a compact, self-contained tutorial survey of reinforcement learning, a tool that is increasingly finding application in the development of intelligent dynamic systems. Research on reinforcement learning during the past decade has led to the development of a variety of useful algorithms. This chapter surveys the literature and presents and presents the algorithm in a cohesive framework.

Keerthi, S. S. and Ravindran, B. (1994) " A Tutorial Survey Of Reinforcement Learning". In Sadhana (Proceedings of the Indian Academy of Sciences), Vol. 19, Dec. 1994, pp. 851-889.

ABSTRACT