Intrinsically Motivated
Cumulative Learning
Versatile Robots

Introductory lectures

Synopses for the spring school lectures

Intrinsic Motivation and Robots: Some Examples

Andrew Barto, University of Massachusetts Amherst

I describe some examples of how ideas about intrinsic motivation have influenced robotics research in the Autonomous Learning Laboratory and the Laboratory for Perceptual Robotics at UMass Amherst.  The control basis approach developed in the Laboratory for Perceptual Robotics provides a framework allowing learning to take place at an abstract level, while lower-level controllers enforce desired constraints.  A form of intrinsic motivation facilitates the formation of new high-level control programs.  This approach is illustrated using a bimanual torso. Another series of experiments with a mobile manipulation robot demonstrates how the motive of being able to trigger a previously learned skill can lead to efficient learning of relatively complex tasks. While these illustrations do not fully exercise the concept of intrinsic motivation, they provide suggestive examples of how intrinsic motivation can facilitate the acquisition of control skills.

Presentation 

 

Modeling the development: the key role of emotions and resonance mechanisms for bootstrapping complex sensori-motor learning

Philippe Gaussier, Cergy-Pontoise University

What are the minimal properties sufficient to bootstrap the development of increasingly complex cognitive functions? To address this problem, we first studied how a neurobiologically plausible neural network could control an autonomous robot. Sensory-motor coordinations were learned according to a reinforcement signal or even by latent learning, and this learning provided the basis for building a cognitive map used to plan actions according to basic internal drives [1,4]. Yet it quickly became apparent that performance was directly tied to the reinforcement and drive functions, which reduced the interest of the results (the learning was not really autonomous). To learn complex tasks, imitation appeared to be an appealing solution. To avoid direct teaching or the introduction of an ad hoc imitation module, we first showed that low-level imitation does not need an a priori mechanism devoted to social interaction but can emerge from the coupling of a homeostatic system with perceptual ambiguity [2,3,6]. Yet, again, the performance of this system depended heavily on the kind of interaction the experimenter had with the robot: deciding when to start and stop an experiment (and the learning) can be more complex than learning the task itself, because the experimenter implicitly defines a context and allows the robot to learn only the relevant features. In the same vein, our early work showed that imitating a human produced better results than imitating a robot, since the human adapts his or her behavior to the robot, signaling through variations in speed the important parts of the behavior to be learned [2,7]. This echoes research in psychology showing the important role of imitation as a means of communication in the first years of life and the importance of turn-taking and role-switching in interaction and in the learning process [5].
The problem then becomes how to introduce second-order controllers that modulate behaviors and their learning according to internal measures such as the novelty of the sensory signal or the efficiency of the behavior. These problems lead us to fundamental questions about the development of basic emotions [9] such as surprise, frustration, anger, and happiness.

Studying how babies learn to recognize the facial expressions of their parents, without any a priori meaning attached to those expressions, has again highlighted that simple sensory-motor learning can be sufficient if the parents (in the role of teachers) imitate (or resonate with) the facial expressions of their baby [8]. Yet, in contrast to the previous case, imitation is used here not for learning but for teaching. Moreover, the system does not need any a priori face detection mechanism. Counterintuitively, we were able to show that the ability to detect a face can be learned autonomously from the expression recognition system, whereas learning first to detect and localize a face would require explicit supervision.

From this starting point, the robot should easily and autonomously develop a social referencing capability [10]. This would allow it, for instance, to prefer grasping a tool that the teacher regards as positive, and would then allow the latent learning of new affordances that will later be essential for learning complex manipulations involving the tool. Without such an emerging social referencing skill, it would be necessary to use an explicitly supervised shaping technique, with rewards that change according to the training stage, to drive the learning in the correct direction. In conclusion, our previous work allows us to formulate new hypotheses on the role of mechanisms such as synchrony and rhythm [3] that we did not initially regard as "central" in our models. It also leads us to study more closely the role of emotions in the development of cognitive functions. We believe that focusing on mechanisms with two or more functions is a very promising way to understand developmental processes and to avoid the drawbacks of a purely functional approach.

References:
[1] P. Gaussier, A. Revel, C. Joulain, S. Zrehen, Living in a partially structured environment: How to bypass the limitations of classical conditioning, Robotics and Autonomous Systems, vol 20, p 225-250, 1997
[2] P. Gaussier, S. Moga, J.P. Banquet, M. Quoy, From Perception-Action loops to imitation processes, Applied Artificial Intelligence, vol 1, number 7, p 701-727, 1998
[3] P. Andry, P. Gaussier, S. Moga, J.P. Banquet, J. Nadel, The dynamics of imitation processes: from temporal sequence learning to implicit reward communication, IEEE Trans. on Man, Systems and Cybernetics, Part A: Systems and Humans, vol 31, number 5, p 431-444, 2001
[4] P. Gaussier, A. Revel, J.P. Banquet, V. Babeau, From view cells and place cells to cognitive map learning: processing stages of the hippocampal system, Biological Cybernetics, vol 86, p 15-28, 2002
[5] J. Nadel, A. Revel, P. Andry, P. Gaussier, Toward communication: first imitations in infants, children with autism and robots, Interdisciplinary Journal of Interaction Studies, vol 5, number 1, p 45-74, 2004
[6] P. Andry, P. Gaussier, J. Nadel, B. Hirsbrunner, Learning invariant sensori-motor behaviors: A developmental approach of imitation mechanisms, Adaptive Behavior, 12(2), 2004
[7] C. Giovannangeli, P. Gaussier, Interactive teaching for vision-based mobile robot: a sensory-motor approach, IEEE Trans. on Man, Systems and Cybernetics, Part A: Systems and Humans, 2010
[8] S. Boucenna, P. Gaussier, P. Andry, L. Hafemeister, Imitation as a Communication Tool for Online Facial Expression Learning and Recognition, IROS 2010
[9] C. Hasson, P. Gaussier, Frustration as a Generical Regulatory Mechanism for Motivated Navigation, in Proceedings of IROS 2010, Taïwan, 2010
[10] S. Boucenna, P. Gaussier, L. Hafemeister, K. Bard, Autonomous Development of Social Referencing Skills, SAB 2010, p 628-638

 

Did I do that, or did you? How biological systems find out

Peter Redgrave, The University of Sheffield

An influential concept in contemporary computational neuroscience is the reward prediction error hypothesis of phasic dopaminergic function. It maintains that midbrain dopaminergic neurones signal the occurrence of unpredicted reward, and that this signal is used in appetitive learning to reinforce the existing actions that most often lead to reward. However, the limited afferent sensory processing available to these neurones and the precise timing of dopaminergic signals suggest that they may instead play a central role in identifying those external events for which the agent is responsible (agency) and, through trial and error, in discovering exactly which component of behavioural output is causal (i.e. the development of novel actions).
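
To make the reward prediction error idea concrete, here is a minimal sketch. It assumes a simple delta-rule (Rescorla-Wagner style) learner, which is one common textbook formalisation and is not taken from the lecture itself. The function name, the learning rate `alpha`, and the single cue followed by a fixed reward are all illustrative assumptions. The error is large while the reward is unpredicted and shrinks as the prediction improves.

```python
def simulate_prediction_errors(n_trials=50, alpha=0.2, reward=1.0):
    """Return the reward prediction error observed on each trial.

    Hypothetical setup: one cue is always followed by the same reward,
    and `value` is the learner's current prediction of that reward.
    """
    value = 0.0  # initial prediction: no reward expected
    errors = []
    for _ in range(n_trials):
        delta = reward - value   # prediction error: outcome minus prediction
        value += alpha * delta   # move the prediction toward the outcome
        errors.append(delta)
    return errors


errors = simulate_prediction_errors()
print(f"error on first trial: {errors[0]:.3f}")
print(f"error on last trial:  {errors[-1]:.6f}")
```

Under this hypothesis the signal vanishes once the reward is fully predicted. The alternative account summarised above concerns what the phasic signal is used for (agency and action discovery), which this sketch does not model.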

Presentation

 

Spatial competition supports the development of visual selective attention in human infants:  A neurocomputational account

Matthew Schlesinger, Southern Illinois University

The fovea provides an effective solution to the visual-information bottleneck:  by limiting the amount of information that reaches the visual cortex, a large, complex scene can be divided into smaller, manageable chunks.  But this solution also creates a challenge for the human infant.  In particular, how do infants develop the capacity for visual selective attention, that is, the ability to deploy attention in a way that optimizes information pick-up?  I examine this question by describing a neurocomputational model of visual processing that exploits several properties of the mammalian visual system.  A key component of the model simulates the function of horizontal connections (between columns of neurons) in visual cortex, which enables multiple regions in the visual scene to simultaneously compete for attention.  The model not only succeeds in simulating infants' gaze patterns during an object-perception task, but also suggests how spatial competition supports the development of visual selective attention. 

Presentation

 

Early auditory processing and spikes: lessons for Neuromorphic auditory systems?

Leslie Smith, University of Stirling

First I will review early auditory processing in animals, focusing on the auditory brainstem, from both an anatomical and a functional viewpoint. I will try to relate these to the auditory "what" and "where" tasks. I will then discuss the issues that arise in both real and synthetic early auditory processing, and how they might be resolved. Finally, I will suggest ways in which these lessons might be applied in a neuromorphic auditory system.

Presentation


Infant look, infant learn!

Jochen Triesch, Frankfurt Institute for Advanced Studies

Much of what we know about infants' cognitive abilities we have to infer from their looking behavior, because other aspects of their motor repertoire are slow to develop. Several experimental paradigms, such as the classic habituation paradigm, have been developed to study infant cognition based on where infants look and how long they choose to look there. But interpreting the results is difficult without a proper understanding of the intrinsic or extrinsic motivations that drive infants' looking. In the first part of the lecture, we will explain how a well-known phenomenon, the so-called familiarity-to-novelty shift in infant habituation, can be explained by a simple model based on an infant's intrinsic motivation to maximize its learning progress. In the second part, we will discuss a recently developed paradigm that uses eye-tracking technology to give infants direct control over their physical environment. We show that infants can quickly discover this novel way of acting on their environment with their eyes and that they rapidly learn to anticipate the consequences of these actions.

 
