Series of International Seminars on 'Intrinsic motivations and cumulative learning' at ISTC-CNR

When: Oct 3 to Dec 19, 2012
Where: Piaget Room, ISTC-CNR, Rome, Italy

Introduction


This is a series of 13 seminars organised at ISTC-CNR in Autumn 2012 by the LOCEN Research Group together with other research groups of the EU project IM-CLeVeR (www.im-clever.eu; coordinated by ISTC-CNR). The seminars focus on intrinsic motivations and cumulative learning in animals, humans and robots.

All seminars will be held at ISTC-CNR, Via San Martino della Battaglia 44, Piaget Room. Most of the seminars will run from 14:30 to 15:30, one per week (usually, but not always, on Wednesday or Thursday). All seminars will be held in English to allow non-Italian speakers to participate.

Below you can find the calendar of the seminars with the names of the speakers, the abstracts of the presentations, and a key reference for each presentation.

 

*******************************************************************************************

October 3 (Wednesday, 14:30-15:30)

Andrew Barto

Department of Computer Science, University of Massachusetts - Amherst

Title: Intrinsic motivation from an evolutionary perspective


Abstract: There is great interest in building intrinsic motivation into artificial systems using the reinforcement learning framework. Yet what intrinsic reward is computationally, and how it differs from extrinsic reward, remains a controversial subject. In this talk, I describe some computational experiments that elucidate aspects of the relationship between an agent's ultimate goals (e.g., reproductive success for an animal) and the primary rewards that influence its motivation and learning. Adopting an evolutionary perspective, we search spaces of primary reward functions for functions that lead, through their influence on agents' motivation and learning, to analogs of high evolutionary success. The results shed light on the emergence of intrinsic and extrinsic reward signals. This work was carried out in collaboration with Satinder Singh, Rick Lewis, Jonathan Sorg, and Scott Niekum.
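
A minimal sketch of this kind of "optimal reward" search may help frame the talk (a toy illustration written for this page, not the authors' code; the 1-D task, the two reward weights and all names are assumptions):

    import random

    ACTIONS = [-1, 1]   # move left / right on a 1-D track
    GOAL = 5            # reaching this state is what "evolution" scores

    def run_agent(reward_weights, episodes=200, steps=30, alpha=0.1, eps=0.1):
        # Q-learning agent whose primary reward is parameterized by two weights
        w_goal, w_step = reward_weights
        Q, fitness = {}, 0
        for _ in range(episodes):
            s = 0
            for _ in range(steps):
                qs = [Q.get((s, a), 0.0) for a in ACTIONS]
                a = (random.choice(ACTIONS) if random.random() < eps
                     else ACTIONS[qs.index(max(qs))])
                s2 = max(-GOAL, min(GOAL, s + a))
                r = w_goal * (s2 == GOAL) + w_step   # candidate primary reward
                target = r + 0.9 * max(Q.get((s2, b), 0.0) for b in ACTIONS)
                Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
                s = s2
                if s == GOAL:
                    fitness += 1   # evolutionary success counts goal visits only
                    break
        return fitness

    # the "evolutionary" outer loop: search the space of primary-reward functions
    # for the one whose learner achieves the highest fitness
    best = max([(wg, ws) for wg in (0.0, 0.5, 1.0) for ws in (-0.1, 0.0, 0.1)],
               key=run_agent)
    print("best primary-reward weights:", best)

The point of the outer loop is that a reward function is selected for what it makes the agent learn, not for how closely it restates the fitness criterion.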


Reference:

Singh, S.; Lewis, R.; Barto, A. & Sorg, J. (2010). Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Transactions on Autonomous Mental Development, 2 (2), 70-82.



*******************************************************************************************

October 11 (Thursday, 14:30-15:30)

Gianluca Baldassarre

CNR-ISTC-LOCEN

Title: What are intrinsic motivations? A biological and computational perspective


Abstract

The concept of "intrinsic motivation", initially proposed and developed within psychology, is gaining increasing attention within the cognitive sciences for its potential to produce open-ended learning machines and robots. However, a clear definition of the phenomenon is not yet available. This presentation aims to clarify what intrinsic motivations are from a biological perspective and from a computational perspective. To this purpose, it first shows how intrinsic motivations can be defined by contrasting them with extrinsic motivations from an evolutionary (and engineering) perspective: whereas extrinsic motivations guide the learning of behaviours that directly increase fitness (or satisfy the user's/designer's purposes), intrinsic motivations drive the acquisition of knowledge and skills that contribute to producing behaviours that increase fitness (or user satisfaction) only at a later stage. Given this key difference, extrinsic motivations generate learning signals on the basis of events involving the body's homeostatic regulation (or the accomplishment of user purposes), whereas intrinsic motivations generate transient learning signals mainly based on events taking place within the brain itself (or within the controller of the robot/intelligent machine). These ideas are supported by presenting (preliminary) taxonomies and examples of the biological mechanisms underlying the two types of motivations, and by linking them to some of the mechanisms most commonly used in the literature to implement intrinsic motivations in robots and machines.
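
As a reading aid, here is a minimal sketch of the distinction the abstract draws (a toy illustration, not material from the talk): the extrinsic signal is computed from body/world events, while the intrinsic signal is computed from an event inside the learning system itself, a prediction error, and is therefore transient, fading as the system learns.

    def extrinsic_reward(energy_before, energy_after):
        # extrinsic: tracks a homeostatic bodily variable (eating when hungry)
        return energy_after - energy_before

    class IntrinsicReward:
        # intrinsic: the system's own prediction error about an event
        def __init__(self, lr=0.3):
            self.pred = 0.0
            self.lr = lr

        def __call__(self, observed):
            error = abs(observed - self.pred)        # "surprise" inside the brain
            self.pred += self.lr * (observed - self.pred)
            return error                             # decays as predictions improve

    im = IntrinsicReward()
    for t in range(6):
        print(f"trial {t}: intrinsic reward = {im(1.0):.3f}")
    # the printed signal shrinks toward 0: the same event stops being
    # rewarding once it has been learned, unlike the extrinsic signal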


Reference

Baldassarre, G. (2011). What are intrinsic motivations? A biological perspective. In Proceedings of the Conference on Developmental Learning and Epigenetic Robotics (ICDL-EPIROB), Frankfurt am Main, Germany, 24-27 August 2011, pp. E1-8.

 

 

*******************************************************************************************

October 17 (Wednesday, 14:30-15:30)

Vieri G. Santucci

CNR-ISTC-LOCEN

Title: Dopamine reconciled


Abstract

An important issue in recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other suggests that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates the two contrasting positions: according to our view, phasic dopamine represents a TD-like reinforcement prediction error learning signal determined both by biological rewards (permanent, extrinsic reinforcements) and by unexpected changes in the environment (temporary, intrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions.
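
A minimal sketch of the hypothesised signal (an assumed form, not the authors' implementation): the dopamine-like TD error is computed over the sum of a permanent extrinsic reward and a habituating intrinsic reward for unexpected events.

    GAMMA = 0.95

    def dopamine_like_delta(V, s, s_next, r_extrinsic, r_intrinsic):
        # one TD error driven by both reinforcement types at once
        r_total = r_extrinsic + r_intrinsic      # permanent + temporary components
        return r_total + GAMMA * V[s_next] - V[s]

    # toy usage: an unexpected light flash yields intrinsic reward that
    # habituates over repeated exposures, while a food reward would not
    V = {"dark": 0.0, "light": 0.0}
    novelty = 1.0
    for trial in range(5):
        delta = dopamine_like_delta(V, "dark", "light",
                                    r_extrinsic=0.0, r_intrinsic=novelty)
        V["dark"] += 0.5 * delta     # value learning from the same signal
        novelty *= 0.5               # intrinsic component fades with habituation
        print(f"trial {trial}: delta = {delta:.3f}")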


Reference

Mirolli, M., Santucci, V. & Baldassarre, G. Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: A simulated robotic study. Submitted to Neural Networks.



*******************************************************************************************

October 25 (Thursday, 14:30-15:30)

Fabrizio Taffoni

Università Campus Biomedico di Roma

Title: A mechatronic platform for empirical experiments on intrinsic motivations and skill acquisition: experiments with children

 

Abstract

The autonomous acquisition of new skills and knowledge is one of the most astonishing capacities that can be observed in humans and animal models. The driving force that shapes this process is unknown. Children seem to acquire new skills and know-how in a continuous and open-ended manner. This process follows a well-defined path strictly linked to the development of the cognitive and morphological structures related to the newly acquired skills (e.g. tool use). How children learn to use these skills in different contexts to reach a specific goal is unknown. To study the driving force that shapes the exploratory behaviors underlying learning processes in humans, we designed a new mechatronic tool for behavioral analysis (called the "mechatronic board"). The new platform makes it possible to test whether exploratory actions, which are not instrumental to achieving any specific goal, improve participants' capacity to solve a subsequent goal-directed task that requires the proficiency acquired during free exploration. In this seminar the experimental protocol with children and the preliminary results will be presented and discussed.

 

Reference:

Taffoni, F.; Vespignani, M.; Formica, D.; Cavallo, G.; Polizzi di Sorrentino, E.; Sabbatini, G.; Truppa, V.; Visalberghi, E.; Mirolli, M.; Baldassarre, G.; Keller, F. & Guglielmelli, E. (2012). A mechatronic platform for behavioural analysis of non human primates. Journal of Integrative Neuroscience, 11(1), 87-101.



*******************************************************************************************

October 30 (Tuesday, 14:30-15:30) (seminar outside the regular series, on extrinsic motivations)

Vincenzo G. Fiore

CNR-ISTC-LOCEN

Title: Corticolimbic catecholamines in stress: A computational model

 

Abstract

The brain determines what is stressful and on this basis regulates physiological and behavioural adaptive responses. Converging evidence ascribes a major role to catecholamines in these processes: in particular, data show that norepinephrine (NE) and dopamine (DA) outflows in the ventromedial prefrontal cortex (vmPFC) regulate DA outflow in the nucleus accumbens (NAcc). The paper presents a systemic model capturing the brain mechanisms underlying such neuromodulations and explaining how they contribute to translating the appraisal of the stressful experience into the motivational state required to deal with it. The model proposes three key hypotheses: (a) NE allows prelimbic cortex (PL) to guide active coping strategies and energizes the resulting responses by enhancing NAcc DA outflow; (b) DA allows infralimbic cortex (IL) to block active coping attempts, when these are unsuccessful, by decreasing NAcc DA levels below the baseline; (c) learning processes involving IL and PL lead to the transition between coping strategies. The model, whose architecture relies on the known connectivity and functions of the brain areas involved, represents the first integrated operational explanation of the investigated phenomena, successfully reproduces the fluctuations of catecholamines in stressful conditions, and produces a number of predictions, some of which are tested here with new experiments.



*******************************************************************************************

October 31 (Wednesday, 14:30-15:30)

Eugenia Polizzi di Sorrentino

CNR-ISTC-UCP

Title: Intrinsic motivation: experiments with capuchin monkeys


Abstract
Animals (as well as humans) are endowed with complex motivational systems that drive them to act. It has been suggested that intrinsic motivation, described as a drive that leads to exploratory actions "for their own sake", may play an important role in the acquisition of knowledge and skills that can be recalled at a later stage to obtain valuable outcomes. Previous studies on nonhuman primates have provided circumstantial evidence for the role of intrinsic motivation in promoting exploratory actions in these taxa; however, the issue is still under debate. Here we provide the results of several experiments aimed at understanding whether spontaneous exploration drives versatile learning in a primate species well known for being "curious" and highly manipulative, the capuchin monkey (Cebus apella). The experiments were run with the mechatronic board, an innovative device specifically designed to allow inter-species comparative research on monkeys, children (and robots) as part of the European project IM-CLeVeR (FP7-ICT-IP-231722).



*******************************************************************************************

November 12 (Monday, 15:30-16:30)

Kevin Gurney

Department of Psychology, University of Sheffield, UK

Title: Developing new actions in humans and robots


Abstract: Goal-directed behaviour requires that we know how to deploy actions in an efficient way to bring about some predictable change in the world. In many cases we learn these actions, and their associated outcomes, by trial and error in an intrinsically motivated way, with no overt, immediate reward. But what are the neural mechanisms that support such learning? Work in operant conditioning has provided evidence suggesting that a set of subcortical nuclei, the basal ganglia, may implement a form of reinforcement learning algorithm, with phasic activity in midbrain dopaminergic neurons constituting a reinforcement signal. In this context, the phasic dopamine signal is often deemed to be a *reward* prediction error. In contrast, we have proposed that phasic dopamine release is better described as a *sensory* prediction error, whose function is to support the kind of intrinsically motivated action-outcome discovery described above. In this talk, I will describe our theory of action discovery and its testing in computational models at both the behavioural level (in an autonomous agent) and the synaptic level, where it is supported by recent in vitro data.
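
The toy sketch below illustrates the action-discovery logic of this proposal (an illustration written for this page, not the authors' model; the three actions and learning rates are arbitrary): a sensory prediction error fires when an outcome is not yet predicted from the preceding action; it trains the action-outcome prediction and transiently biases the agent to repeat that action, fading once the contingency is learned.

    import random

    actions = ["press", "pull", "poke"]
    predicts_light = {a: 0.0 for a in actions}   # learned P(light | action)
    preference = {a: 1.0 for a in actions}       # action-selection weights

    def world(action):
        return action == "press"                 # only pressing causes the light

    for step in range(300):
        a = random.choices(actions, weights=[preference[x] for x in actions])[0]
        light = world(a)
        sensory_pe = float(light) - predicts_light[a]    # sensory prediction error
        predicts_light[a] += 0.1 * sensory_pe            # learn the contingency
        preference[a] = max(0.1, preference[a] + sensory_pe)  # repetition bias

    print({a: round(predicts_light[a], 2) for a in actions})
    # once P(light | press) is close to 1, the phasic signal, and with it the
    # repetition bias, fades away: the action has been "discovered"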

 

References:

Redgrave, P. & Gurney, K. (2006). The short-latency dopamine signal: a role in discovering novel actions? Nature Reviews Neuroscience, 7 (12), 967-975.

Gurney, K.; Prescott, T. J.; Wickens, J. R. & Redgrave, P. (2004). Computational models of the basal ganglia: from robots to membranes. Trends in Neurosciences, 27 (8), 453-459.



*******************************************************************************************

November 14 (Wednesday, 14:30-15:30)

Francesco Mannella

CNR-ISTC-LOCEN

Title: Intrinsically motivated action-outcome learning and goal-based action recall: A system-level bio-constrained computational model


Abstract

Reinforcement (trial-and-error) learning in animals is driven by a multitude of processes. Most animals have evolved several sophisticated systems of 'extrinsic motivations' (EMs) that guide them to acquire behaviours allowing them to maintain their bodies, defend against threats, and reproduce. Animals have also evolved various systems of 'intrinsic motivations' (IMs) that allow them to acquire actions in the absence of extrinsic rewards; these actions are used later to pursue such rewards when they become available. Intrinsic motivation has been studied in psychology for many decades and its biological substrate is now being elucidated by neuroscientists. In the last two decades, investigators in computational modelling, robotics and machine learning have proposed various mechanisms that capture certain aspects of IMs. However, we still lack models of IMs that attempt to integrate all key aspects of intrinsically motivated learning and behaviour while taking into account the relevant neurobiological constraints. This paper proposes a bio-constrained system-level model that contributes a major step towards this integration. The model focusses on three processes related to IMs and on the neural mechanisms underlying them: (a) the acquisition of action-outcome associations (internal models of the agent-environment interaction) driven by phasic dopamine signals caused by sudden, unexpected changes in the environment; (b) the transient focussing of visual gaze and actions on salient portions of the environment; (c) the subsequent recall of actions to pursue extrinsic rewards, based on the goal-directed reactivation of the representations of their outcomes. The tests of the model, including a series of selective lesions, show how the focussing processes lead to faster learning of action-outcome associations, and how these associations can be recruited for accomplishing goal-directed behaviours. The model, together with the background knowledge reviewed in the paper, represents a framework that can be used to guide the design and interpretation of empirical experiments on IMs, and to computationally validate and further develop theories about them.
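
A bare-bones sketch of two of the modelled processes (the structure is an assumption made for this page, not the published model): acquisition of an action-outcome association is gated by a phasic surprise signal, and the association is later recalled by reactivating the desired outcome as a goal.

    action_outcome = {}   # internal model: outcome -> action that produced it

    def on_event(action, outcome, surprise):
        # (a) acquisition gated by the phasic (dopamine-like) surprise signal
        if surprise > 0.5 and outcome is not None:
            action_outcome[outcome] = action

    def recall(goal):
        # (c) goal-directed recall: reactivate the outcome, retrieve the action
        return action_outcome.get(goal)

    on_event("push-lever", "light-on", surprise=0.9)   # unexpected -> learned
    on_event("wave-arm", "light-on", surprise=0.1)     # expected -> ignored
    print(recall("light-on"))                          # -> 'push-lever'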


Reference

Baldassarre, G., Mannella, F., Fiore, V. G., Redgrave, P., Gurney, K., & Mirolli, M. (2012). Intrinsically motivated action-outcome learning and goal-based action recall: A system-level bio-constrained computational model. In press in Neural Networks.



*******************************************************************************************

November 22 (Thursday, 14:30-15:30)

Bruno Castro da Silva

Department of Computer Science, University of Massachusetts - Amherst

Title: Learning Parameterized Skills


Abstract
We introduce a method for constructing skills capable of solving tasks drawn from a distribution of parameterized reinforcement learning problems. The method draws example tasks from a distribution of interest and uses the corresponding learned policies to estimate the topology of the lower-dimensional piecewise-smooth manifold on which the skill policies lie. This manifold models how policy parameters change as task parameters vary. The method identifies the number of charts that compose the manifold and then applies nonlinear regression in each chart to construct a parameterized skill by predicting policy parameters from task parameters. We evaluate our method on an underactuated simulated robotic arm tasked with learning to accurately throw darts at a parameterized target location.
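
A simplified sketch of the core idea (one chart and plain least-squares regression standing in for the paper's estimated manifold topology and per-chart nonlinear regression; the data are synthetic): given policy parameters learned on sampled tasks, fit a map from task parameters to policy parameters, then predict a policy for an unseen task without learning it from scratch.

    import numpy as np

    rng = np.random.default_rng(0)

    # pretend each sampled task tau comes with a learned 3-D policy-parameter
    # vector theta(tau); here the "learned" policies lie near a smooth map
    tasks = rng.uniform(0, 1, size=(20, 1))                   # task parameters
    true_map = lambda t: np.hstack([2 * t, t ** 2, 1 - t])    # hidden structure
    policies = true_map(tasks) + 0.01 * rng.normal(size=(20, 3))

    # fit the parameterized skill: least squares on simple polynomial features
    phi = np.hstack([np.ones_like(tasks), tasks, tasks ** 2])
    W, *_ = np.linalg.lstsq(phi, policies, rcond=None)

    new_task = np.array([[0.37]])
    new_phi = np.hstack([np.ones_like(new_task), new_task, new_task ** 2])
    print("predicted policy parameters:", new_phi @ W)
    print("ground truth:               ", true_map(new_task))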

 

Reference

Castro da Silva, B.; Konidaris, G. & Barto, A. G. (2012). Learning parameterized skills. In Proceedings of the 29th International Conference on Machine Learning (ICML 2012), Edinburgh, Scotland.



*******************************************************************************************

November 28 (Wednesday, 14:30-15:30)

Valerio Sperati

CNR-ISTC-LOCEN

Title: A Bio-Inspired Attention Model of Anticipation in Gaze Contingency Experiments with Infants


Abstract
The empirical investigation of cognitive processes in infants is hampered by the limited development of their body and motor control skills. Novel gaze-contingent paradigms, where infants control their environment with their eyes, offer a solution to this problem, since the control of gaze develops very early. Recently such a paradigm has been used to investigate the learning of anticipatory behaviour in 6-to-8-month-old infants in an experiment where gazing at a "button" causes the appearance of an interesting image. The experiment shows that infants learn in a few trials to "press the button" with their gaze and, remarkably, to anticipate the image appearance. The experiment raises two questions: (a) what are the processes supporting such fast learning? (b) why does the anticipatory behaviour emerge? This paper presents a bio-inspired model that puts forward possible answers to these questions. The model consists of: (a) a bottom-up attention component for scene saliency detection; (b) a top-down attention component that uses reinforcement learning (RL) to learn to control gaze based on current foveal stimuli; (c) a dynamic field map that decides where to look based on the bottom-up and top-down information. The results show that the model is indeed capable of learning to perform anticipatory saccades in a few trials. The analysis of the system shows that this fast learning is due to the guidance exerted by the bottom-up component on the learning process of the top-down component. Moreover, it shows that the anticipatory behaviour can be explained in terms of the top-down component exploiting the learned spatial relations of stimuli in an anticipatory fashion. Overall, the results indicate that the model has the essential features to theoretically frame, interpret, and design experiments based on the gaze-contingent paradigm.
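
The following numerical sketch (toy values, not the model's actual dynamics) shows how the three components can interact: a fixed bottom-up saliency map and a learned top-down value map are summed into a decision field whose peak selects the next gaze target.

    import numpy as np

    locations = ["left", "button", "right"]
    bottom_up = np.array([0.2, 0.6, 0.2])   # the button is visually salient
    top_down = np.zeros(3)                  # learned gaze values, initially flat

    for trial in range(10):
        field = bottom_up + top_down        # dynamic-field-style combination
        gaze = int(np.argmax(field))
        reward = 1.0 if locations[gaze] == "button" else 0.0   # image appears
        top_down[gaze] += 0.3 * (reward - top_down[gaze])      # simple RL update

    print(dict(zip(locations, np.round(bottom_up + top_down, 2))))
    # early trials are driven by the bottom-up map; the learned top-down values
    # then keep gaze on the "button", mirroring the guidance effect in the text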


Reference
Marraffa, R., Sperati, V., Caligiore, D., Triesch, J. & Baldassarre, G. (2012). A Bio-Inspired Attention Model of Anticipation in Gaze Contingency Experiments with Infants. Accepted at the IEEE International Conference on Development and Learning (ICDL 2012).



*******************************************************************************************

December 6 (Thursday, 14:30-15:30)

Daniele Caligiore

CNR-ISTC-LOCEN

Title: Development of reaching in infants: a computational model

 

Abstract

Despite the huge literature on reaching models, there is still no clear hypothesis about the motor control mechanisms underlying the development of reaching skills in infants. This article contributes to solving this issue by proposing a computational model based on the integration of three key hypotheses: (a) a trial-and-error reinforcement learning process guides the development of reaching; (b) controlling the movement on the basis of equilibrium points allows the system to quickly find initial approximate motor solutions; (c) the demand for accuracy at the end of the movement in the presence of muscular noise drives the subsequent progressive refinement of the reaching movement. The tests of the model, based on a simulated robot, show that the integration of these key hypotheses allows the model to reproduce several empirical findings, mostly deriving from longitudinal studies with real subjects, related to several motor control issues (development of the speed-accuracy trade-off; movement organization by submovements; development of a bell-shaped speed profile; early management of redundant degrees of freedom). Importantly, most of these empirical data have never been addressed by previous computational models. The model also produces testable predictions on these issues. Interestingly, the analysis of the model's functioning reveals that all these results are ultimately explained by the same developmental strategy that emerged in the simulations: the model first quickly learns to perform coarse movements that allow the hand to contact the target, and then progressively refines such movements to increase accuracy.
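
A toy sketch of hypotheses (b) and (c) (illustrative only; a simple error-driven update stands in for the model's reinforcement learning process): the commanded equilibrium point yields a quick coarse reach, and trial-by-trial refinement keeps the end-movement accurate despite motor noise.

    import random

    TARGET = 0.8
    eq_point = 0.5        # initial coarse equilibrium-point command

    def reach(command):
        noise = random.gauss(0.0, 0.02)   # muscular noise at the end-movement
        return command + noise            # the hand settles near the command

    for trial in range(200):
        error = TARGET - reach(eq_point)
        eq_point += 0.1 * error           # progressive refinement of the command

    print(f"learned equilibrium point: {eq_point:.3f} (target {TARGET})")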



*******************************************************************************************

December 13 (Thursday, 14:30-15:30)

Marco Mirolli

CNR-ISTC-LOCEN

Title: Intrinsic Motivation Reconsidered


Abstract
The concept of intrinsic motivation was introduced in animal psychology in the 1950s to account for animal behavior that could not be explained by classical drive theory. The concept was then applied in human psychology in relation to the finding that rewarding an activity may undermine the motivation to pursue that activity. Nowadays the concept of intrinsic motivation is widely used in machine learning and developmental robotics research as a means for developing artificial systems capable of autonomous and cumulative learning. In all these disciplines the concept is used slightly differently and for different purposes. Besides reviewing the works that go under the umbrella of 'intrinsic motivation' in psychology and robotics, I will also discuss neuroscientific work that is relevant for understanding the neural basis of intrinsic motivations, and I will propose a theory about what intrinsic motivations are, why they exist, and how they determine animal behavior.



*******************************************************************************************

December 19 (Wednesday, 14:30-15:30)

Paolo Tommasino

Nanyang Technological University, Singapore (formerly CNR-ISTC-LOCEN)

Title: Reinforcement learning algorithms that assimilate and accommodate skills with multiple tasks

 

Abstract

Children are capable of acquiring a large repertoire of motor skills and of efficiently adapting them to novel conditions. In previous work we proposed a hierarchical modular reinforcement learning model (RANK) that can learn multiple motor skills in continuous action and state spaces. The model is based on the mixture-of-experts model, suitably adapted to work with reinforcement learning. In particular, the model uses a high-level gating network to assign responsibilities for acting and for learning to a set of low-level expert networks. The model was also developed with the goal of exploiting the Piagetian mechanisms of assimilation and accommodation to support the learning of multiple tasks. This paper proposes a new model (TERL - Transfer Expert Reinforcement Learning) that substantially improves on RANK. The key difference with respect to the previous model is the decoupling of the mechanisms that generate the experts' responsibility signals for learning and for control, which makes it possible to satisfy different constraints for functioning and for learning. We test both the TERL and the RANK models with a two-DOF dynamic arm engaged in solving multiple reaching tasks, and compare the two with a simple, flat reinforcement learning model. The results show that both models are capable of exploiting assimilation and accommodation processes to transfer knowledge between similar tasks, and at the same time to avoid catastrophic interference. Furthermore, the TERL model is shown to significantly outperform the RANK model, thanks to its faster and more stable specialization of the experts.
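
A schematic sketch of the decoupling described in the abstract (a simplification made for this page, not the TERL implementation): the gating network produces two separate responsibility distributions per task, a sharp one for control, so that a single expert acts, and a softer one for learning, so that similar tasks can share training signal.

    import numpy as np

    rng = np.random.default_rng(1)
    n_experts, n_tasks = 3, 4
    gate_control = rng.random((n_tasks, n_experts))   # responsibility for acting
    gate_learn = rng.random((n_tasks, n_experts))     # responsibility for training

    def softmax(x, temp):
        z = np.exp((x - x.max()) / temp)
        return z / z.sum()

    task = 2
    p_act = softmax(gate_control[task], temp=0.1)   # sharp: one expert controls
    p_train = softmax(gate_learn[task], temp=1.0)   # soft: several experts learn
    print("acting responsibilities:  ", np.round(p_act, 2))
    print("learning responsibilities:", np.round(p_train, 2))
    # decoupling lets similar tasks share learning (transfer / assimilation)
    # while control stays specialized (avoiding catastrophic interference)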

 

Reference

Tommasino, P., Caligiore, D., Mirolli, M. & Baldassarre, G. (2012). Reinforcement learning algorithms that assimilate and accommodate skills with multiple tasks. In Proceedings of the Conference on Developmental Learning and Epigenetic Robotics (ICDL-EPIROB), 2012.