Intrinsically Motivated
Cumulative Learning
Versatile Robots




 The calendar of the lectures is here


What are intrinsic motivations? A biological and computational perspective

Gianluca Baldassarre (CNR-ISTC-LOCEN, Italian National Research Council, Institute of Cognitive Sciences and Technologies, Laboratory of Computational Embodied Neuroscience, Italy) 

The concept of “intrinsic motivation”, initially proposed and developed within psychology, is gaining increasing attention within the cognitive sciences for its potential to produce open-ended learning machines and robots. However, a clear definition of the phenomenon is not yet available. This presentation aims to clarify what intrinsic motivations are from a biological perspective and from a computational perspective. To this purpose, it first shows how intrinsic motivations can be defined by contrasting them with extrinsic motivations from an evolutionary (and engineering) perspective: whereas extrinsic motivations guide the learning of behaviours that directly increase fitness (or satisfy the user's/designer's purposes), intrinsic motivations drive the acquisition of knowledge and skills that contribute to producing behaviours that increase fitness (or user satisfaction) only at a later stage. Given this key difference, extrinsic motivations generate learning signals on the basis of events involving the body's homeostatic regulation (or the accomplishment of user purposes), whereas intrinsic motivations generate transient learning signals mainly based on events taking place within the brain itself (or within the controller of the robot/intelligent machine). These ideas are supported by presenting (preliminary) taxonomies and examples of biological mechanisms underlying the two types of motivations, and also by linking them to some of the most commonly used mechanisms proposed in the literature to implement intrinsic motivations in robots and machines.


Key Reference

Baldassarre G. (2011) What are intrinsic motivations? A biological perspective. In Proceedings of Conference on Developmental Learning and Epigenetic Robotics (ICDL-EPIROB), Frankfurt am Main, Germany, 24-27 August 2011, pp. E1 - 8.



1996: BA and MSc in Economics

1996-1997: MSc in Cognitive Psychology and Neural Network Modelling

1997-1998: Research Assistant, Institute of Cognitive Sciences and Technologies, Italian National Research Council (ISTC-CNR), Rome, Italy

1998-2001: PhD in Computer Science (on Planning with NN and RL), University of Essex, UK

2001-2005: Postdoc at ISTC-CNR (Swarm robotics, NN, RL)

2006-now: Researcher at ISTC-CNR (Topics: see below)

2006-now: Founder and Principal Investigator of the Research Group ''Laboratory of Computational Embodied Neuroscience'' (LOCEN-ISTC-CNR)

2006-2009: Principal investigator for ISTC-CNR of EU Integrated Project ''ICEA – Integrating Cognition, Emotion and Autonomy''

2009-2013: Coordinator of EU Integrated Project IM-CLeVeR – Intrinsically Motivated Cumulative Learning Versatile Robots

1998-now: About 100 international peer-reviewed publications on bio-constrained computational models of brain and behaviour, developmental robotics, and artificial life

General research goals: 1. Investigating how the brain learns and generates behaviour by interacting with the body and with the environment, based on computational system-level models. 2. Designing Developmental Robotics models for the cumulative acquisition of multiple sensorimotor skills, driven by intrinsic and extrinsic motivations and based on hierarchical reinforcement learning architectures.

Specific research topics: 1.1 Intrinsic motivations (e.g., based on superior colliculus and hippocampus) 1.2 Extrinsic motivations (e.g., based on amygdala, hypothalamus, and nucleus accumbens) 1.3 The hierarchical organisation of sensorimotor behavior (e.g., based on basal ganglia, prefrontal cortex, and pre-motor/motor cortex). 1.4 Biologically plausible learning processes (trial and error learning, unsupervised learning, Hebbian learning). 2.1 Developmental robotic models (simulated and real iCub) of attention and motor control. 2.2 Mechanisms to implement intrinsic motivations in robots. 2.3 Hierarchical RL architectures for transfer and cumulative learning.



Information driven self-organization of complex robotic systems 

Ralf Der (Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany)

In recent years, information theory has come into the focus of researchers interested in the sensorimotor dynamics of both robots and living beings. One root of these approaches is the idea that living beings are information processing systems and that the optimization of these processes should be an evolutionary advantage. The talk considers predictive information (PI) as the guiding intrinsic motivation for development and shows how increasing the PI may serve as a drive for an open-ended, self-determined development of a robot. Interestingly, this very general setting can be translated into an explicit, Hebb-like learning rule. The effect of that rule is studied in a variety of embodied robotic systems, starting with the barrel and the so-called armband robot, where the emergence of a collective dynamics under decentralized control can be observed. In more complex robots like our hexapod, the humanoid, or snake robots we observe the emergence of basic behavior skills in a playful self-exploration driven by PI maximization. We will also demonstrate how the emerging behavior patterns may be helpful in facing the curse of dimensionality in reinforcement learning scenarios. As a new development, we can identify spontaneous symmetry breaking as a basic mechanism explaining the emergence of complex, embodiment-related behavior patterns "out of nothing". Videos and more information at
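For orientation, predictive information is commonly defined as the mutual information between the past and the future of a time series. The expressions below give that standard information-theoretic definition for a sensor process $x_t$. The symbols and the one-step Markov simplification are assumptions made here for illustration and are not taken from the talk itself.

```latex
\[
  \mathrm{PI} \;=\; I\bigl(X_{\text{past}};\, X_{\text{future}}\bigr)
  \;=\; \Bigl\langle \log
        \frac{p(x_{\text{past}},\, x_{\text{future}})}
             {p(x_{\text{past}})\, p(x_{\text{future}})} \Bigr\rangle
\]
\[
  \text{For a stationary Markov process:}\qquad
  \mathrm{PI} \;=\; I(x_t;\, x_{t+1})
  \;=\; H(x_{t+1}) - H(x_{t+1} \mid x_t)
\]
```

Read this way, a high PI requires behaviour that is both rich (large $H(x_{t+1})$) and predictable from the recent past (small $H(x_{t+1} \mid x_t)$).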



2007 - Affiliated Scientist at the Max Planck Institute for Mathematics in the Sciences, Leipzig. Topics of interest: embodied intelligence, self-learning and self-organizing robotic systems, information theory, dynamical systems, and intrinsic motivation.

2007 Retired

2000 - Professor, University of Leipzig.

1989 - 2000 Senior researcher (Privatdozent) at University of Leipzig, Institute of Computer Science, Dept. of Intelligent Systems. Head of several research projects on robotics, learning, neural networks.

1971 - 1989 Researcher/Senior Researcher at the Academy of Sciences, Central Institute of Isotope and Radiation Research, Leipzig. Working in statistical physics and irreversible thermodynamics, the statistical theory of chemical reactions, the theory of self-organization and others.

1984 Thesis on "Statistical physics and irreversible thermodynamics - A new approach to the general theory of transport" (Habilitation, Dr. rer. nat. habil.)

1979/80 Member of the 24th Soviet Antarctic Expedition. Glaciological and geophysical research in Queen Maud Land.

1971 PhD. Thesis: "Quantum mechanical many particle theory of nuclear reactions" (Dissertation Dr. rer. nat.)

1968 - 1971 Research Assistant, University of Leipzig.



Kail Frank (IDSIA, Lugano, Switzerland)

Many neuroscientists and machine learning researchers, despite being very CLeVeR, are not robotics experts. Consequently, when confronted with a complex piece of hardware such as the iCub humanoid, they choose the “path of least resistance” and control it with simple, discrete commands like “go to position” and “wait until position reached”. This simplistic control paradigm produces jerky, stop-and-go motions, reminiscent of 1980s industrial robots. Such simplistic control limits the scope of tasks the robot can possibly achieve and is in many ways counterproductive with respect to the goal of communicating the efficacy of bio/neuro-inspired approaches to robot control. This tutorial aims to get you thinking about alternative approaches to control, which are more advanced and more powerful than the above goto/wait paradigm. It will be presented in two 40-minute “chapters,” which can be summarized according to the following puns.

Chapter 1: Fundamentals of Control Theory

    I see damped oscillators. They’re everywhere!

    Who’s down with PID? Your ro-bot ba-by!

    We hold these truths to be self-evident, that position, velocity, and acceleration are not created equal.
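As a rough illustration of the Chapter 1 themes (not taken from the tutorial itself), the sketch below simulates a single joint, modelled as a unit mass, driven toward a target position by a PID controller. The plant model, the gains, and the names `simulate_pid`, `kp`, `ki` and `kd` are all assumptions chosen for illustration. With these gains the closed loop behaves like a damped oscillator that settles on the target instead of jumping between set-points.

```python
def simulate_pid(target, kp=20.0, ki=10.0, kd=8.0, dt=0.01, steps=1000):
    """Drive a unit-mass joint toward `target` with a PID controller.

    Hypothetical toy plant: acceleration equals the commanded force.
    Returns the list of positions visited, one per time step.
    """
    pos, vel = 0.0, 0.0
    integral = 0.0
    prev_error = target - pos  # makes the first derivative term zero
    trajectory = []
    for _ in range(steps):
        error = target - pos
        integral += error * dt
        derivative = (error - prev_error) / dt
        # PID law: proportional + integral + derivative contributions.
        force = kp * error + ki * integral + kd * derivative
        prev_error = error
        # Semi-implicit Euler integration of the unit-mass plant.
        vel += force * dt
        pos += vel * dt
        trajectory.append(pos)
    return trajectory


trajectory = simulate_pid(target=1.0)
print(f"final position: {trajectory[-1]:.3f}")
```

Changing `kd` is a quick way to see the damped-oscillator behaviour: lowering it makes the joint overshoot and ring, raising it makes the approach slower and smoother.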


Chapter 2: Contemporary Planning/Control Solutions

    A Brief History of Operational Space

    Joint Space is Fat Land - A romance of even more dimensions

    Attractor Dynamics - How to play the (potential) field


Neural Mechanisms of Standard and Hierarchical Reinforcement Learning

Clay Holroyd (University of Victoria, Canada)

This presentation will introduce current trends on the cognitive neuroscience of reinforcement learning. I will begin by presenting evidence for the “standard model” of reinforcement learning, which holds that the firing rate of midbrain dopamine neurons encodes temporal difference errors and that the basal ganglia – a major target of the dopamine system – implements an actor/critic architecture. I will then discuss recent efforts to extend this model to account for hierarchical behavior, with an emphasis on how neural mechanisms for hierarchical reinforcement learning can ameliorate the scaling problem. This discussion will center on a recent proposal that anterior cingulate cortex is responsible for option selection and maintenance. I will then illustrate this proposal with a simulation of choice behavior of rats engaged in an effort-version of a T-maze task, wherein the basal ganglia select primitive actions  (i.e., take a step north, south, east, or west in the maze, or sit) according to standard principles of reinforcement learning, and anterior cingulate cortex biases action selection by choosing behaviors at a higher level of temporal abstraction (i.e., move down the left vs. the right arms of the maze). Finally, I will present the results of a series of event-related brain potential experiments conducted in my laboratory that indicate how anterior cingulate cortex utilizes dopamine temporal difference error signals for the purpose of adaptive decision making. 
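To make the temporal difference error mentioned above concrete, here is a minimal sketch of tabular TD(0) value learning, which corresponds only to the critic half of an actor/critic architecture. The chain task, the function name `td0_chain`, and the parameter values are hypothetical choices for illustration and are not the simulation described in the talk.

```python
def td0_chain(n_states=5, episodes=200, alpha=0.1, gamma=0.9):
    """Tabular TD(0) value learning on a deterministic chain.

    Hypothetical toy task: the agent always moves right from state 0 and
    receives a reward of 1 on leaving the last state.
    """
    # One extra entry for the terminal state, whose value stays at 0.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for state in range(n_states):
            next_state = state + 1
            reward = 1.0 if next_state == n_states else 0.0
            # Temporal difference error: the quantity that the standard
            # model attributes to the firing of midbrain dopamine neurons.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
    return values[:n_states]


values = td0_chain()
print([round(v, 2) for v in values])
```

The learned values rise toward the rewarded end of the chain, each approaching the discounted reward `gamma ** k` for a state that is `k` steps before the reward.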



 I am an Associate Professor of Psychology and a Canada Research Chair in Cognitive Neuroscience  at the University of Victoria, Canada. My primary scientific interest concerns the cognitive neuroscience of cognitive control and decision making. I am particularly interested in how the brain selects and sustains extended sequences of behavior, for example, how do we decide to jog up a steep mountain and then actually follow through with this decision rather than to stay at home and watch TV? My research points to anterior cingulate cortex and the midbrain dopamine system as a critical neural interface supporting such behavior, which I try to understand using a formal theoretical approach called hierarchical reinforcement learning. Other related interests concern the neural mechanisms of substance dependence and attention-deficit hyperactivity disorder, both disorders of cognitive control and reward processing. I have also developed recent interests in spatial processing and navigation (parahippocampus), attention (locus coeruleus) and continuous motor control (posterior parietal cortex). To get at these issues I follow a "converging methods" approach involving electroencephalography, functional magnetic resonance imaging, genetics and computational modelling.


Developmental Mechanisms for Autonomous Life-Long Learning in Robots

Pierre-Yves Oudeyer (INRIA, France)

Developmental robotics studies and experiments with mechanisms for autonomous life-long learning of skills in robots and humans. One of the crucial challenges is due to the sharp contrast between the high dimensionality of their sensorimotor space and the limited number of physical experiments they can make within their lifetime. This also includes the capability to adapt skills to changing environments or to novel tasks. To achieve efficient life-long learning in such complex spaces, humans benefit from various interacting developmental mechanisms which generally structure exploration from simple learning situations to more complex ones. I will present recent research in developmental robotics that has studied several ways to transpose these developmental learning mechanisms to robots. In particular, I will present and discuss computational mechanisms of intrinsically motivated active learning, which automatically select training examples [5,4], or tasks through goal babbling [2], of increasing complexity, and their interaction with imitation learning [3], as well as maturation and body growth, where the number of sensory and motor degrees of freedom evolves through phases of freezing and freeing [1,6]. I will discuss them both from the point of view of modeling sensorimotor and cognitive development in infants and from the point of view of technology, i.e. how to build robots capable of learning efficiently in high-dimensional sensorimotor spaces.
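One common way to let a learner select its own training situations is to favour the ones where its prediction error is currently changing fastest. The sketch below is a generic learning-progress heuristic written for illustration; it is not the algorithm of any of the papers cited here. The class `LearningProgressSelector`, the region labels, and the `simulated_error` function are all hypothetical.

```python
from collections import deque


class LearningProgressSelector:
    """Pick the region whose prediction error changed most recently.

    Hypothetical sketch: 'regions' are abstract labels for parts of the
    sensorimotor space, and the errors are supplied by the caller.
    """

    def __init__(self, regions, window=10):
        self.window = window
        self.errors = {r: deque(maxlen=2 * window) for r in regions}

    def record(self, region, error):
        self.errors[region].append(error)

    def progress(self, region):
        history = list(self.errors[region])
        if len(history) < 2 * self.window:
            return float("inf")  # under-sampled regions are tried first
        older = sum(history[: self.window]) / self.window
        recent = sum(history[self.window:]) / self.window
        return abs(older - recent)

    def choose(self):
        return max(self.errors, key=self.progress)


def simulated_error(region, practice_count):
    """Stand-in for a real learner: error as a function of practice."""
    if region == "trivial":
        return 0.0  # already mastered, nothing left to learn
    if region == "unlearnable":
        return 1.0  # error never improves
    return 1.0 / (1 + practice_count)  # error shrinks with practice


selector = LearningProgressSelector(["trivial", "learnable", "unlearnable"])
counts = {region: 0 for region in selector.errors}
for _ in range(200):
    region = selector.choose()
    selector.record(region, simulated_error(region, counts[region]))
    counts[region] += 1
print(counts)
```

After a short initial sampling of every region, the selector spends its remaining trials on the region where error is still falling, and ignores both the already mastered and the unlearnable one.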


[1] Baranes A., Oudeyer P-Y. (2011) The interaction of maturational constraints and intrinsic motivations in active motor development, in Proceedings of ICDL-EpiRob 2011.

[2] Baranes, A., Oudeyer, P-Y. (2013) Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots, Robotics and Autonomous Systems, 61(1), pp. 49-73.

[3] Nguyen M., Baranes A. and P-Y. Oudeyer (2011) Bootstrapping intrinsically motivated learning with human demonstrations, in Proceedings of the IEEE International Conference on Development and Learning, Frankfurt, Germany.

[4] Oudeyer P-Y. Kaplan F. and V. Hafner (2007) Intrinsic motivation systems for autonomous mental development, IEEE Transactions on Evolutionary Computation, 11(2), pp. 265–286.

[5] Schmidhuber, J. (1991) Curious model-building control systems, in: Proc. Int. Joint Conf. Neural Netw., volume 2, pp. 1458–1463.

[6] Stulp F., Oudeyer P-Y. (2012) Emergent Proximo-Distal Maturation with Adaptive Exploration, in Proceedings of IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-Epirob), San Diego, USA.

Dr. Pierre-Yves Oudeyer is Research Director at Inria and head of the Inria and Ensta-ParisTech FLOWERS team (France). Previously, he was a permanent researcher at Sony Computer Science Laboratory for 8 years (1999-2007). He studied theoretical computer science at Ecole Normale Supérieure in Lyon, and received his Ph.D. degree in artificial intelligence from the University Paris VI, France. After working on computational models of language evolution, he is now working on developmental and social robotics, focusing on sensorimotor development, language acquisition and life-long learning in robots. Strongly inspired by infant development, the mechanisms he studies include artificial curiosity, intrinsic motivation, the role of morphology in learning motor control, human-robot interfaces, joint attention and joint intentional understanding, and imitation learning. He has published a book and more than 80 papers in international journals and conferences, holds 8 patents, has given several invited keynote lectures at international conferences, and has received several prizes for his work in developmental robotics and on the origins of language. In particular, he is a laureate of the ERC Starting Grant EXPLORERS. He is editor of the IEEE CIS Newsletter on Autonomous Mental Development, and associate editor of IEEE Transactions on Autonomous Mental Development, Frontiers in Neurorobotics, and the International Journal of Social Robotics. He also works actively on the diffusion of science to the general public. Web: and

Computational models of action discovery in animals 

 Kevin Gurney (University of Sheffield, UK)

How can animals acquire a repertoire of actions enabling the achievement of their goals? Moreover, how can this be done spontaneously without the animal being instructed, or without having some overt, primary reward assigned to successful learning? The relation between actions and their outcomes can be captured by internal models, encoded in associative neural networks. In order for these associations to be learned, representations of the motor action, sensory context, and the sensory outcome must be repeatedly activated in the relevant neural systems. This requires a transient change in the action selection policy of the agent, so that the to-be-learned action is selected more often than other competing actions. A programme of work seeking the biological underpinning of this computational framework will therefore require an understanding of action selection in the brain. A key component in this scheme is a set of sub-cortical nuclei - the basal ganglia. There is evidence to suggest the basal ganglia may be subject to reinforcement learning, with phasic activity in midbrain dopamine neurons constituting a reinforcement signal. We propose that this signal encodes a sensory prediction error, thereby suggesting how learning may be intrinsically motivated by exploration of the environment. I will describe some high-level models of the basal ganglia, their use in seeking an understanding of intrinsically motivated learning, and the development of reinforcement learning rules grounded at the lower level of synaptic physiology and spike timing.

My first degree was in Mathematical Physics, followed by a Masters in Digital Systems and a PhD in engineering neural networks. During my postdoc I encountered the messy world of biological data for the first time with work in an experimental psychology lab doing psychophysics of visual motion detection. During this time I also started my career in computational neuroscience, building models of visual processing constrained by the data we gathered in the lab. After this I secured a permanent position at Sheffield and became drawn into the (then unfashionable) world of basal ganglia modelling. One of the hallmarks of my work is that it has covered many levels of description – from single neuron models, through spiking neuron models of microcircuits, to high-level models of entire brain systems. As a result, I have become interested in the methodologies of computational neuroscience - how should we go about doing it, given that we can build models at so many levels of description?