Intrinsically Motivated Cumulative Learning Versatile Robots


Students will work on one of the following projects:


Schemas in networks and symbols: hybrid models of action sequencing and execution


supervisors: Kevin Gurney, Mike Sheldon, Jen Lewis

students: Christian I. Penaloza, Jihoon Park, Björn Weghenkel

How do we combine high-level planning and knowledge about sequential tasks with low-level motor skills? Given a task sequence, how do we select the right combinations of objects to attend to, and actions to operate on them, at each stage in a task? These questions will be explored in this project using a union of two different, but complementary, approaches. At Sheffield, we have been working on biologically plausible neural network models which use a set of sub-cortical nuclei, the basal ganglia, to select actions. In the brain, there appear to be several major 'territories' within the basal ganglia, dealing with different aspects of action selection. In our models we have used two such areas, dealing respectively with higher-level task knowledge in prefrontal cortex (PFC) and low-level motor action selection in motor cortex. We have successfully shown how these two circuits can talk to each other and perform sequential tasks (such as making tea or coffee). However, establishing specific task knowledge in PFC is demanding, requiring hand-crafting of many connection weights in the PFC network. In contrast, at Aberystwyth, work has been done on action sequencing using regular 'schema' data structures and symbolic (programme-based) processing [1]. We believe that the representations of sequences in the PFC models at Sheffield share many similarities with these schemas. It should therefore be possible to substitute the symbolic schema representations from Aberystwyth in place of the complex recurrent neural networks of the PFC models at Sheffield. The advantage of this hybrid modelling approach will be the ability to rapidly prototype new task knowledge representations and to take advantage of the learning paradigms developed at Aberystwyth.
The project will aim to build these models, tackling problems of the symbol/network interface, and to deploy them on behavioural tasks involving pressing buttons and grasping objects, executed by a simple robot arm and hand in simulation. The technical requirements are some proficiency in Python and MATLAB.
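To make the schema idea concrete, here is a minimal sketch of a symbolic schema structure for a sequential task. It is a toy illustration only, loosely inspired by the schema approach (the `Schema` class and `execute` sequencer are hypothetical, not the PSchema API): each schema fires when its symbolic preconditions hold, and a greedy sequencer chains schemas until the goal state is reached.

```python
# Hypothetical, minimal schema structure for sequential tasks (toy sketch,
# not the actual PSchema API): schemas fire when their preconditions hold.
from dataclasses import dataclass

@dataclass
class Schema:
    """One step of a task: applicable when its preconditions hold."""
    action: str
    preconditions: frozenset
    postconditions: frozenset

def execute(schemas, state, goal, max_steps=20):
    """Greedy sequencer: repeatedly apply any applicable, useful schema."""
    state = set(state)
    trace = []
    for _ in range(max_steps):
        if goal <= state:
            return trace
        for s in schemas:
            if s.preconditions <= state and not (s.postconditions <= state):
                state |= s.postconditions
                trace.append(s.action)
                break
        else:
            break  # no applicable schema left
    return trace

# Toy tea-making task; states are sets of symbolic predicates.
schemas = [
    Schema("boil water", frozenset({"kettle filled"}), frozenset({"water boiled"})),
    Schema("fill kettle", frozenset(), frozenset({"kettle filled"})),
    Schema("pour water", frozenset({"water boiled", "teabag in cup"}), frozenset({"tea made"})),
    Schema("add teabag", frozenset(), frozenset({"teabag in cup"})),
]
print(execute(schemas, set(), {"tea made"}))
```

In the hybrid model, a structure of this kind would stand in for the hand-crafted recurrent PFC network, while the basal ganglia circuits continue to arbitrate which applicable action is actually selected.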


[1] PSchema: A developmental schema learning framework for embodied agents. Proceedings of the IEEE Joint International Conference on Development and Learning, and Epigenetic Robotics, 2011.



Intrinsic Motivations in Active Perception

supervisors: Bert Shi, Jochen Triesch, Constantin Rothkopf

students: Wasif Muhammad, Goren Gordon, Umberto Esposito, Chong Zhang, Luca Lonini, P. Chandrashekhariah, Jonathan Grizou

An important goal for perceptual systems is to encode sensory information efficiently. This so-called efficient coding hypothesis has given rise to a large body of work showing how biological sensory systems are adapted to encode naturally occurring stimuli near-optimally. Active perceptual systems will in addition shape their sensory inputs by virtue of movements such as eye movements or active whisking. These movements determine the sensory inputs and their statistics. We have recently proposed a general framework for learning efficient coding strategies in an active perception setting. Sensory information is encoded by a generative model that learns an efficient representation of sensory inputs. At the same time, appropriate sensory actions are learned via reinforcement learning, where the reinforcement learner is driven by an intrinsic reward signal favouring actions that allow the generative model to function most efficiently. Thus, the system is driven by an intrinsic motivation to encode sensory inputs as efficiently as possible given its limited representational capacity. In the context of binocular vision, it was recently shown that the approach leads to the discovery of binocular disparity and the development of accurate vergence eye movements [1]. Current work extends this approach to the development of motion sensitivity and the learning of smooth pursuit eye movements.

In this project, we will study and extend the framework along multiple directions. We will test the robustness of the approach to various perturbations and compare different generative models and reinforcement learning algorithms (bring your own code!). As research platforms we provide a simple Matlab environment with natural images from the van Hateren database [2]. We will also work with a simulator of the humanoid robot iCub [3]. Background knowledge in reinforcement learning and generative models is required. Experience with the iCub simulator is desirable but not necessary.
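The coupling between a generative model and an intrinsically rewarded action learner can be sketched in a few lines. The following toy example is illustrative only (it is not the model of [1]): a linear generative model with a small learned dictionary encodes inputs, the negative reconstruction error serves as intrinsic reward, and a bandit-style learner discovers which "sensory action" delivers inputs the model can encode efficiently. The `sense` function and the two discrete actions are stand-ins for, e.g., different vergence commands.

```python
# Toy sketch of the intrinsic-reward loop: coding efficiency of a linear
# generative model (dictionary D) rewards the action learner.
import numpy as np

rng = np.random.default_rng(0)
basis = rng.standard_normal((8, 2))      # structured inputs live in a 2-D subspace

def sense(action):
    """Action 0 yields structured input; action 1 yields unstructured noise."""
    x = basis @ rng.standard_normal(2) if action == 0 else rng.standard_normal(8)
    return x / np.linalg.norm(x)

D = np.asarray(rng.standard_normal((8, 2)))  # generative model: 2 dictionary atoms
Q = np.zeros(2)                              # action values for the bandit learner

for t in range(2000):
    a = int(rng.integers(2)) if rng.random() < 0.2 else int(np.argmax(Q))
    x = sense(a)
    coef = np.linalg.pinv(D) @ x             # best linear code under current D
    residual = x - D @ coef
    D += 0.05 * np.outer(residual, coef)     # adapt the model to reduce coding error
    r = -residual @ residual                 # intrinsic reward: coding efficiency
    Q[a] += 0.1 * (r - Q[a])

print(Q)  # the structured action should end up with the higher value
```

The dictionary can eventually represent the structured inputs almost perfectly but never the full-dimensional noise, so the learner comes to prefer action 0, mirroring how vergence commands that register the two eyes' images make binocular input cheap to encode.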



[1] A Unified Model of the Joint Development of Disparity Selectivity and Vergence Control. Y. Zhao, C.A. Rothkopf, J. Triesch, B.E. Shi. IEEE Int. Conf. on Development and Learning (ICDL), 2012.

[2] Independent Component Filters of Natural Images Compared with Simple Cells in Primary Visual Cortex. J.H. van Hateren, A. van der Schaaf. Proc. R. Soc. B 265(1394).



Playful acquisition of basic behavioral skills


supervisors: Ralf Der and Georg Martius

students: Chrisantha Fernando, Amr Almaddah, Fabien Benureau, Jimmy Baraglia, Quan Wang

The independent, self-directed acquisition of behavioral skills is one of the main challenges in the autonomous development of robots. Nature proposes a solution: young children and higher animals learn to master their complex brain-body systems by playing. Can this be an option for robots? How can a machine be playful? In the project, the participants are encouraged to study recent work on this problem by conducting more than 30 experiments on playful development with embodied robots in physically realistic simulations. The theoretical background and the experiments are from the book "The Playful Machine" by Ralf Der and Georg Martius; see our website. Based on both the homeokinetic principle and more recent, information-theoretic work on intrinsic motivation, the experiments study the unfolding of complex behavioral patterns, first on simple machines like the barrel bot and the so-called armband robot, and then on robots of increasing complexity: a dog-like robot starts playing with a barrier, eventually jumping or climbing over it; a snakebot develops coiling and jumping modes; humanoids develop climbing behaviors when fallen into a pit, or engage in wrestling-like scenarios when encountering an opponent (see the videos on our website). Curious students can already download the simulator from our website.
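One of the information-theoretic quantities used in this line of work is the predictive information of a behaviour time series: how much the past of the sensor stream tells you about its future. The sketch below is a toy illustration of that idea on a discretized sequence (it is not code from the book; the plug-in mutual-information estimator is the simplest possible choice): a regular, gait-like sequence carries high predictive information, an unstructured one of the same alphabet carries almost none.

```python
# Toy estimate of predictive information I(past; future) for a discretized
# behaviour sequence -- here simply the mutual information between
# consecutive symbols, in bits (plug-in estimator, illustrative only).
import math
import random
from collections import Counter

def predictive_information(seq):
    """Mutual information between consecutive symbols, in bits."""
    pairs = list(zip(seq, seq[1:]))
    n = len(pairs)
    p_xy = Counter(pairs)
    p_x = Counter(x for x, _ in pairs)
    p_y = Counter(y for _, y in pairs)
    return sum(c / n * math.log2(c * n / (p_x[x] * p_y[y]))
               for (x, y), c in p_xy.items())

regular = [0, 1, 2, 3] * 50                       # predictable gait-like pattern
random.seed(1)
noisy = [random.randrange(4) for _ in range(200)]  # unstructured sequence
print(predictive_information(regular), predictive_information(noisy))
```

An intrinsically motivated agent maximizing such a quantity is pushed toward behaviour that is neither frozen nor random, which is one way to formalize "playfulness".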


Efficient exploration of high-dimensional sensorimotor space

supervisors: Bert Shi, Jochen Triesch

students: Dimitrije Markovic, Hung Ngo, Christoph Hartmann, Vassilis Vassiliades, Simon Smith, Pedro Sequeiro, Thomas Reppert

Animals and humans need to simultaneously control hundreds of muscles during everyday behaviors. Learning effective control strategies in such high-dimensional spaces is extremely challenging. If the state space is even moderately high-dimensional, an agent may not be able to try out all theoretically possible body configurations during its lifetime. Thus, naive motor babbling, where the agent performs random motor commands and observes the sensory consequences in order to learn a model of its body, is hopelessly inefficient. A promising alternative to random motor babbling is the use of internally generated goals that the agent tries to achieve. Such goal babbling can dramatically improve learning efficiency [1,2], but it also raises the question of how sequences of goals should be selected by the agent. This calls for intrinsic motivation mechanisms that can select appropriate goals based on prior knowledge and on what has already been learned.
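The goal-babbling loop can be sketched on a toy system. The example below is illustrative (a 2-link planar arm with a nearest-neighbour inverse model, not the framework of [1,2]): instead of issuing random motor commands, the agent samples a goal in task space, queries its current inverse estimate plus exploration noise, executes, and learns from whatever outcome it observes.

```python
# Minimal goal-babbling sketch: a 2-link arm learns a nearest-neighbour
# inverse model by trying to reach self-generated goals (toy illustration).
import numpy as np

rng = np.random.default_rng(0)

def forward(q):
    """Forward kinematics of a planar 2-link arm with unit link lengths."""
    return np.array([np.cos(q[0]) + np.cos(q[0] + q[1]),
                     np.sin(q[0]) + np.sin(q[0] + q[1])])

memory_q = [rng.uniform(0, np.pi, 2)]   # seed posture
memory_x = [forward(memory_q[0])]

def inverse(goal):
    """Nearest-neighbour inverse estimate from the memorized outcomes."""
    d = [np.linalg.norm(x - goal) for x in memory_x]
    return memory_q[int(np.argmin(d))]

errors = []
for t in range(300):
    goal = rng.uniform(-2, 2, 2)                 # self-generated goal
    q = inverse(goal) + rng.normal(0, 0.3, 2)    # estimate + exploration noise
    memory_q.append(q)                           # learn from the outcome,
    memory_x.append(forward(q))                  # reachable or not
    errors.append(np.linalg.norm(forward(inverse(goal)) - goal))

print(np.mean(errors[:50]), np.mean(errors[-50:]))  # reach error shrinks
```

Exploration stays concentrated around behaviourally useful postures, which is exactly why goal babbling scales to higher dimensions where random babbling does not; choosing *which* goals to sample, rather than drawing them uniformly as here, is where the intrinsic motivation mechanisms enter.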

In this project, we will explore these issues in the context of reaching movements performed by a model of a muscle-actuated human arm. The model, developed by Chris Eliasmith, uses 3 links and 9 muscles [3]. We will examine the use of compact low-dimensional descriptions of the action space, e.g. using basis functions to represent temporal activation profiles of individual muscles, or the use of motor primitives [4]. Goal babbling will be used to learn appropriate basis function parameters for efficient reaching movements to specific target locations in space at different speeds. An interesting question is whether this approach can lead to emergent maturation, where the control of joints more distal from the body emerges later than that of more proximal joints [5]. This challenging project amounts to original research and, if successful, has a good chance of leading to a publication.
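The basis-function parameterisation mentioned above can be sketched as follows (an illustrative choice of Gaussian bases, not the project's fixed design): each muscle's activation time course is a weighted sum of a few temporal basis functions, so a whole reaching movement is described by a small weight matrix rather than by full activation trajectories.

```python
# Sketch of a compact action parameterisation: the temporal activation
# profile of each muscle is a weighted sum of Gaussian basis functions.
import numpy as np

def activation_profiles(W, T=100, width=0.08):
    """W: (n_muscles, n_bases) weights -> (n_muscles, T) activations."""
    n_bases = W.shape[1]
    t = np.linspace(0, 1, T)                       # normalised movement time
    centers = np.linspace(0, 1, n_bases)           # evenly spaced bases
    phi = np.exp(-(t[None, :] - centers[:, None]) ** 2 / (2 * width ** 2))
    return np.clip(W @ phi, 0, 1)                  # activations stay in [0, 1]

# e.g. 9 muscles (as in the arm model), 5 basis functions per muscle:
rng = np.random.default_rng(0)
W = rng.uniform(0, 1, (9, 5))
u = activation_profiles(W)
print(u.shape)  # (9, 100): 45 parameters describe 900 activation values
```

Goal babbling then searches the 45-dimensional weight space rather than the space of raw activation time courses, which is what makes learning reaching movements tractable.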




[1] Rolf, M., Steil, J.J., Gienger, M. (2010). Goal Babbling Permits Direct Learning of Inverse Kinematics. IEEE Transactions on Autonomous Mental Development 2(3): 216 – 229.


[2] Rolf, M., Steil, J.J., Gienger, M. (2011). Online Goal Babbling for rapid bootstrapping of inverse models in high dimensions. Presented at the 2011 IEEE International Conference on Development and Learning (ICDL), Frankfurt, Germany.




[4] Thomas, P. and Barto, A. (2012) Motor Primitive Discovery, presented at the 2012 IEEE International Conference on Development and Learning (ICDL), San Diego, CA.


[5] Stulp, F. and Oudeyer, P.-Y.. Emergent Maturation through Adaptive Exploration, presented at the 2012 IEEE International Conference on Development and Learning (ICDL), San Diego, CA.



Baby steps with the iCub Humanoid

Supervisors: Kail Frank, Celine Teuliere

Students: Fransiska Basoeki, Hamed Mahzoon, Simon Vogt

Throughout the week students will have the opportunity to undertake programming projects, working in groups, using the iCub simulator and (optionally) the MoBeE framework from IDSIA. Suggested topics for projects include:

Joint Space Trajectory Tracking (beginner - use the iCub’s position controller and gain firsthand experience dealing with practical issues)

Operational Space Force Control (beginner - use the MoBeE framework to explore the neighborhood of a particular pose)

Operational Space Trajectory Tracking (intermediate - use MoBeE’s force control and position control together)

Joint Space Velocity Control (advanced - build your own velocity controller)

At the end of the week, students will present a small demo of what they were able to accomplish, as well as a brief report on the challenges they encountered along the way and their thoughts on how to integrate learning and motor control.
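For the joint-space trajectory-tracking topic, a typical first step is to generate smooth reference trajectories to feed the position controller. The snippet below is a generic illustration only (it does not use the iCub's YARP interfaces or MoBeE): a minimum-jerk profile interpolating one joint from a start to a target angle.

```python
# Tiny joint-space reference generator: minimum-jerk profile for one joint
# (illustrative; a real project would stream these setpoints to the robot).
def min_jerk(q0, q1, T, t):
    """Minimum-jerk position at time t for a move q0 -> q1 over T seconds."""
    s = max(0.0, min(1.0, t / T))                 # normalised time in [0, 1]
    return q0 + (q1 - q0) * (10 * s**3 - 15 * s**4 + 6 * s**5)

# Sample a 2-second move from 0 to 30 degrees at 10 Hz:
traj = [min_jerk(0.0, 30.0, 2.0, 0.1 * k) for k in range(21)]
print(traj[0], traj[10], traj[20])  # 0.0 15.0 30.0
```

The profile starts and ends with zero velocity and acceleration, which keeps the low-level controller from being hit with step commands, one of the "practical issues" the beginner project is meant to expose.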


Optimal decision making and intrinsically/extrinsically motivated learning

Supervisors: Nathan Lepora, Giovanni Pezzulo, Kevin Gurney

students: Rodolfo Marraffa, Alberto Testolin

Understanding how the brain can make reliable perceptual judgements despite imperfect sensor information has become a central theme of neuroscience. Almost all theoretical accounts of this process involve the sequential integration of evidence that determines behaviour through boundary-crossing, e.g. [1]. For perceptual choices between two alternatives, the majority of experimental and theoretical work has focussed on cortical mechanisms for evidence integration and decision making [2]. For multiple alternatives, the basal ganglia have been proposed to function together with cortex to implement optimal decision making [3]. In particular, the hyper-direct pathway from cortex via the sub-thalamic nucleus seems consistent with normalizing cortical evidence necessary for a threshold-crossing decision rule over perceptual beliefs; meanwhile, the cortico-striatal pathways seem consistent with providing a learnt bias to the belief thresholds appropriate to the current task demands [4]. In this project, we will explore how optimal decision making theories of the cortex and basal ganglia relate to proposed mechanisms for intrinsically and extrinsically motivated learning. Topics of interest include: 1) Reinforcement learning of the decision thresholds to achieve optimal task performance; 2) Inclusion of intrinsically and extrinsically motivated task demands; and 3) Possible relations to goal-directed and habitual decision making [5]. Ideally, these theories will be applied to tactile perception on data pre-collected from the iCub robot [6], although there is also flexibility to carry out a more theoretical study if desired, or to use other types of robot data.
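The core accumulation-to-bound mechanism can be sketched in a few lines. This is a generic race-model illustration (not the model of [3,4]): noisy evidence for N alternatives is integrated until one accumulator crosses a threshold, and raising that threshold trades speed for accuracy, exactly the quantity a learnt cortico-striatal bias could adjust.

```python
# Minimal accumulation-to-bound sketch: integrate noisy evidence for N
# alternatives until one accumulator crosses the decision threshold.
import random

def decide(means, threshold, noise=1.0, seed=0, max_steps=10000):
    """Return (chosen alternative, decision time in steps)."""
    rng = random.Random(seed)
    acc = [0.0] * len(means)
    for t in range(1, max_steps + 1):
        for i, m in enumerate(means):
            acc[i] += m + rng.gauss(0, noise)    # drift plus sensory noise
        best = max(range(len(acc)), key=lambda i: acc[i])
        if acc[best] >= threshold:               # boundary crossing
            return best, t
    return best, max_steps                       # forced choice at timeout

# Alternative 1 carries the strongest evidence; a higher threshold means
# slower but more reliable decisions.
print(decide([0.1, 0.5, 0.2], threshold=30.0))
```

In the project, the threshold (and per-alternative biases on it) would be learned via reinforcement, and the evidence streams would come from the pre-collected iCub tactile data rather than from a random-number generator.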


[1] O'Connell R, Dockree P and Kelly S (2012). A supramodal accumulation-to-bound signal that determines perceptual decisions in humans. Nature Neuroscience.

[2] Gold JI and Shadlen MN (2007). The Neural Basis of Decision Making. Annual Review of Neuroscience, Vol. 30: 535-574.

[3] Bogacz R and Gurney K (2007). The basal ganglia and cortex implement optimal decision making between alternative actions. Neural Computation, 19(2), 442–477.

[4] Lepora N and Gurney K (2012). The basal ganglia optimize decision making over general perceptual hypotheses. Neural Computation 24 (11) 2924-2945.

[5] Solway, A and Botvinick, M (2012). Goal-directed decision making as probabilistic inference: A computational framework and potential neural correlates. Psychological Review, 119, 120-154.

[6] Lepora N, Martinez U, Barron H, Evans M, Metta G and Prescott T (2012). Embodied hyperacuity from Bayesian perception: Shape and position discrimination with an iCub fingertip. IEEE proceedings of IROS, 4638-4643.