CLEVER-B4

Description of the overall system

 

Background and open issues

An artificial cognitive system must have the ability to handle, and learn from, new experiences: it should seek out, and endeavour to comprehend, new information based on exploration guided by intrinsic motivations. In addition, cognitive growth requires the cumulative learning of the capacity to use suitable skills in different contexts. Humans and non-human primates both demonstrate these capabilities. This task explores the biological and psychological systems behind these capabilities and builds computational models that reproduce the corresponding behaviours. The aim is to create more effective cognitive artificial systems and more autonomous robots, and at the same time to furnish insights and theoretical tools to neural and psychological research.

Specific goals

The task aims to build a robotic demonstrator which shows intrinsic motivation and cumulative learning based on biologically constrained mechanisms (see Baldassarre, Gurney, Lee, and Triesch, 2012, for a description of the target demonstrator). The architecture shown in the demonstrator integrates the psychologically-constrained sensorimotor mechanisms and actions of AU with the biologically-constrained decision-making components of CNR-ISTC-LOCEN/USFD and the psychologically/biologically constrained visual abstraction and attention mechanisms of FIAS.

The demonstrator aims to: (a) show that intrinsic motivations (IM) are critical for learning action-outcome contingencies (“agency”) that are later re-usable to achieve extrinsic rewards and goals; and (b) understand the neural mechanisms supporting IM in animals and transfer the gathered knowledge to robots.

Approach and methods

The task integrates the results obtained in WP3 (mechatronic board, board experiments with monkeys and children; USFD's neuroscientific theory and empirical experiments on intrinsically-motivated learning) with the biologically/psychologically constrained systems of WP4, 5 & 6. The task is also closely related to Task 7.10 via the theoretical developments from CLEVER-K on intrinsically motivated learning in autonomous agents.

To facilitate the integration between the Teams involved, the work on the CLEVER-B demonstrator has focussed on three components: (a) a Sensorimotor Component (SMC), under the responsibility of AU, in charge of implementing the eye and arm actions; (b) a Visual Attention Module (VAM), added this year and developed by FIAS; and (c) a Decision Making Component (DMC), under the responsibility of CNR-ISTC-LOCEN and USFD, in charge of deciding the actions to be performed based on mechanisms putatively implemented by striato-cortical loops.

The three components work in an integrated fashion on the basis of an API designed in year 2 and developed in the following years. The API interfaces the SMC, the VAM, the DMC, and YARP (for controlling the simulated and real iCub robot). Figure 7.4.1 gives a high-level view of the links between the three components. The SMC passes visual target information to the VAM to identify the nearest object and provide information for vergence. The SMC then triggers a saccade to the detected object, verging on the target with the aid of the VAM. Once fixation is achieved, this is communicated back to the VAM, which can then visually learn about the fixated object. The SMC also receives instructions, consisting of gaze and reach targets, from the DMC, and on this basis performs gaze and reaching actions on the simulated/hardware iCub robot. The DMC functions on the basis of information received from both the SMC and the VAM (for object novelty). The internal reports Shaw et al. (2012), Chandrashekhariah (2012), and Mannella and Sperati (2012) describe the technical details of the communication and integration between the three components.
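
As an illustration of the kind of messaging the integration API supports, the sketch below shows a DMC-side process publishing a gaze target to the SMC over YARP (the middleware named above); the port names and the message layout are hypothetical assumptions, not the project's actual interface.

    # Minimal sketch: publishing a gaze command over YARP. Port names and the
    # "gaze x y" message layout are illustrative assumptions.
    import yarp

    yarp.Network.init()

    out = yarp.BufferedPortBottle()
    out.open("/dmc/commands:o")                                   # hypothetical port name
    yarp.Network.connect("/dmc/commands:o", "/smc/commands:i")    # hypothetical SMC port

    def send_gaze_target(x, y):
        """Ask the SMC to saccade to image coordinates (x, y)."""
        b = out.prepare()
        b.clear()
        b.addString("gaze")
        b.addDouble(x)
        b.addDouble(y)
        out.write()

    send_gaze_target(160.0, 120.0)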


Figure: Diagram showing the lines of communication between the individual modules in the CLEVER-B demonstrator.

Results

The API and the integration of the components illustrated above did a good job of supporting the integrated functioning of the various components. The outcome of this will be illustrated with the Demonstrator at the Fourth Review Meeting.

Final outcome of the task: Based on the work carried out in the four years, the integration infrastructure of CLEVER-B has managed to support the realization of a highly sophisticated cognitive architecture integrating the models of four different research Teams and organized in three modules. This has produced, and will continue to produce, a number of interesting results and papers. Indeed, the implemented system is probably the most sophisticated existing bio-constrained model of learning and acting based on intrinsic motivations.

The integration, however, has also highlighted some difficulties. One is related to the different modelling levels of the Teams involved (e.g., mainly the behavioural level of AU and the neural level of CNR). From a more technical perspective, the learnt gaze and reach space has limited coverage, which limits the possibility of interacting with the whole mechatronic board. This required some modifications to the interface to allow the SMC to gaze at the full range of visual stimuli.

SMC: The sensorimotor component

The sensorimotor component of the CLEVER-B demonstrator handles the sensor and motor capabilities required for interaction with objects in the real world. This includes: directing gaze toward stimuli of interest; reaching to, or pointing at, stimuli; grasping and manipulating objects, and pressing buttons on the experimental board. Actions learnt by the SMC are made available for selection by the DMC (only the simpler actions are used in the demonstrator).

The target experiment for the CLEVER-B demonstrator requires the robot to look at the buttons and openings on the experimental board, and to push the buttons. This was achieved by the SMC during year 2, with extensions to enable simple reaching and grasping in year 3. The CLEVER-B4 scenario (Baldassarre et al. 2012) requires a richer repertoire of arm actions (e.g., reaching to any fixated point, reaching between locations without returning to the home position, pushing objects left/right at locations in space, etc.). Furthermore, the task requires that the learning processes should be similar to those observed in children. To this effect we have focussed, in year 4, on improving the reaching abilities of the iCub using our sensorimotor mappings and developmental framework.

Specific goals

To create a gaze controller and a repertoire of actions usable by the DMC in the CLEVER-B4 demonstrator. The task also developed the SMC in other directions not directly needed by the demonstrator, in order to carry out studies relevant to developmental robotics. In this respect the task also aims to demonstrate a robotic system capable of staged sensorimotor development by integrating the sensorimotor mappings of Task 4.4, the novelty-driven mechanisms of Task 5.4, and the hierarchical structures of Task 6.2.

Approach and methods

Infants are vigorous in their exploration of objects, repeatedly banging surfaces and objects together. They are also compliant and very robust, with the added advantage of being able to heal. In comparison, the iCub is very fragile and difficult to repair. This presents issues when trying to model infant learning, as we are restricted to using ‘safe’ actions and objects, which significantly constrains learning. To this end, we have constructed a flexible table for use by the robot, and use soft sponge objects instead of the solid board manipulators.

The iCub robot has a very small reach space, and can only just reach all the buttons of the experimental board with one arm when it is carefully positioned. To increase the reachable space, infants move their torso (Berthier et al., 1999). We have used the LWPR algorithm, as described in Task 4.4, to learn how to move the torso to bring objects into the reachable space. To improve the accuracy of reaching, and to enable the arm to reach to any location, we have implemented a new vector-based sensorimotor mapping (Task 4.4), which enables visually guided reaching between points, or along trajectories. Both of these learning structures are related to our existing sensorimotor mappings, and can be related to biological mechanisms to some extent.
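
The following is a minimal, self-contained locally weighted regression sketch in the spirit of LWPR; it is not the LWPR implementation used in Task 4.4, and the bandwidth and sample values are illustrative assumptions. It shows the kind of incremental mapping used here: from gaze direction to the torso movement that brings a seen object into the arm's reach space.

    import numpy as np

    class LocalRegressionMap:
        """Toy locally weighted regression: learns an input -> output mapping from samples."""
        def __init__(self, bandwidth=0.2):
            self.X, self.Y = [], []           # stored training samples
            self.bandwidth = bandwidth        # width of the Gaussian kernel

        def update(self, x, y):
            """Store one experience, e.g. gaze direction -> torso angles."""
            self.X.append(np.asarray(x, dtype=float))
            self.Y.append(np.asarray(y, dtype=float))

        def predict(self, x):
            """Kernel-weighted average of stored outputs around the query x."""
            X, Y = np.array(self.X), np.array(self.Y)
            d2 = np.sum((X - np.asarray(x, dtype=float)) ** 2, axis=1)
            w = np.exp(-d2 / (2.0 * self.bandwidth ** 2))
            return (w[:, None] * Y).sum(axis=0) / (w.sum() + 1e-12)

    # Usage: after some babbling, predict the torso rotation that brings a
    # target seen at a given gaze direction into the reachable workspace.
    torso_map = LocalRegressionMap()
    torso_map.update([0.30, -0.10], [10.0, 0.0])    # illustrative samples
    torso_map.update([0.60, -0.10], [25.0, 0.0])
    print(torso_map.predict([0.45, -0.10]))         # roughly midway between the two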

In Lee et al. (2012) and Law et al. (submitted a) we describe how we have combined the various systems developed during this project to model stages in infant development, beginning with learning to saccade and ending with open-ended play behaviour that incorporates torso movements and simple reaching. In Law et al. (submitted b) we model, in more detail, the stages in the development of arm control, beginning with motor babbling in the fetus and ending with visually guided reaching. This work is particularly interesting in that it shows how simple behaviours may be used to gather the data that bootstraps the learning of later, more complex behaviours.


Figure: A representation of schema learning, showing the robot (a) learning about empty-hand grasping, (b) learning through play that grasping at an object location results in holding that object, and (c) generalising how to grasp an object at any location (following some other successful grasps).


Figure: The iCub robot performing a stacking task. This shows the robot after having learnt eye, head, torso rotation, and simple arm movements and performed some play-like schema learning, following the developmental trajectory observed in infants. The whole process took 2.5 hours. Scaffolding via language is being used to excite the robot to drop objects in order to build the stack.

Results

We have successfully integrated elements from Tasks 4.4, 5.4, and 6.2 delivered over the course of the project. This has resulted in the iCub learning, using mechanisms based on neuroscience and developmental psychology, to coordinate eye, head, body, and arm movements with its visual and proprioceptive systems in a manner that models infant development up to the point of visually guided reaching. Furthermore, we have shown how play can be used to explore and combine these simple actions in a way that creates new goals and learning opportunities.

Advancement of the work and relation to other tasks

In Deliverable 7.3 we identified three levels of demonstration complexity which we aimed to achieve. The work described here partially fulfills the most advanced of these, by using multiple objects of different shapes and with multiple affordances. Whilst we have the basis to discover and exploit these affordances in order to produce suitable reactions from the board, this has yet to be proven. The work also meets the objectives of reproducing data from developmental psychology and showing a robot that implements intrinsically motivated cumulative learning processes similar to those observed in monkeys and children.

This work has generated several novel algorithms and key computational insights:

  • We have shown our content-neutral and bio-constrained mapping framework to be fast, flexible, and well suited to the sensorimotor mapping problem across multiple modalities.
  • Our work on models of overlapping receptive fields has shown their advantages: a useful trade-off between speed of learning and accuracy, a reduction of the complexity of sensorimotor mapping problems, and a cleaner, more abstract computational model than neural models.
  • The use of psychologically-constrained developmental stages has enabled fast learning by reducing the complexity of the learning space. We have also shown that stages bootstrap learning of new skills in a hierarchical structure, with early, primitive behaviours providing the starting point for learning later, more complex behaviours. Furthermore, transitions between these stages can be triggered or can emerge, based on internal and external factors, with experience shaping learning behaviour.
  • Cephalocaudal, coarse-to-fine, and proximodistal learning directions are also important in reducing complexity: proximal joints have a larger effect on movement, and therefore on reducing error, than more distal joints; coarse resolution of sensorimotor spaces constrains attention/action to more general stimuli/behaviours before refinement; and the cephalocaudal direction gradually increases the robot's sphere of influence.
  • Finally, our very simple mechanisms for intrinsic motivation support motor babbling and play as essential processes for generating learning data without goals and for forming new actions through exploration and experimentation.
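
The overlapping receptive fields mentioned in the second point above can be illustrated with a toy population code; the field centres, widths, and decoding rule below are illustrative assumptions, not the parameters actually used.

    import numpy as np

    centres = np.linspace(-90.0, 90.0, 9)    # nine overlapping fields covering a joint's range (deg)
    sigma = 25.0                             # broad fields: coarse but fast to learn over

    def encode(angle):
        """Activation of each receptive field for a given joint angle."""
        return np.exp(-(angle - centres) ** 2 / (2.0 * sigma ** 2))

    def decode(activations):
        """Population estimate of the encoded angle."""
        return float(np.dot(activations, centres) / activations.sum())

    print(decode(encode(33.0)))   # close to 33.0; narrower fields would sharpen the estimate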


Selected Bibliography

  • Baldassarre, G., Gurney, K., Lee, M. and Triesch, J. (2012). Scenario of CLEVER-B4: additional document handed in to the Evaluators and PO.
  • Lee, M., Law, J., Shaw, P., Sheldon, M. (2012). An infant inspired model of reaching for a humanoid robot. Proceedings of the IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob 2012), San Diego, CA, Nov 2012.
  • Berthier, N. E., Clifton, R. K., McCall, D. D., Robin, D. J. (1999). Proximodistal structure of early reaching in human infants. Experimental Brain Research, vol. 127, pp. 259–269.
  • Law, J., Shaw, P., Lee, M., Sheldon, M. (Submitted a) From Saccades to Play: A model of Coordinated Reaching through Simulated Development on a Humanoid Robot. Transactions on Autonomous Mental Development.
  • Law, J., Earland, K., Shaw, P., Lee, M. (Submitted b). Robotic modelling of infant reach development. Submitted to IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob 2013). Osaka, 18-22 August, 2013.

Videos

This video shows the iCub learning hand-eye coordination in a single sitting. Initial arm control is learnt in simulation and is quickly refined on the real robot during this session. The robot is placed under a series of constraints, preventing joint movement, that are released as learning progresses. These constraints generate a sequence of learning (eye-head-arm-torso) that reflects stages in human infant development.
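
A minimal sketch of how such a staged release of joint constraints could be scheduled is given below; the ordering follows the eye-head-arm-torso sequence mentioned above, while the error threshold and release criterion are illustrative assumptions (the actual criteria are described in the cited papers).

    # Each joint group is unlocked only once the mapping error of the
    # previous stage falls below a (hypothetical) threshold.
    STAGES = ["eye", "head", "arm", "torso"]

    def unlocked_stages(mapping_error, threshold=0.05):
        """Return the joint groups currently free to move."""
        free = [STAGES[0]]
        for earlier, later in zip(STAGES, STAGES[1:]):
            if mapping_error.get(earlier, 1.0) < threshold:
                free.append(later)
            else:
                break
        return free

    print(unlocked_stages({"eye": 0.02, "head": 0.04, "arm": 0.20}))   # -> ['eye', 'head', 'arm']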

Learning on the robot takes just over half an hour, and the reaching is bootstrapped with approximately 30 minutes of real-time learning in simulation. The result is a robot that can learn to gaze and reach to objects, even those just out of reach, from scratch in around one hour.

The video ends with a demonstration of the robot using the learnt actions to reach to various objects. This second section of the video is played at 4x normal speed.

 

This video shows simulated reach learning for a humanoid robot in the later stages of arm control learning. Here, vision and motor babbling are used to discover how joint movement affects the movement of the hand in space.

 

In this video, the VAM identifies individual objects in the environment based on their features and their distance from the robot. This is achieved by comparing the images provided by the two cameras on the robot. The robot then selects an object to gaze and point at.

 

In previous experiments the iCub discovered, through play-like behaviour, that it could press buttons (CLEVER-B2) and perform a pressing action whilst holding an object (CLEVER-B3). Here we show how these actions can be combined with the refined reaching behaviours to press buttons positioned in front of the robot.

 

VAM: The Visual Attention Module

The VAM is formed by four modules. We now describe them at a general level, together with how they communicate with the SMC and the DMC. The modules are described in detail in the thematic tasks under FIAS responsibility, in the previous Periodic Project Report, and in the publications of the group.

Feature correspondence. The SMC selects a location in the image of the dominant eye to attend to and sends it to the VAM. Based on this desired gaze location (which originates from the DMC), the VAM (a) selects the discernible feature (or the median of a set of closest discernible features) closest to the target location to be used for further processing; (b) segments the objects in the scene based on the x-y-disparity information of features from the stereo images, chooses the segment on which the target location falls, and then calculates the centroid of the object to be used for further processing. It then attempts to locate the matching feature in the second camera image and sends the coordinates of the two points to the DMC.

Object learning/update. The VAM receives the 'saccade complete' signal from the SMC in order to perform object learning/update. The object in focus (at the centre of the image, or at any other provided location) is segmented from the scene and learning starts. During learning the object is checked for novelty and the confidence of this judgement is computed. When an already known/learnt object is perceived, its novelty is updated; when a new object is perceived, it is given a new object ID. All the related information about the object is sent back to the DMC.

Object recognition. The VAM receives a 'check' command from the SMC in order to recognize an object with a given ID. It calculates and sends back the location of the queried object. If the object does not exist in the scene, or it is not recognized, the value '-1' is sent back.

Segmentation of the scene. The VAM receives the 'segment' command from the SMC in order to segment the scene into different objects and let the SMC choose an object of interest. Features in the scene are classified into different groups (each representing an object) and the VAM provides the corresponding object IDs and locations to the SMC.
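
Taken together, these entry points define a simple command/reply protocol between the SMC and the VAM. The sketch below summarises it from the SMC side; the command strings 'saccade complete', 'check', and 'segment' come from the descriptions above, while the 'correspond' command name, the reply structures, and the function names are illustrative assumptions.

    def vam_request(command, payload=None):
        """Placeholder for sending a command to the VAM (transport handled by the integration API)."""
        raise NotImplementedError

    def find_correspondence(x, y):
        # Feature correspondence: a target in the dominant eye -> matched point pair.
        return vam_request("correspond", {"x": x, "y": y})   # e.g. {"left": (xl, yl), "right": (xr, yr)}

    def learn_fixated_object():
        # Object learning/update, triggered once the saccade has completed.
        return vam_request("saccade complete")               # e.g. {"object_id": 7, "novelty": 0.8}

    def locate_object(object_id):
        # Object recognition: the object's location, or -1 if absent/unrecognised.
        return vam_request("check", {"object_id": object_id})

    def segment_scene():
        # Scene segmentation: one (object_id, location) entry per segmented object.
        return vam_request("segment")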

DMC: The Decision Making Component

Background and open issues related to IM-CLeVeR: 

CLEVER-B demonstrators aim to reproduce the board experiment run with monkeys and children (see Figure 7.4.5 illustrating the set-up used), and on this basis to envisage new robotic challenges and to propose new architectures and algorithms to control autonomous robots. Specifically, according to the Work Plan, the objective of CLEVER-B3 was to endow the system with the capacity to assign value to goals based on the experience of food (all based on the capacity of the amygdala to assign value to seen objects), while the objective of CLEVER-B4 was to endow the system with the capacity to recall actions based on the activation of goals. CLEVER-B3 achieved both objectives, thus anticipating the delivery of this year's results.

Despite this positive accomplishment, CLEVER-B3 (Fiore et al., in preparation) was limited in that it could look at and act upon only the six critical elements of the mechatronic board: the three buttons and the three boxes. Moreover, its object identification capabilities, its capacity to distinguish between novel and familiar objects, and its capacity to detect phasic events (movements) were hardwired rather than based on actual visual perception, so the system could not learn to apply its skills to objects other than the three buttons and boxes. CLEVER-B4 aimed to overcome all these limitations.

CLEVER-B4 is grounded in several previously published works that CNR carried out, often jointly with other Teams of the Consortium (we report here only the new works published in the last reporting period):

  • Theoretical works on different types of intrinsic motivations (Baldassarre and Mirolli, 2013a, 2013b; Mirolli and Baldassarre, 2013; Barto et al., in preparation).
  • Works on CLEVER-B (CLEVER-B1: Chersi et al., 2012; CLEVER-B2: Baldassarre et al., 2012; CLEVER-B3: Fiore et al., in preparation).
  • Works on bottom-up and top-down attention (Marraffa et al., 2012; Ognibene et al., accepted; Ognibene et al., submitted).
  • Works on competence-based intrinsic motivations (Baldassarre and Mirolli, 2013c; Santucci et al., 2012a; Santucci et al., 2012b; Santucci et al., in press; Santucci et al., in preparation).
  • Works on intrinsic motivations based on prediction and their relation to extrinsic motivations (Mirolli et al., 2013; Santucci et al., submitted).
  • Works on the hierarchical organisation of action in the brain (Baldassarre and Mirolli, 2013; Thill et al., 2013; Mannella et al., accepted; Caligiore et al., 2013).

Specific goals: 

The additional features of CLEVER-B4 aimed to endow it with the capacity to apply its skills to any object in any location of a continuous working space (see Baldassarre, Gurney, Lee and Triesch, 2012: Scenario of CLEVER-B4, additional document handed in to the Evaluators and PO). Such capabilities are:

  • Visually explore the scene on the basis of the novelty of the seen objects.
  • Try the actions of the arm action repertoire on the foveated object.
  • When an unpredicted phasic event (i.e., a change of the environment) is caused by the arm actions, form a goal representation (i.e., the goal of causing such a change) tied to the identity of the object where the change took place (i.e., the box that opened and lit up). Importantly, the object that reacted to the action “by moving” (changing luminosity) can be located anywhere in the continuous working space.
  • Learn the action-outcome contingencies related to those objects, i.e., learn the inverse models (the links from the goal representation to the representations of the actions that caused the outcome) on the basis of the fact that the action caused the unpredicted phasic event (a minimal sketch of this bookkeeping is given after this list).
  • In a later stage, after learning, be capable of recalling such action-outcome contingencies based on the internal activation of goals, in our case caused by the sight of extrinsic rewards (e.g., food represented by a coloured card placed in front of the object to which the goal is associated).
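
The sketch below, referred to in the list above, illustrates the goal and inverse-model bookkeeping in a minimal form; the class and method names are assumptions, not the CLEVER-B4 implementation.

    class GoalStore:
        """Goals keyed by the identity of the object whose change defined them."""
        def __init__(self):
            self.inverse_models = {}    # object_id -> (action, location) that caused the outcome

        def on_unpredicted_event(self, object_id, action, location):
            """Form (or refine) the goal for this object and its inverse model."""
            self.inverse_models[object_id] = (action, location)

        def recall(self, object_id):
            """Activating the goal (e.g. at the sight of a reward cue) recalls the action."""
            return self.inverse_models.get(object_id)

    goals = GoalStore()
    goals.on_unpredicted_event(object_id=3, action="press", location=(0.1, -0.2))
    print(goals.recall(3))    # -> ('press', (0.1, -0.2))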

Importantly, note that:

  • Learning based on intrinsic motivations (IMs) related to unpredicted phasic events is based on the theories of USFD and is well captured by IM models like those of IDSIA.
  • The novelty-based exploration relies on IMs related to novelty, like those developed by UU.
  • The whole idea of encoding outcomes, and of recalling actions based on the internal re-activation of outcomes (goals) and on previously-learned inverse models, relies on the theories and models on competence-based intrinsic motivations of CNR-ISTC-LOCEN and CNR-ISTC-Barto.

This is important as it implies that CLEVER-B4 integrates the theories and models of five different Teams of the project.

Approach and methods: 

The accomplishment of the specific goals illustrated above required the development of several key aspects of the DMC (Figure 7.4.6). The key aspects of CLEVER-B4 that differentiate it from CLEVER-B3 (Fiore et al., in preparation) are listed in detail below:

  • Full redesign and implementation of the software architecture.
    • Rationale: radical changes were needed with respect to last year.
  • Development of a biologically-plausible superior colliculus system for the bottom-up attention/eye component, driving the eye to areas of the scene with high contrast and movement.
    • Rationale: allow the model to autonomously explore any location in the continuous working space.
  • Development of a top-down attention eye-control system capable of selecting gaze positions with an eye-centred reference frame (this was one of the biggest challenges with respect to CLEVER-B3). 
    • Rationale: the superior colliculus works with a relative reference frame so the top-down attention controller has to use an eye-centred reference frame in order to work in synergy with it; this is also biologically plausible.
  • Development of a goal-selection component capable of forming any number of goals associated with arbitrary objects.
    • Rationale: the scenario requires that, in principle, the system can learn any goal and any action-outcome contingency.
  • Development of a novelty map larger than the retina, updated on the basis of gaze direction and corresponding to the hippocampus; this map is capable of driving the system to explore (with the eye, and hence with the arm actions, given their strong coupling with the eye gaze) the regions of space with high novelty. This map implements “UU-like” novelty-based intrinsic motivations (a minimal sketch of such a map is given after this list).
    • Rationale: the system needs to remember the novelty associated with any region of the working space, and to be capable of returning to it when needed.
  • Design of a “USFD/IDSIA-like” new detector and a new predictor of salient events to implement the transient interest for unpredicted phasic events. The detection of phasic events is based on a second function played by the superior colliculus component (the other being the control of saccadic movements, see above). The predictor that inhibits dopamine production is based on an anticipatory inhibition of the dopaminergic learning signal (as in TD models of dopamine).
    • Rationale: we intended to create a more biologically plausible event detector, based on the superior colliculus, capable of functioning on the basis of the images grabbed by the iCub cameras (last year this component was hardwired); the predictor stopping the dopamine signal corresponds to one possible hypothesis of how the IM learning signal discussed here might vanish with repeated experiences.
  • Design of new basal ganglia-cortical loops encompassing not only the direct pathway (as in CLEVER-B3) but also the indirect pathway.
    • Rationale: having a more realistic model of the basal ganglia (like the USFD models), and more power to face the challenges posed by a vision system based on real images.
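
As referenced in the novelty-map item above, the following is a minimal sketch of a gaze-anchored novelty map; the grid size, decay rate, and update rule are illustrative assumptions rather than the hippocampal component actually implemented.

    import numpy as np

    class NoveltyMap:
        """Novelty over the whole working space, indexed by gaze direction rather than retinal position."""
        def __init__(self, shape=(20, 20), decay=0.3):
            self.map = np.ones(shape)    # every region starts maximally novel
            self.decay = decay

        def visit(self, gaze_cell):
            """Lower the novelty of the region the eye has just fixated."""
            self.map[gaze_cell] *= (1.0 - self.decay)

        def most_novel(self):
            """Propose the region with the highest residual novelty as the next gaze target."""
            return np.unravel_index(np.argmax(self.map), self.map.shape)

    novelty = NoveltyMap()
    novelty.visit((5, 5))
    print(novelty.most_novel())   # some still-unvisited cell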


Figure: (Left) Simulated iCub and mechatronic board. (Right) Corresponding real set-up.

 


Figure: The architecture of the CLEVER-B4 DMC. Ctx: cortex; Th: thalamus; BG: basal ganglia; PC/ITC: parietal cortex/inferotemporal cortex; SC: superior colliculus; VTA/SNpc: ventral tegmental area/substantia nigra pars compacta; DA: dopamine; CeA: central nucleus of the amygdala; Hip: hippocampus; Put: putamen; Cau: caudate nucleus; NAcc: nucleus accumbens; GPi: globus pallidus, internal part; GPe: globus pallidus, external part; STN: subthalamic nucleus; SNpr: substantia nigra pars reticulata; PMC: premotor cortex; FEF: frontal eye fields; PFC: prefrontal cortex.

 


Figure: Some results of the simulation of the system indicating its functioning. (Top) Snapshot of the board captured by one robot camera, and the corresponding activation of the superior colliculus indicating the candidate saccade targets: the actual target (the central one, marked with a red dot) is decided by the top-down disinhibition eye-loop, which allows only one candidate saccade target proposed by the superior colliculus to actually be executed. (Bottom left) Absolute memory of novelty locations implemented by the hippocampus (case based on a coarse-grained memory grid). (Bottom right) Points on the board explored by the eye: small dots indicate gaze targets while larger squares indicate the centroids of the gaze clusters (notice how foveated locations concentrate on the critical parts of the board).

 Results: 

The analysis of the functioning of the architecture and the results of its tests show that the model described has the following features:

  • It successfully communicates and interacts with the SMC and VAM components.
  • It can solve the challenging mechatronic-board scenario faced by the demonstrator, both with the simulated and with the hardware mechatronic board/iCub.
  • In particular, the robot exhibits the capabilities of: (a) exploring novel objects unforeseen at design time; (b) forming inverse models based on unexpected action-outcomes; (c) recalling actions based on the activation of the related goals; and (d) harmonising the several processes taking place within it.
  • It has a notable biological plausibility in terms of functioning (a preliminary analysis shows that this matches various known physiological findings), which allows it to address the data on monkeys and children.

Functioning of the system. These functionalities are supported by the architectural mechanisms described in the methods as follows (a compact sketch of this cycle is given after the list):

  • Initially the eye-loop explores the environment based on high-contrast portions of the visual scene.
  • Novel objects more strongly attract attention (novelty-based intrinsic motivation).
  • When the eye fixates a point in space, the arm-loop can trigger arm actions (this happens with a higher probability for novel objects).
  • If no effect happens, the explored object loses novelty and the system continues to explore other portions/objects of the scene.
  • When an arm action causes a phasic event (luminance change: prediction-based intrinsic motivation) in any part of the scene, the agent: (a) looks at the location where the movement took place (based on a reflex of the superior colliculus); (b) forms a goal related to that object; (c) associates the goal with the arm action that caused the movement (the inverse model) and with the attentional location at the time the movement took place. This can engage the agent for several cycles, until the predictor predicts the phasic event and it loses interest.
  • After the first training phase, any goal of the agent can be activated (e.g., by the sight of a red cardboard that represents an extrinsic reward such as food) and, as a consequence, trigger the execution of the related action that leads to its accomplishment.
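
The compact sketch below, referred to above, ties these steps together in a toy training cycle; all data structures, probabilities, and the toy environment are illustrative assumptions, not the CLEVER-B4 implementation.

    import random

    novelty = {"button_A": 1.0, "box_A": 1.0, "distractor": 1.0}   # per-object novelty
    predictor = {}        # (object, action) -> predicted probability of a phasic event
    inverse_models = {}   # goal (object that changed) -> (object acted on, action)

    def environment(obj, action):
        """Toy world: pressing button_A makes box_A open and light up."""
        return "box_A" if (obj, action) == ("button_A", "press") else None

    for cycle in range(200):
        # Eye loop: fixate the currently most novel object.
        fixated = max(novelty, key=novelty.get)
        # Arm loop: try an action, with a higher chance on novel objects.
        if random.random() < novelty[fixated]:
            action = random.choice(["press", "push_left", "push_right"])
            changed = environment(fixated, action)
            surprise = 1.0 - predictor.get((fixated, action), 0.0)
            if changed and surprise > 0.1:
                # Unpredicted phasic event: form the goal and its inverse model;
                # the predictor then learns the contingency away (interest fades).
                inverse_models[changed] = (fixated, action)
                predictor[(fixated, action)] = predictor.get((fixated, action), 0.0) + 0.2 * surprise
        # Without a surprising effect, the fixated object gradually loses novelty.
        novelty[fixated] *= 0.95

    # After training, activating the goal (e.g. a reward cue at box_A) recalls the action.
    print(inverse_models.get("box_A"))   # -> ('button_A', 'press') once discovered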

Final outcome of the Task: 

We think this Task was fully successful, as it managed to integrate a notable amount of knowledge produced in the four years of the project (see the 'Background' of this Section, 4/4, of the Task above), incorporated into the models of four Teams (CNR, USFD, AU, FIAS). This integration led to several notable results:

  • It fostered several publications (see Selected Bibliography below), some specifically dedicated to the CLEVER-B models (one for each yearly demonstrator: CLEVER-B1: Chersi et al., 2012; CLEVER-B2: Baldassarre et al., 2012; CLEVER-B3: Fiore et al., in preparation; CLEVER-B4: Mannella et al., in preparation).
  • It showed how the investigation of brain and behaviour can greatly benefit from embodied modelling, such as that imposed by working with robots, as this can bring to light new problems and, at the same time, dissipate false ones.
  • We think it firmly demonstrates how the knowledge gathered by implementing bio-constrained models can suggest innovative architectures and algorithms for robots: we have already shown this with several publications, but we think the bulk of CLEVER-B's potential in this respect remains to be exploited in the coming years.

Videos

Clever B4
 
 
Motor exploration guided by novelty detection
 
 

Selected Bibliography:

  • Baldassarre, G.; Mannella, F.; Fiore, V. G.; Redgrave, P.; Gurney, K. & Mirolli, M. (2012), 'Intrinsically motivated action-outcome learning and goal-based action recall: A system-level bio-constrained computational model.', Neural Networks 41, 168-187.
  • Baldassarre, G. & Mirolli, M. (2013b), Intrinsically Motivated Learning Systems: An Overview, in Gianluca Baldassarre & Marco Mirolli, ed., 'Intrinsically Motivated Learning in Natural and Artificial Systems', Springer-Verlag, Berlin, pp. 1--14.
  • Baldassarre, G. & Mirolli, M. (2013c), Deciding which skill to learn when: Temporal-Difference Competence-Based Intrinsic Motivation (TD-CB-IM), in Gianluca Baldassarre & Marco Mirolli, ed., 'Intrinsically Motivated Learning in Natural and Artificial Systems', Springer-Verlag, Berlin, pp. 257--278.
  • Baldassarre, G. & Mirolli, M., ed.  (2013), Computational and Robotic Models of the Hierarchical Organisation of Behaviour, Springer-Verlag, Berlin.
  • Baldassarre, G. & Mirolli, M., ed.  (2013a), Intrinsically Motivated Learning in Natural and Artificial Systems, Springer-Verlag, Berlin.
  • Baldassarre, G., Gurney, K., Lee, M. and Triesch, J. (2012). Scenario of CLEVER-B4: additional document handed in to the Evaluators and PO.
  • Barto, A.; Mirolli, M. & Baldassarre, G. (in preparation), 'Novelty or surprise?', Frontiers in Cognitive Science.
  • Caligiore, D.; Borghi, A.; Parisi, D.; Ellis, R.; Cangelosi, A. & Baldassarre, G. (2013), 'How affordances associated with a distractor object affect compatibility effects: A study with the computational model TRoPICALS', Psychological Research 77(1), 7-19.
  • Chersi, F.; Mirolli, M.; Pezzulo, G. & Baldassarre, G. (2012), 'A spiking neuron model of the cortico-basal ganglia circuits for goal-directed and habitual action learning.', Neural Networks 41, 212--224.
  • Fiore, V. G.; Sperati, V.; Mannella, F.; Dolan, R. J. & Baldassarre, G. (in preparation), 'Keep focussing: striatal dopamine multiple functions resolved in a single mechanism tested on the iCub', Frontiers in Cognitive Science.
  • Mannella, F.; Gurney, K. & Baldassarre, G. (accepted), 'The nucleus accumbens as a nexus between values and goals in goal-directed behaviour: a review and a new hypothesis', Frontiers in Behavioural Neuroscience.
  • Mannella, F.; Sperati, V.; Shaw, P. H.; Chandrashekhariah, P. N.; Law, J.; Lee, M.; Triesch, J.; Redgrave, P.; Gurney, K.; Mirolli, M. & Baldassarre, G. (in preparation), 'A neurorobotic model of the brain systems underlying intrinsically-motivated cumulative learning', Frontiers in Neurorobotics.
  • Marraffa, R.; Sperati, V.; Caligiore, D.; Triesch, J. & Baldassarre, G. (2012), A bio-inspired attention model of anticipation in gaze-contingency experiments with infants, in Javier Movellan & Matthew Schlesinger, ed., 'IEEE International Conference on Development and Learning-EpiRob 2012 (ICDL-EpiRob-2012)', IEEE, Piscataway, NJ, pp. e1--6.
  • Mirolli, M. & Baldassarre, G. (2013), Functions and mechanisms of intrinsic motivations: the knowledge versus competence distinction, in Gianluca Baldassarre & Marco Mirolli, ed., 'Intrinsically Motivated Learning in Natural and Artificial Systems', Springer-Verlag, Berlin, pp. 49--72.
  • Mirolli, M.; Baldassarre, G. & Santucci, V. G. (2013), 'Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcement driving both action acquisition and reward maximization: A simulated robotic study', Neural Networks 39, 40-51.
  • Ognibene, D.; Catenacci Volpi, N.; Pezzulo, G. & Baldassarre, G. (accepted), Model-free memory-free reinforcement learning and the acquisition of epistemic actions: a bioinspired implementation, in 'Proceedings of Living Machines – The International Conference on Biomimetics and Neurotechnology'.
  • Ognibene, D.; Pezzulo, G. & Baldassarre, G. (submitted), 'Learning and Development of Representations and Attention for Action: A Neurorobotic Study', IEEE Transactions on Autonomous Mental Development.
  • Baldassarre, G.; Caligiore, D. & Mannella, F. (2013), The hierarchical organisation of cortical and basal-ganglia systems: a computationally-informed review and integrated hypothesis, in Gianluca Baldassarre & Marco Mirolli, ed., 'Computational and Robotic Models of the Hierarchical Organisation of Behaviour', Springer-Verlag, Berlin.
  • Santucci, V. G.; Baldassarre, G. & Mirolli, M. (2012a), A bio-inspired learning signal for the cumulative learning of different skills, in Stefano Cagnoni; Marco Mirolli & Marco Villani, ed., 'Proceedings of the Italian Workshop on Artificial Life and Evolutionary Computation', pp. 1-12.
  • Santucci, V. G.; Baldassarre, G. & Mirolli, M. (2012b), Intrinsic motivation mechanisms for competence acquisition, in 'Proceedings of the IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob 2012)', IEEE, pp. 1-6.
  • Santucci, V. G.; Baldassarre, G. & Mirolli, M. (in preparation), 'Learning multiple skills through competence-based intrinsic motivations: a computational embodied model', Frontiers in Neurorobotics.
  • Santucci, V. G.; Baldassarre, G. & Mirolli, M. (in press), Cumulative learning through intrinsic reinforcements, in Mirolli M. Villani M. Cagnoni, S., ed., 'Artificial Life, Evolution and Complexity', Springer, Berlin.
  • Santucci, V. G.; Baldassarre, G. & Mirolli, M. (submitted), Intrinsic motivation signals for driving the acquisition of multiple tasks: a simulated robotic study, in Robert West & Terry Stewart, ed., 'Proceedings of the International Conference on Cognitive Modelling'.
  • Thill, S.; Caligiore, D.; Borghi, A. M.; Ziemke, T. & Baldassarre, G. (2013), 'Theories and computational models of affordance and mirror systems: An integrative review', Neuroscience and Biobehavioral Reviews 37, 491-521.