CLEVER-B2

Documentation of demo CLEVER-B2, due in July 2011 (Review Meeting)

Introduction

This page documents the CLEVER-B2 bio-constrained robot demonstrator shown at the second IM-CLeVeR Review Meeting (Lugano, 10-12/07/2011). The demonstrator reproduces and investigates the board experiment run with capuchin monkeys (run by CNR-ISTC-UCP; see below) and children (run by UCBM; see below). The model was tested using the same mechatronic board used with the real experiment participants, together with the iCub robot. The goal of the model is to furnish an operational hypothesis on the brain mechanisms that might underlie the acquisition and exploitation of actions driven by intrinsic motivation mechanisms, as observed in the monkeys and children of the board experiment.

The target experiment

The board experiment modelled here is divided into two phases, both based on the mechatronic board. The board has three buttons and three boxes that can be opened by an automatic device. In the first, learning phase, the participants can press any button of the board. Pressing a button causes the opening of the box corresponding to it. This produces a surprising, unexpected event that should drive the learning of the action that caused it. It should also drive the learning of the action-outcome relation between the action executed and its effect (the opening of the box). In the second, test phase of the experiment, one particular outcome (e.g., box 1 opening) is given a high value (e.g., by putting food into box 1: the box is closed by a transparent door, so the food is directly visible), so the participant should be able to immediately recall the action that delivers that outcome, thanks to the action-outcome relation learned in the learning phase.

The two components of the model

The experiments with monkeys and children show that when they face the board experiment they are already endowed with a well-developed repertoire of actions acquired during life before the experiment, for example for looking at salient parts of the board and for manipulating the foveated objects. On this basis, the model was divided into two main components:

  • A ''decision making'' component (DMc), mainly developed by CNR-ISTC-LOCEN and USFD: this component is in charge of deciding the actions to perform, based on mechanisms putatively implemented by the striato-cortical loops of the brain.
  • A ''sensorimotor component'' (SMc), under the responsibility of AU, in charge of executing the actions selected by the DMc, based on mechanisms putatively implemented in the cortical neural pathways.

To coordinate the two components at the technical level, CNR-ISTC-LOCEN and AU designed a specific API that interfaces the DMc with the SMc, and interfaces the SMc with the YARP-based control of both the simulated and the real iCub robot. This API has been very important for the success of the interaction between the Teams involved in the design and implementation of CLEVER-B2. Below we show the technical structure of the whole system.

DMc-SMc interface

 Diagram showing the YARP interface between the SMc and DMc and other systems
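
As a purely illustrative sketch of how such a YARP-based interface could be wired (the port names and the message format used here are assumptions made for illustration, not the project's actual API), the DMc could publish each selected action on a port that the SMc reads and dispatches to its gaze and reach controllers:

    # Illustrative sketch only: port names and message format are assumptions.
    import yarp

    yarp.Network.init()

    # DMc side: a port publishing the selected action.
    dmc_out = yarp.BufferedPortBottle()
    dmc_out.open("/clever/dmc/action:o")    # hypothetical port name

    # SMc side: a port receiving action commands.
    smc_in = yarp.BufferedPortBottle()
    smc_in.open("/clever/smc/action:i")     # hypothetical port name
    yarp.Network.connect("/clever/dmc/action:o", "/clever/smc/action:i")

    # The DMc publishes one decision, e.g. "look at button 2".
    cmd = dmc_out.prepare()
    cmd.clear()
    cmd.addString("look")
    cmd.addString("button_2")
    dmc_out.write()

    # The SMc reads the command and would dispatch it to its controllers.
    msg = smc_in.read()                     # blocking read
    if msg is not None:
        action, target = msg.get(0).asString(), msg.get(1).asString()
        print("SMc received:", action, target)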

Decision Making component (DMc)

 


Architecture of the Decision Making component.

The decision making component is based on important architectural and functioning principles of brain organisation drawn from the neuroscientific literature and summarised in the figure above:

  • The principle of the basal ganglia-cortical organisation of the brain. The idea is that there are different, partially segregated basal ganglia-cortical loops selecting: (a) goals (i.e., desired outcomes); (b) eye actions, that is gaze directions (in the model there are 6 possible actions, directing gaze to one of the 3 buttons or 3 boxes); (c) arm actions (in the model: press the foveated object; dummy action 1; dummy action 2; the dummy actions were introduced to test the learning capabilities of the model).
  • The principle of the prefrontal cortex control: a dialogue with sub-cortical value systems such as the amygdala allows the goal loop, involving the prefrontal cortex, to select the outcome to pursue (this is relevant during the test phase).
  • The principle of the repetition bias, which, based on intrinsic motivations, allows the model to temporarily focus attention on a particular portion of the environment and to repeat the same arm action there. This process is based on dopamine, produced by the sudden opening of a box: this drives the formation of the input connection weights to the loop on the basis of a three-factor Hebbian rule. The repetition is however transient, as the dopamine signal habituates with repeated experiences of a certain outcome, which becomes less novel and surprising. This causes the learned connection weights to fade away.
  • The principle by which, during the learning phase, cortico-cortical connections can be formed between the outcomes represented in the goal loop (e.g., box 1 opens) and the representations of the selected actions in the eye loop (e.g., look at button 1) and in the arm loop (e.g., press button 1). This learning is based on a Hebbian learning rule (a minimal sketch of these learning rules is given after this list).
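
As a minimal, purely illustrative sketch of the two learning mechanisms listed above (the layer sizes, learning rates and variable names are assumptions, not the parameters of the actual model), the dopamine-gated three-factor rule and the cortico-cortical Hebbian rule could be written as:

    # Illustrative sketch: sizes, rates and the habituation scheme are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    n_inputs, n_arm_actions, n_goals = 6, 3, 3
    w_in_arm   = rng.uniform(0.0, 0.1, (n_arm_actions, n_inputs))  # input -> arm loop
    w_goal_arm = np.zeros((n_arm_actions, n_goals))                # goal loop -> arm loop

    novelty = np.ones(n_goals)        # per-outcome novelty, habituates with experience
    eta_da, eta_cc, habituation, decay = 0.5, 0.2, 0.15, 0.01

    def experience_outcome(x, arm_activity, goal_activity, outcome):
        """One learning event triggered by the surprising opening of a box."""
        # Dopamine is proportional to how novel/surprising the outcome still is.
        dopamine = novelty[outcome]
        # Three-factor Hebbian rule: pre-synaptic input x post-synaptic action x dopamine.
        w_in_arm[:] += eta_da * dopamine * np.outer(arm_activity, x)
        # Cortico-cortical Hebbian rule: associate the active goal with the selected action.
        w_goal_arm[:] += eta_cc * np.outer(arm_activity, goal_activity)
        # Habituation: repeated outcomes become less novel, so the repetition bias fades...
        novelty[outcome] = max(0.0, novelty[outcome] - habituation)
        # ...and the input weights that drive the repetition slowly decay away.
        w_in_arm[:] *= (1.0 - decay)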

During the second, test phase of the experiment, the outcome representations inside the goal loop are activated one by one for some time by sending them a strong activation. When this is done, the cortico-cortical connections from the goal loop to the eye and arm loops allow the recall and execution of the correct eye and arm actions associated with the outcomes.
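
Continuing the illustrative sketch above, the test-phase recall amounts to clamping one goal unit to a high activation and letting the learned cortico-cortical weights bias the action loops:

    # Illustrative continuation of the sketch above (names are assumptions).
    goal_activity = np.zeros(n_goals)
    goal_activity[0] = 1.0                       # e.g. "box 1 opens" is made desirable
    arm_drive = w_goal_arm @ goal_activity       # top-down bias on the arm loop
    recalled_action = int(np.argmax(arm_drive))  # e.g. "press button 1"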

Results of the tests with the DMc

The figure below shows the actions selected by the model during the learning phase and during the test phase. The results show that during the learning phase the model first explores randomly, then focusses on a specific eye target (a button) and arm action (pressing) for a prolonged period in order to fully learn the information about one box; it then moves on to another button and box, and so on, until, after learning everything, it starts to explore randomly again.

The behaviour of the DMc. x-axis: time bins of 6 minutes each; the first 17 bins refer to the learning phase whereas the last three bins refer to the test phase. y-axis: number of actions involving the buttons selected by the model (actions related to the boxes are not reported for simplicity). Notice how during training the system focusses on each button (and the related box opening) for a prolonged time. Also notice how during the test phase the model is highly successful in selecting the correct actions corresponding to the activated goal.

Videos on the functioning of the DMc

Below we show the videos of the simulated and the real robots. Note that the videos are based on the low-level actions implemented by the SMc: this is explained in detail below.

This video shows the simulated robot in the learning and in the test phase. The video lasts about 10 minutes and explains in detail the initial random exploration, the following focussing phases, and the final test phase. It also shows the internal activations of the neural model.

 

This video shows the real robot's behaviour during a test phase (the model uses the connection weights learned in simulation). Note how the robot performs the eye and arm actions corresponding to the activated outcomes or goals (the three goals are activated one after the other, each for a few seconds).

Sensorimotor component (SMc)

The sensorimotor component of CLEVER-B aims to furnish the robot with the sensor and motor abilities required to interact with the experimental board. As mentioned above, these include directing gaze, pressing the foveated objects, and two dummy actions. Once an action is selected by the DMc it is carried out by the SMc. Control of the sensorimotor systems is learnt following the developmental approach of AU. As such it:

  • Reflects the developmental stages evident in infancy. The sensor and motor competencies are built in a cumulative manner using constraints to scaffold development. Complex skills are built upon primitive skills following the timing of similar developments in infancy.
  • Uses a content-neutral structure of maps and mappings to represent sensor and motor spaces. Maps are populated with fields and reflect neural structures in the cortex. Mappings link equivalent fields on associated maps, and enable the gradual development of sensorimotor control (a minimal sketch of this structure is given after this list).
  • Utilises motor babbling to explore space and learn mappings. Motor babbling is a key driver in learning. We are implementing systems from USFD to enhance motor babbling performed by the robot.
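
As a minimal, content-neutral sketch of the map/field/mapping idea listed above (the class names and the nearest-field linking used here are illustrative assumptions, not AU's actual framework), maps hold fields and mappings link equivalent fields across maps:

    # Illustrative sketch of maps, fields and mappings; not AU's actual framework.
    class Field:
        """One field on a map: a point in that map's coordinate space."""
        def __init__(self, centre):
            self.centre = centre

    class Map:
        """A map is populated with fields (cf. cortical maps)."""
        def __init__(self, name):
            self.name = name
            self.fields = []

        def add_field(self, centre):
            f = Field(centre)
            self.fields.append(f)
            return f

        def nearest_field(self, point):
            """Return the field whose centre is closest to the given point."""
            return min(self.fields,
                       key=lambda f: sum((a - b) ** 2 for a, b in zip(f.centre, point)))

    class Mapping:
        """Links equivalent fields on two maps; links are added during motor babbling."""
        def __init__(self):
            self.links = {}

        def learn(self, field_a, field_b):
            self.links[field_a] = field_b

        def lookup(self, field_a):
            return self.links.get(field_a)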

For the experiments reported here, the robot is required to learn to press the objects on the board (or to perform dummy actions). This requires the robot to learn (1) gaze control; (2) arm control (reaching to press); (3) integration of gaze and reaching. In the developmental framework, these skills are learnt cumulatively, using an egocentric ‘gaze space’ to coordinate visual- and reach-spaces. To implement hand-eye coordination following the developmental trajectory, eye control in the visual and proprioceptive spaces is the first competency to be learnt. Head/neck control follows, using the eye as an intermediary to learn the mapping between visual and proprioceptive space in the neck. Arm control is subsequently learnt in a proprioceptive space, using a common gaze space to link to the visual space of the eye (Huelse et al. 2010).
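
Using the sketch above, the staged hand-eye coordination described here can be pictured as a chain of mappings through a shared gaze space (again, the names and the simple inversion step are illustrative assumptions, not the actual implementation):

    # Illustrative use of the structures above; names and spaces are assumptions.
    visual_space = Map("visual")
    gaze_space   = Map("gaze")
    arm_space    = Map("arm_proprioception")

    visual_to_gaze = Mapping()   # learned first, through eye/head babbling
    arm_to_gaze    = Mapping()   # learned later, through arm babbling

    def reach_to_visual_target(pixel):
        """Reach towards a visual target by chaining mappings through the gaze space."""
        v = visual_space.nearest_field(pixel)
        g = visual_to_gaze.lookup(v)             # where to look to foveate the target
        # Invert the arm->gaze mapping: find an arm posture whose gaze field matches g.
        for arm_field, gaze_field in arm_to_gaze.links.items():
            if gaze_field is g:
                return arm_field.centre          # joint configuration to send to the arm
        return None                              # not yet learned for this target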

We have implemented an architecture for learning gaze control which supports both eye and head movements. This enables the robot to attend to visual targets on the board, currently identified using simple vision processes (more advanced visual abstraction techniques will be employed in future demonstrators). Without vergence control, arm movements are currently learnt in a 2-dimensional plane over the board. This enables reaching to the objects on the board without the need for recognising depth in the image. Vergence and 3-dimensional reaching will be investigated in year 3.

Videos on the functioning of the SMc

AU is also using the SMc to work with USFD to model the joystick task used by that Team to investigate action learning in humans. The typical experiment with the joystick task requires the participants to perform motor babbling to find rewarding targets on the board without these targets being visible. Currently this drives the learning of reaching movements pointing to the buttons on the board, as a reward is associated with these (the 'where' task). We plan to extend this work to solve 'what' tasks (where the reward arrives with a particular action, e.g. a particular gesture on the buttons). Below we show various videos that demonstrate the acquisition of the actions of the SMc, and also how these actions can be assembled to solve the board task described above as well as the joystick task.
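
As a rough sketch of the reward-driven babbling used in the 'where' task just described (the reward region, the workspace bounds and the function names are assumptions made for illustration):

    # Illustrative sketch of reward-driven motor babbling for the 'where' task.
    import random

    def reward(hand_position, button_position, radius=0.05):
        """The reward fires when the hand is seen over the (hidden) rewarding button."""
        return sum((h - b) ** 2 for h, b in zip(hand_position, button_position)) < radius ** 2

    def babble_for_reward(try_posture, button_position, n_trials=500):
        """Randomly babble arm postures over the board plane, keeping the rewarded ones."""
        rewarded = []
        for _ in range(n_trials):
            posture = (random.uniform(-0.3, 0.3), random.uniform(0.1, 0.4))  # 2-D board plane
            hand = try_posture(posture)          # move the arm and observe the hand position
            if reward(hand, button_position):
                rewarded.append(posture)         # reinforce postures that found the button
        return rewarded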

 

This video shows the iCub learning saccade control using motor babbling and the AU mapping framework.  The video shows the iCub learning from scratch.
This video shows the iCub learning head control, and follows directly from a period of eye learning initiated in the video above.  At the beginning of this video a constraint is lifted enabling the robot to progress from eye learning to head learning.  The development of head control is much faster than that for eye control as it uses the already well-defined mapping for eye saccades.
In this video the iCub is beginning to learn reaching movements.  This follows directly from the eye and head development shown above.  The robot is mapping the position of joints in the arm to the direction of the gaze (from both eye and head systems).
In this video the iCub, having learnt visually-guided reaching, is placed before the board. It performs exploration of the board using the joystick model of USFD to find and reinforce the position of the blue button. In this abstract example, the reward is triggered by the robot seeing its hand over the button, although it could be extended to include pressing actions and other stimuli to learn 'what' the button is. The middle section of this video is sped up to show the whole learning cycle.
In this video the iCub, having learnt visually-guided reaching and explored the board, is demonstrating the fully functioning sensorimotor control. It is driven through the CLEVER-B stub using a simulation of the decision making component that is manually directing the iCub's attention to the points of interest on the board, where the button pressing is linked to lights activating.
Alternate view of the sensorimotor component demonstration

 

Experiments with infants

In the following video the experimental protocol with children is presented. The experimental protocol is divided into two phases: a training phase and a test phase. The basic goal of the protocol is to assess whether a child can use a motor skill that he/she has acquired during the training phase (pushing a button in a way that opens a box) to retrieve a reward in the test phase. The board is programmed as in the figure: with a direct association on the left and a crossed relationship on the right. Subjects are divided into two groups: the Experimental Group (EXP) and the Control Group (CTRL). The protocols for the two groups differ only in the training phase: while in the Experimental Group the rewarded action causes the opening of the associated box also in the training phase, in the Control Group the boxes do not open in the training phase. All the other audio-visual stimuli are set in the same way in both groups.
The children's experimental protocol
These preliminary trials on 12 subjects aged between 24 and 68 months have highlighted:

  1. The workspace seems to be important with regard to the way in which children explore the environment;
  2. Children who were given the chance to discover a new skill are more likely to use this skill later;
  3. Neither the EXP nor the CTRL group learned more complex spatial relationships.
These results suggested that we focus our investigation on children aged between 36 and 48 months. To verify how much space affects the way children explore, we decided to invert the crossed relation: half of the subjects will be tested with the crossed relation on the right, while the other half with the crossed relation on the left. The control group will be yoked.

 

The infant experiment protocol