Documentation of demo CLEVER-B2, due in July 2011 (Review Meeting)
This page documents the CLEVER-B2 bio-constrained robot demonstrator shown at the second IM-CLeVeR Review Meeting (Lugano, 10-12/07/2011). The demonstrator reproduces and investigates the board experiment run with capuchin monkeys (by CNR-ISTC-UCP; see below) and with children (by UCBM; see below). The model was tested with the same mechatronic board used with the real experiment participants, and with the iCub robot. The goal of the model is to furnish an operational hypothesis on the brain mechanisms that might underlie the acquisition and exploitation of actions driven by intrinsic motivation mechanisms, as observed in the monkeys and children of the board experiment.
The board experiment modelled here is divided into two phases, both based on the mechatronic board. The board has three buttons and three boxes that can be opened by an automatic device. In the first, learning phase, the participants can press any button of the board. Pressing a button causes the corresponding one of the three boxes to open. This is a surprising, unexpected event that should drive the learning of the action that caused it. It should also drive the learning of the action-outcome relation between the action executed and its effect (the opening of the box). In the second, test phase of the experiment, one particular outcome (e.g., the opening of box 1) is given a high value (e.g., by putting food into box 1; the box has a transparent door, so the food is directly visible). The participant should then be able to immediately recall the action that delivers that outcome, thanks to the action-outcome relation learned in the learning phase.
The experiments with monkeys and children show that when they face the board they are already endowed with a well-developed repertoire of actions acquired during life before the experiment, for example for looking at salient parts of the board and for manipulating the foveated objects. On this basis, the model was divided into two main components: a decision-making component (DMc) and a sensorimotor component (SMc).
To best coordinate the two components at the technical level, CNR-ISTC-LOCEN and AU designed a specific API that interfaces the DMc with the SMc, and interfaces the SMc with the YARP-based interface to the simulated and real iCub robot. This API has been very important for the success of the interaction between the Teams involved in the design and implementation of CLEVER-B2. Below we show the technical structure of the whole system.
Diagram showing the YARP interface between the SMc and DMc and other systems
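The flow across the interface can be sketched as follows. The port and message names are assumptions made for illustration, not the project's actual API, and a plain in-process queue stands in for a YARP port so the sketch runs without a YARP installation:

```python
# Illustrative sketch of the DMc -> SMc command flow. Port and message
# names are assumed; a Queue stands in for a YARP port.
from queue import Queue

class DMcToSMcPort:
    """Stand-in for a YARP port carrying high-level action commands."""
    def __init__(self):
        self._queue = Queue()
    def write(self, message):
        self._queue.put(message)
    def read(self):
        return self._queue.get()

port = DMcToSMcPort()

# DMc side: select a high-level action and send it over the port.
port.write({"action": "press", "target": "button_2"})

# SMc side: receive the command and decompose it into low-level actions
# (a saccade to the target, a reach, and the press itself).
command = port.read()
low_level = ["saccade_to:" + command["target"],
             "reach:" + command["target"],
             "press"]
print(low_level)
```

The design point is the asymmetry of abstraction: the DMc only names targets and action classes, while the SMc owns the decomposition into gaze and arm movements.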
Architecture of the Decision Making component.
The decision making component is based on important architectural and functioning principles of brain organisation, drawn from the neuroscientific literature and summarised in the Figure above:
During the second, test phase of the experiment, the outcome representations inside the goal loop are activated one at a time, each for some time, by sending them a strong activation. When this is done, the cortico-cortical connections from the goal loop to the eye and arm loops allow the recall and execution of the correct eye-actions and arm-actions associated with the outcomes.
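The recall mechanism can be sketched as a read-out over learned goal-to-action connection weights. The unit names and weight values below are hand-set for illustration only; in the model such weights result from learning during the first phase:

```python
# Sketch of goal-driven recall. Names and weight values are illustrative;
# in the model the weights are learned during the learning phase.

# Goal -> eye-action and goal -> arm-action connection weights.
EYE_WEIGHTS = {
    "box_1_open": {"look_button_1": 0.9, "look_button_2": 0.1, "look_button_3": 0.1},
    "box_2_open": {"look_button_1": 0.1, "look_button_2": 0.9, "look_button_3": 0.1},
}
ARM_WEIGHTS = {
    "box_1_open": {"press": 0.9, "dummy_a": 0.1, "dummy_b": 0.1},
    "box_2_open": {"press": 0.9, "dummy_a": 0.1, "dummy_b": 0.1},
}

def recall(goal):
    """Strongly activate one goal unit and read out the best-connected actions."""
    eye = max(EYE_WEIGHTS[goal], key=EYE_WEIGHTS[goal].get)
    arm = max(ARM_WEIGHTS[goal], key=ARM_WEIGHTS[goal].get)
    return eye, arm

print(recall("box_1_open"))  # -> ('look_button_1', 'press')
```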
The figure below shows the actions selected by the model during the learning phase and during the test phase. The results show that during the learning phase the model first explores randomly, then focusses on a specific eye target (a button) and arm action (pressing) for a prolonged period in order to fully learn the information about one box; it then passes to another button and box, and so on, until, having learned everything, it starts to explore randomly again.
The behaviour of the DMc. x-axis: time bins of 6 minutes each; the first 17 bins refer to the learning phase, whereas the last three bins refer to the test phase. y-axis: number of actions involving the buttons selected by the model (actions not involving the buttons are omitted for simplicity). Notice how during training the system focusses on each button (and the related box opening) for a prolonged time. Also notice how during the test phase the model is highly successful in selecting the correct actions corresponding to the activated goal.
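The focus-then-move-on pattern can be illustrated with a toy intrinsic-motivation loop, where the surprise (prediction error) produced by a box opening keeps the agent on one button until the outcome becomes predictable. The learning rate, threshold, and win-stay rule below are illustrative stand-ins, not the model's actual mechanisms:

```python
# Toy sketch of surprise-driven focusing. Learning rate, threshold, and the
# simple win-stay rule are illustrative assumptions, not the model's values.

def explore(buttons=("button_1", "button_2", "button_3"),
            alpha=0.5, threshold=0.05):
    history = []
    for b in buttons:
        error = 1.0  # the box opening is initially fully unpredictable
        # Keep pressing while the outcome is still surprising; the win-stay
        # rule stands in for the surprise-reinforced action selection.
        while error > threshold:
            history.append(b)
            error *= (1.0 - alpha)  # prediction improves with each press
        # Surprise exhausted: attention shifts to the next button.
    return history

print(explore())  # five presses of each button, in sequence
```

With these toy parameters each button is pressed five times before its outcome falls below the surprise threshold, reproducing in miniature the prolonged per-button focusing visible in the figure.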
Below we show the videos of the simulated and the real robots. Note that the videos are based on the low-level actions implemented by the SMc: this is explained in detail below.
|This video shows the simulated robot in the learning and in the test phase. The video lasts about 10 minutes and explains in detail the initial random exploration, the subsequent focussing phases, and the final test phase. It also shows the internal activations of the neural model.|
|This video shows the real robot's behaviour during a test phase (the model uses the connection weights learned in simulation). Note how the robot performs the eye and arm actions corresponding to the activated outcomes or goals (the three goals are activated one after the other, each for a few seconds).|
The sensorimotor component of CLEVER-B aims to furnish the robot with the sensory and motor abilities required to interact with the experimental board. As mentioned above, these include directing gaze, pressing the foveated objects, and two dummy actions. Once an action is selected by the DMc it is carried out by the SMc. Control of the sensorimotor systems is learnt following the developmental approach of AU. As such it:
For the experiments reported here, the robot is required to learn to press the objects on the board (or to perform dummy actions). This requires the robot to learn (1) gaze control; (2) arm control (reaching to press); (3) integration of gaze and reaching. In the developmental framework, these skills are learnt cumulatively, using an egocentric ‘gaze space’ to coordinate visual- and reach-spaces. To implement hand-eye coordination following the developmental trajectory, eye control in the visual and proprioceptive spaces is the first competency to be learnt. Head/neck control follows, using the eye as an intermediary to learn the mapping between visual and proprioceptive space in the neck. Arm control is subsequently learnt in a proprioceptive space, using a common gaze space to link to the visual space of the eye (Huelse et al. 2010).
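The idea of linking visual and proprioceptive spaces through babbled experience can be shown with a toy nearest-neighbour version (the AU framework learns richer field-based mappings; the linear `forward_model` plant and all parameters here are assumptions for illustration):

```python
# Toy sketch of mapping learning by motor babbling. The forward_model plant,
# sample count, and nearest-neighbour lookup are illustrative assumptions.
import random

def forward_model(pan, tilt):
    """Stand-in for the robot: where a given eye posture puts the fovea."""
    return (2.0 * pan, 2.0 * tilt)  # assumed linear plant, for illustration

def babble(n=200, seed=1):
    """Random motor babbling: record (visual outcome -> motor command) pairs."""
    random.seed(seed)
    mapping = []
    for _ in range(n):
        pan, tilt = random.uniform(-1, 1), random.uniform(-1, 1)
        mapping.append((forward_model(pan, tilt), (pan, tilt)))
    return mapping

def saccade_command(mapping, target):
    """Look up the babbled command whose visual outcome is nearest the target."""
    best = min(mapping,
               key=lambda m: (m[0][0] - target[0]) ** 2 + (m[0][1] - target[1]) ** 2)
    return best[1]

m = babble()
print(saccade_command(m, (1.0, 1.0)))  # a command near (0.5, 0.5)
```

The developmental point is that no analytic inverse of the plant is ever written down: the mapping from visual targets to motor commands is assembled purely from self-generated experience, which is also what lets head and arm control later reuse the eye's gaze space.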
We have implemented an architecture for learning gaze control which supports both eye and head movements. This enables the robot to attend to visual targets on the board, currently identified using simple vision processes (more advanced visual abstraction techniques will be employed in future demonstrators). Without vergence control, arm movements are currently learnt in a 2-dimensional plane over the board. This enables reaching to the objects on the board without the need for recognising depth in the image. Vergence and 3-dimensional reaching will be investigated in year 3.
AU is also using the SMc in work with USFD to model the joystick task used by that Team to investigate action learning in humans. The typical experiment with the joystick task requires participants to perform motor babbling to find rewarding targets on the board without those targets being visible. Currently this drives the learning of reaching movements that point to the buttons on the board, as a reward is associated with these (the 'where' task). We plan to extend this work to solve 'what' tasks (where the reward arrives with a particular action, e.g. a particular gesture on the buttons). Below we show various videos that demonstrate the acquisition of the actions of the SMc, and also how these actions can be assembled to solve the board task above as well as the joystick task.
|This video shows the iCub learning saccade control using motor babbling and the AU mapping framework. The video shows the iCub learning from scratch.|
|This video shows the iCub learning head control, and follows directly from a period of eye learning initiated in the video above. At the beginning of this video a constraint is lifted enabling the robot to progress from eye learning to head learning. The development of head control is much faster than that for eye control as it uses the already well-defined mapping for eye saccades.|
|In this video the iCub is beginning to learn reaching movements. This follows directly from the eye and head development shown above. The robot is mapping the position of joints in the arm to the direction of the gaze (from both eye and head systems).|
|In this video the iCub, having learnt visually-guided reaching, is placed before the board. It performs exploration of the board using the joystick model of USFD to find and reinforce the position of the blue button. In this abstract example, the reward is triggered by the robot seeing the hand over the button, although it could be extended to include pressing actions and other stimuli to learn 'what' the button is. The middle section of this video is sped up to show the whole learning cycle.|
|In this video the iCub, having learnt visually-guided reaching and explored the board, is demonstrating the fully functioning sensorimotor control. It is driven through the CLEVER-B stub using a simulation of the decision making component that is manually directing the iCub's attention to the points of interest on the board, where the button pressing is linked to lights activating.|
|Alternate view of the sensorimotor component demonstration|
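The 'where' task described above, in which babbling discovers an invisible rewarding target, can be sketched as follows. The target position, reward radius, and centroid rule are assumptions for illustration, not USFD's actual task parameters:

```python
# Sketch of the 'where' task: motor babbling over the board plane discovers
# a hidden rewarding position, and rewarded samples are averaged to
# reinforce a reach target. All geometry and parameters are assumptions.
import random

HIDDEN_TARGET = (0.3, -0.2)  # position of the invisible rewarding button
REWARD_RADIUS = 0.15

def rewarded(pos):
    """Reward fires when the hand lands close enough to the hidden target."""
    dx, dy = pos[0] - HIDDEN_TARGET[0], pos[1] - HIDDEN_TARGET[1]
    return dx * dx + dy * dy <= REWARD_RADIUS ** 2

def babble_for_target(n=2000, seed=2):
    """Babble reaches over the board; return the centroid of rewarded ones."""
    random.seed(seed)
    hits = []
    for _ in range(n):
        pos = (random.uniform(-1, 1), random.uniform(-1, 1))
        if rewarded(pos):
            hits.append(pos)
    x = sum(p[0] for p in hits) / len(hits)
    y = sum(p[1] for p in hits) / len(hits)
    return (x, y)

print(babble_for_target())  # close to the hidden target (0.3, -0.2)
```

A 'what' extension would condition the reward on the action performed at the position (e.g. a particular gesture), not only on where the hand lands.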
In the following video the experimental protocol with children is presented. The protocol is divided into two phases: a training phase and a test phase. Its basic goal is to assess whether a child can use a motor skill acquired during the training phase (pushing a button in a way that opens a box) to retrieve a reward in the test phase. The board is programmed as in the figure: with a direct association on the left and a crossed relationship on the right. Subjects are divided into two groups: the Experimental Group (EXP) and the Control Group (CTRL). The protocols for the two groups differ only in the training phase: while in the Experimental Group the rewarded action causes the opening of the associated box also in the training phase, in the Control Group the boxes do not open in the training phase. All the other audio-visual stimuli are set in the same way in both groups.
These preliminary trials on 12 subjects aged between 24 and 68 months have highlighted:
|The infant experiment protocol|