Documentation of demo CLEVER-K2, due in July 2011 (Review Meeting)

We successfully demonstrated the goal of the K2 demonstrator, learning a repertoire of actions through intrinsic motivation in the iCub simulator. We were also able to demonstrate the successful integration of all modules to learn a limited set of actions based on intrinsic motivation on the iCub robot. The experiments reported here focused on the successful integration of the methods and models developed through various efforts, not on the complexity of the learned skills. During the next stages of the project we will further develop both the robot’s action primitives (e.g., ‘grasp’) and the learning of composite skills with hierarchical learning methods.


Approach and methods:

Based on the K1 blueprint for hierarchical learning (Ring et al., 2010), we developed the K2 functional architecture (Pape et al., 2010) that specifies how the different approaches and methods developed in WP4-6 should interact in order to learn a repertoire of actions through intrinsic motivation. Within this framework, three main components are distinguished:

(1) an adaptive predictor/compressor that tries to compress and predict the robot’s observations,

(2) an action generator that selects actions that (after filtering by the kinematics command filter) will be executed by the robot, and

(3) a novelty detector that keeps track of the improvement of the predictor module, and generates the intrinsic reward signal for the action generator. The specification of the interaction between these components facilitates the replacement and testing of different modules developed by the IM-CLeVeR partners.
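The interaction between the three components can be sketched in a few lines of Python. This is an illustrative toy only: the class names, the running-average predictor and the scalar observation are invented for this sketch and are not the actual K2 interfaces.

```python
import random

class Predictor:
    """(1) Adaptive predictor: a running-average model of an observed value."""
    def __init__(self, rate=0.1):
        self.rate, self.estimate = rate, 0.0
    def update(self, observation):
        error = abs(observation - self.estimate)
        self.estimate += self.rate * (observation - self.estimate)
        return error

class NoveltyDetector:
    """(3) Tracks predictor improvement; intrinsic reward = decrease in error."""
    def __init__(self):
        self.last_error = None
    def reward(self, error):
        r = 0.0 if self.last_error is None else self.last_error - error
        self.last_error = error
        return r

class ActionGenerator:
    """(2) Prefers the action whose predictor is currently improving fastest."""
    def __init__(self, actions, eps=0.1):
        self.value, self.eps = {a: 0.0 for a in actions}, eps
    def select(self):
        if random.random() < self.eps:
            return random.choice(list(self.value))   # exploration mode
        return max(self.value, key=self.value.get)
    def update(self, action, reward):
        self.value[action] += 0.5 * (reward - self.value[action])

def control_cycle(env, predictor, novelty, generator):
    """One cycle: act, observe, update the predictor, reward learning progress."""
    a = generator.select()
    observation = env(a)            # executed after the kinematics command filter
    error = predictor.update(observation)
    generator.update(a, novelty.reward(error))
```

The key coupling is that the action generator is rewarded not for low prediction error but for the *drop* in prediction error reported by the novelty detector.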

[Figure: K2 architecture diagram]

To prevent the iCub from damaging itself, all actions sent to the real robot are filtered through the kinematics command filter in Virtual Skin.



[IDSIA iCub Sim] The first set of experiments for learning a repertoire of actions was performed in the iCub simulator. Initially, the vision modules were replaced with simple queries to the simulator environment for the locations and orientations of objects.

[Figure: iCub simulator setup and simulation results]

Figure 1: Learning a repertoire of skills based on intrinsic motivation in the iCub simulator.


Figure 1 shows the progress of learning an action repertoire over 200 episodes of at most 20 steps each. As the figure shows, the robot focuses on learning the skill with the highest learning progress, and switches back to exploration mode when it can no longer improve its predictions for any of the skills. After about 100 episodes, the robot had successfully learned distinct policies for toppling and moving the objects on the table, based on intrinsic motivation alone. Note that selecting the skill with the highest prediction error (instead of the decrease in prediction error) would not yield the desired learning of skills, as the prediction error for hitting the ball remains much larger than that for toppling the stick. In later stages of the project, the learned prediction and associated action modules could be used to learn a hierarchy of composed planning and action modules. The code used for the above results has been made available to the project partners in the software repository, and serves as an example of a basic implementation of the different components specified in the K2 framework.


Right: simulation setup. Sensory input to the robot consists of horizontal ball coordinates, vertical orientation of the stick, and discretized pose of the two active joints. Prediction targets are vertical stick orientation and the two horizontal ball coordinates. The robot starts from a randomly selected pose, and can move two right-arm joints, at a random (unknown to the robot) velocity. This leads to random displacement of the ball when the robot hits it, but deterministic toppling of the stick.

Left: various training statistics. The prediction errors decrease, then start to fluctuate around small values (because the predictors are limited), the learning progress decreases to 0, and the robot switches to random exploration when no learning progress can be obtained for the skills. Because of randomness in velocity, prediction improvement is occasionally obtained when hitting the ball, leading to improvement of the associated skill. Note that the learned skills remain successful (hitting the ball or toppling the stick) after the predictors cease to improve.
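To illustrate why the *decrease* in prediction error (learning progress), rather than the error itself, is the right selection signal, here is a minimal, self-contained sketch. The error curves are invented for illustration: one skill's error decays with practice (like toppling the stick), the other's stays large and irreducible (like the randomly displaced ball).

```python
def select_by_progress(error_curves, episodes=200, window=5):
    """Repeatedly pick the skill whose prediction error dropped most recently.

    error_curves: skill name -> function mapping practice count t to error.
    Unpractised skills get infinite progress, so each is tried at least once.
    """
    history = {s: [] for s in error_curves}
    counts = {s: 0 for s in error_curves}

    def progress(s):
        h = history[s]
        if len(h) <= window:
            return float("inf")            # not enough data yet: explore it
        return h[-window - 1] - h[-1]      # recent drop in prediction error

    for _ in range(episodes):
        s = max(error_curves, key=progress)
        counts[s] += 1
        history[s].append(error_curves[s](len(history[s])))
    return counts

curves = {
    "topple_stick": lambda t: 0.9 ** t,    # learnable: error decays with practice
    "hit_ball":     lambda t: 5.0,         # large but irreducible error
}
counts = select_by_progress(curves)
```

Despite its much larger prediction error, the ball skill is sampled only a handful of times, because it never yields progress; a learner rewarded for raw error would fixate on it forever.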



[IDSIA iCub Robot] Given the successful results in simulation, we started experiments on the iCub robot. As the vision module, we used AutoIncSFA, which builds several abstract features of the robot's interaction with the environment (Figure 2).

[Figure: AutoIncSFA experimental setup and toppling features]

Figure 2: The two slowest AutoIncSFA features as a function of episode time (top) show a step change when toppling the cup and bottle, respectively (bottom).

Learning is carried out in an episodic manner. Initially, the robot performs random movements in a limited joint space of the right arm, using the camera images of the right eye as input to AutoIncSFA. Once the SFA features converge, the robot starts to learn to predict the slowest feature (corresponding to toppling the blue cup) with RL. Again, the reward is the decrease in prediction error, which allows the robot to ignore tasks that are (currently) too difficult to learn. As expected, driven by the decrease in prediction error of the linear predictor (the intrinsic reward), the robot learns a policy for toppling the blue cup, which is stored in a Q-table for later use. We are currently working on scenarios with more objects, more detailed input from the computer vision algorithms (Figure 2), and more skills that can be learned and stored in the action repertoire.
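This learning scheme can be compressed into a toy sketch: a tabular Q-learner whose only reward is the decrease in the error of a simple per-state predictor of a "toppled" feature. The one-dimensional pose world, the predictor, and all constants are invented for illustration; the real system predicts AutoIncSFA features computed from camera images.

```python
import random

random.seed(1)

N = 5                                         # discretised arm poses
TARGET = [0.0, 0.0, 0.0, 0.0, 1.0]            # slow feature: 1 = cup toppled
Q = [[0.0, 0.0] for _ in range(N)]            # Q-table; actions: 0 left, 1 right
pred = [0.5] * N                              # per-pose predictor of the feature

def move(pose, action):
    return max(0, min(N - 1, pose + (1 if action == 1 else -1)))

def episode(steps=10, alpha=0.5, gamma=0.9, eps=0.2):
    pose = 0
    for _ in range(steps):
        if random.random() < eps:
            a = random.randrange(2)            # occasional random exploration
        else:
            a = 0 if Q[pose][0] >= Q[pose][1] else 1
        nxt = move(pose, a)
        before = abs(TARGET[nxt] - pred[nxt])
        pred[nxt] += 0.3 * (TARGET[nxt] - pred[nxt])   # the predictor learns
        r = before - abs(TARGET[nxt] - pred[nxt])      # intrinsic reward
        Q[pose][a] += alpha * (r + gamma * max(Q[nxt]) - Q[pose][a])
        pose = nxt

for _ in range(300):
    episode()
```

After training, the predictor's error at frequently visited poses is near zero, so the intrinsic reward there has faded, mirroring how the robot loses interest in a skill once its predictor stops improving.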


[FIAS iCub Robot] Curious vision system for autonomous object learning: We present a “curious” active vision system that autonomously explores its environment and learns object representations without any human assistance. Similar to an infant, who is intrinsically motivated to seek out new information, our system is endowed with an attention and learning mechanism designed to search for new information that has not been learned yet. Our method can deal with dynamic changes of object appearance which are incorporated into the object models.


[Figure: FIAS saliency map]

Figure 3: Several objects are placed in front of the robot at different locations and depths, with partial occlusions, viewed from the left and right cameras of the iCub (a). Harris corner points are detected and matched across the left and right images using Gabor jets (b). A low-resolution saliency map is used to select the most salient interest points in the scene (c). The yellow circle marks the most salient point, the green circle its corresponding location in the other image. Interest points in the left image are clustered based on their location and stereo disparity (d). Each cluster represents an object in the scene; spurious clusters with very few features are removed. Attention shifts to the most interesting object, which is segmented out from the scene (e). Features on the object are sent to the learning module. Once objects become familiar to the robot, they are inhibited from further selection by removing the corresponding interest points. Color blobs in (f) indicate recognized objects whose interest points have been removed.

The video shows this curious vision system in operation. As can be seen in the video, the system's attention is drawn towards those locations in the scene that provide the highest potential for learning (new features on an object are represented by red/pink dots). Specifically, the system pays attention to salient image regions likely to contain objects, it continues looking at objects and updating their models as long as it can learn something new about them, and it avoids looking at objects whose models are already accurate. If an object is new, the system assigns it a new ID; if the object reappears, the system announces its ID while simultaneously updating its model.
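The attend-learn-inhibit loop described above can be caricatured in a few lines. The object names, saliency values, and the fixed "looks until familiar" threshold are invented for illustration; the real system decides familiarity from the accuracy of the learned object models, not from a counter.

```python
def curious_attention(saliency, looks_to_learn=3):
    """Attend the most salient non-inhibited object; once its model stops
    yielding new information (here: after a fixed number of looks), inhibit
    its interest points and move on. Returns the sequence of fixations."""
    looks = {name: 0 for name in saliency}
    inhibited = set()
    fixations = []
    while len(inhibited) < len(saliency):
        remaining = {n: s for n, s in saliency.items() if n not in inhibited}
        target = max(remaining, key=remaining.get)   # winner-take-all saliency
        fixations.append(target)
        looks[target] += 1                           # "update the object model"
        if looks[target] >= looks_to_learn:
            inhibited.add(target)                    # model accurate: inhibit
    return fixations
```

For example, with three objects of decreasing saliency and two looks needed per object, attention visits them in saliency order, dwelling on each until it is learned, then never returning.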


[UU - Novelty Detection] Novelty detection based on habituable neural networks. At UU, for CLEVER-K2, we focused on developing novelty detection algorithms and architectures that lead a robot to focus attention on those parts of the environment that appear particularly interesting. We researched and developed a novel perceptual learner that operates online, cumulatively and in an unsupervised manner, and is intrinsically motivated by novelty detection based on habituation. We then demonstrated this system on two different robot platforms, our PR2 and the iCub. The system comprises a Bag-of-Words (BoW) representation with a habituable neural-network classifier, an online expandable vocabulary, and an intrinsically motivated learner based on novelty detection and habituation. More details of this system can be found in (Gatsoulis et al., 2011). This work also included collaborations with other groups, in particular IDSIA, FIAS and AU.

[UU - Novelty Detection on PR2] Firstly, we tested our system on our PR2 robot platform (Personal Robot 2 from Willow Garage in California). Figure 1 below shows a block diagram of the operational procedure followed by the robot. Figure 2 shows the experimental setup of the PR2 with unknown objects on the table in front of it. Figure 3 shows the first stage of the experiment, in which the robot segments the objects from the table top and identifies already-known objects; unknown objects receive a higher novelty value, while already-known objects initially have a lower novelty value, which increases over time as habituation decays. Figure 4 shows how the robot lifts an object and manipulates it to gather more information by extracting more SURF features.
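The habituation dynamic behind these novelty values can be sketched with a simple first-order model (all constants here are invented for illustration, not the parameters of the UU system): repeated exposure to an object drives its novelty value down, and the value recovers toward its resting level when the object is out of view.

```python
class HabituationUnit:
    """First-order habituation: dy = (resting - y)/tau - k * stimulus * y."""
    def __init__(self, tau=5.0, k=0.5, resting=1.0):
        self.tau, self.k, self.resting = tau, k, resting
        self.y = resting                       # novelty value (1 = fully novel)

    def update(self, stimulus):
        # Euler step, dt = 1: decay toward resting level, minus
        # stimulus-driven habituation proportional to the current value.
        self.y += (self.resting - self.y) / self.tau - self.k * stimulus * self.y
        return self.y

unit = HabituationUnit()
exposed = [unit.update(1.0) for _ in range(10)]    # repeated exposure: habituates
recovered = [unit.update(0.0) for _ in range(20)]  # object out of view: recovers
```

A known object therefore attracts little attention while it is in view, but becomes mildly interesting again after a long absence, which matches the recovery behaviour described above.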



Below is a video of the experiment on the PR2:

In this video our PR2 learns the most novel object in its view by picking it up and inspecting it to learn its features.


The results of the experiment show 100% accuracy on object recognition and 100% accuracy in identifying the most novel object. The graph in Figure 5 below shows that the expected behaviour of the system was met: in every case the robot focused on the object that was sufficiently novel, demonstrating the effective use of biologically inspired novelty detection based on habituation as an intrinsic motivation that drives the learning of an intelligent robot system.


[UU - Novelty Detection on the iCub] Secondly, to show that the system can work on other robotic platforms and be useful in IM-CLeVeR, we worked closely with AU, including a visit as part of the collaboration, to test the system on the iCub. This was successfully achieved, as shown in the video below.

This video demonstrates the iCub robot learning the most novel object in its field of view. It was produced through a collaboration between UU and AU.


[UU - Hierarchy of Actions] At UU, we also worked on a robot building a hierarchy of skills. The robot uses a biologically inspired algorithm to simultaneously combine, adapt and create actions to solve a task. The autonomously created actions can themselves be combined in a hierarchical fashion. The approach builds on skills with which the robot is already provided, such as grasping and motion planning, so software reuse is an important advantage of the approach. The video below shows the PR2 after it has learned how to combine two objects by stacking them; the resulting action generalizes to putting items in a bag.

In this video our PR2 places the bottle into the bag by executing a composition of previously learned skills in the correct sequence, generated autonomously.
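The composition idea can be sketched as follows. The primitive names and the log-based "execution" are invented for illustration; on the real PR2 the primitives are provided skills such as grasping and motion planning.

```python
def primitive(name):
    """A provided skill, e.g. grasping or motion planning."""
    def act(log):
        log.append(name)
    act.skill_name = name
    return act

def compose(name, subskills):
    """An autonomously created skill: a sequence of existing skills.
    Composites are callable exactly like primitives, so they can be
    reused in further compositions (software reuse)."""
    def act(log):
        for s in subskills:
            s(log)
    act.skill_name = name
    return act

grasp, move_above, release = map(primitive, ["grasp", "move_above", "release"])

# Learned composite: stacking one object on another.
stack = compose("stack", [grasp, move_above, release])

# The stacking composite is itself reused as a building block when the
# action is generalised to a new target (the bag).
put_in_bag = compose("put_in_bag", [stack])
```

Because a composite has the same calling convention as a primitive, the hierarchy can grow to arbitrary depth without changing the executor.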



Selected bibliography:

M. Frank, A. Förster, J. Schmidhuber. Virtual Skin - Infrastructure for Curious Robots, 2011.

V. R. Kompella, M. Luciw, J. Schmidhuber. Incremental slow feature analysis. IJCAI-2011, Barcelona, 2011.

L. Pape, M. Ring, F. Gomez, K. Frank, A. Förster, D. Migliore and J. Schmidhuber, (2010), IM‐CLeVeR‐K2 functional architecture, internal report of the EU‐funded Integrated Project “IM‐CLeVeR – Intrinsically Motivated Cumulative Learning Versatile Robots”.

M. Ring, A. Förster, K. Frank, D. Migliore, C. Dimitrakakis, Y. Gatsoulis, J. Triesch, J. Condell, J. Schmidhuber, (2010), CLEVER-K1: The Blueprint of the IM-CLeVeR Architecture for Machine Learning, Deliverable D7.4 of the EU-funded Project “IM-CLeVeR – Intrinsically Motivated Cumulative Learning Versatile Robots”.

Y. Gatsoulis, C. Burbridge and M. McGinnity, (2011). Online Unsupervised Cumulative Learning for Life-Long Robot Operation. ROBIO'2011, Phuket, 2011.