Hung Ngo, Matthew Luciw, Ngo Anh Vien, and Juergen Schmidhuber (2013)

Upper Confidence Weighted Learning for Efficient Exploration in Multiclass Prediction with Binary Feedback

In: International Joint Conference on Artificial Intelligence (IJCAI 2013).

We introduce a novel algorithm called Upper Confidence Weighted Learning (UCWL) for online multiclass learning from binary feedback. UCWL combines the Upper Confidence Bound (UCB) framework with the Soft Confidence Weighted (SCW) online learning scheme. UCWL achieves state-of-the-art performance (especially on noisy and non-separable data) with low computational costs. Estimated confidence intervals are used for informed exploration, which enables faster learning than the uninformed exploration case or the case where exploration is not used. The targeted application setting is human-robot interaction (HRI), in which a robot is learning to classify its observations while a human teaches it by providing only binary feedback (e.g., right/wrong). Results in an HRI experiment, and with two benchmark datasets, show UCWL outperforms other algorithms in the online binary feedback setting, and surprisingly even sometimes beats state-of-the-art algorithms that get full feedback, while UCWL gets only binary feedback on the same data.
Intrinsic Exploration; Animals; Motives; Forced Exploration; Free Exploration; Novelty.
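
The general idea in the abstract, choosing a class by an upper confidence bound and learning only from right/wrong feedback, can be illustrated with a minimal sketch. This is not the authors' UCWL algorithm: it swaps the SCW update for a simpler AROW-style update with a diagonal covariance, and the class name UCBLinearLearner, the parameters eta (exploration weight) and r (regularizer), and the toy one-hot data are all hypothetical choices made for illustration.

```python
import math


class UCBLinearLearner:
    """Illustrative only: one linear scorer per class, each with a mean
    weight vector and a diagonal variance vector (an assumption; this is
    not the UCWL update from the paper)."""

    def __init__(self, n_classes, n_features, eta=1.0, r=1.0):
        self.eta = eta  # hypothetical exploration weight
        self.r = r      # hypothetical AROW-style regularizer
        self.mu = [[0.0] * n_features for _ in range(n_classes)]
        self.var = [[1.0] * n_features for _ in range(n_classes)]

    def _stats(self, k, x):
        # Mean score and score variance of class k for input x.
        margin = sum(m * xi for m, xi in zip(self.mu[k], x))
        v = sum(s * xi * xi for s, xi in zip(self.var[k], x))
        return margin, v

    def predict(self, x):
        # Pick the class with the highest upper confidence bound:
        # mean score plus eta times the score's standard deviation.
        best, best_score = 0, -math.inf
        for k in range(len(self.mu)):
            margin, v = self._stats(k, x)
            score = margin + self.eta * math.sqrt(v)
            if score > best_score:
                best, best_score = k, score
        return best

    def update(self, x, k, correct):
        # Binary feedback only: treat "right" as label +1 and "wrong"
        # as label -1 for the scorer of the predicted class k.
        y = 1.0 if correct else -1.0
        margin, v = self._stats(k, x)
        loss = max(0.0, 1.0 - y * margin)  # hinge loss
        if loss == 0.0:
            return
        beta = 1.0 / (v + self.r)
        alpha = loss * beta
        for i, xi in enumerate(x):
            s = self.var[k][i]
            self.mu[k][i] += alpha * y * s * xi
            # Variance shrinks on features that were observed.
            self.var[k][i] = s - beta * s * s * xi * xi


if __name__ == "__main__":
    # Toy stream: three one-hot inputs, one per class (made-up data).
    data = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    model = UCBLinearLearner(n_classes=3, n_features=3)
    for _ in range(10):
        for true_class, x in enumerate(data):
            guess = model.predict(x)
            # The teacher only says whether the guess was right.
            model.update(x, guess, guess == true_class)
    print([model.predict(x) for x in data])
```

In this sketch a wrong guess lowers both the mean score and the uncertainty of the guessed class, so the learner moves on to classes it has not yet tried on similar inputs; that is the sense in which the confidence term drives exploration here.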