2011 - 2012 : A large scale robotically embodied decision making model (PEPII)
 
 

This project aimed at adapting a model of the cortico-basal ganglia loops, developed by the group of T. Boraud, to realistic reinforcement learning tasks. Three partners were involved:

- BG3, IMN UMR 5293 INSB, Bordeaux (T. Boraud (resp), M. Guthrie, A. Garenne)
- Cortex, LORIA UMR 7503 INS2I, Nancy (F. Alexandre, N. Rougier, T. Viéville)
- IMS, Supélec, Metz (H. Frezza-Buet, J. Fix)

Below, we describe a robotic setup designed to study the abilities of the basal ganglia loop model; the setup is depicted below.

The robot has two needs to fulfill: it may be hungry and/or thirsty. To satisfy its hunger, it must consume the red spots; to satisfy its thirst, it must consume the blue spots. Initially, the robot does not know which spot satisfies which need; this mapping is precisely what it must learn (a toy sketch of the reward logic follows this paragraph).
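As a point of reference, here is a minimal sketch of the task's reward logic. All names are ours; only the red/hunger and blue/thirst pairing comes from the description above.

```python
# Hypothetical names throughout; only the red->hunger / blue->thirst
# pairing comes from the project description.
HUNGER, THIRST = "hunger", "thirst"
SATISFIES = {"red": HUNGER, "blue": THIRST}  # the mapping the robot must learn

def reward(active_needs: set, consumed_spot: str) -> float:
    """Return 1.0 only when the consumed spot matches a currently active need."""
    return 1.0 if SATISFIES[consumed_spot] in active_needs else 0.0

# A hungry robot eating a red spot is rewarded...
assert reward({HUNGER}, "red") == 1.0
# ...but eating a blue spot while only hungry is not.
assert reward({HUNGER}, "blue") == 0.0
```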

We built a hardwired artificial perceptive system; a small encoding sketch follows the list below. It has neurons detecting whether:
- a stimulus is present or not in the visual field
- a stimulus is centered or not within the visual field
- a stimulus is near or not to the robot
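
Since the three percepts are binary, the perceptual state space is tiny. Here is a minimal sketch, assuming an illustrative encoding of our own (the project does not specify one):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Percept:
    """The three binary outputs of the hardwired perceptive system
    (field names are ours, not the project's)."""
    present: bool   # a stimulus is present in the visual field
    centered: bool  # the stimulus is centered within the visual field
    near: bool      # the stimulus is near the robot

    def as_index(self) -> int:
        """Pack the three binary percepts into one of 2**3 = 8 discrete states."""
        return (int(self.present) << 2) | (int(self.centered) << 1) | int(self.near)
```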

It is also equipped with a basic motor system; a sketch of the action set follows the list below. The motor system allows the robot:
- to search for a stimulus, i.e., to wander around randomly
- to move forward
- to center a selected stimulus present in the visual field
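
The three motor primitives can be given the same kind of illustrative encoding (the names below are ours):

```python
from enum import Enum

class Action(Enum):
    """The three hardwired motor primitives (names are ours, not the project's)."""
    SEARCH = 0        # wander around randomly, looking for a stimulus
    MOVE_FORWARD = 1  # move toward the stimulus
    CENTER = 2        # rotate so the selected stimulus is centered in the visual field
```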

However, the mapping between the perceptions and the actions is unknown to the robot and must be learned. Learning is biased by the delivery of rewards when the robot consumes red spots while being hungry and blue spots while being thirsty; this is what makes the task a reinforcement learning task. The adaptive controller used during the experiment is depicted below.
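The project's controller is the cortico-basal ganglia loop model itself, not a textbook algorithm. Purely to illustrate what learning a perception-to-action mapping under reward means, here is a minimal tabular Q-learning stand-in over the 8 perceptual states and 3 actions sketched above; every name and parameter below is our own assumption:

```python
import random

# Textbook Q-learning stand-in; NOT the project's basal ganglia controller.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate
N_STATES, N_ACTIONS = 8, 3             # 3 binary percepts -> 8 states; 3 primitives
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def select_action(state: int) -> int:
    """Epsilon-greedy selection over the motor primitives."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def learn(state: int, action: int, r: float, next_state: int) -> None:
    """One Q-learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])
```

With a reward delivered only when a consumed spot matches an active need, repeated updates bias the greedy policy toward the rewarded action sequence, presumably search, then center, then move forward to the spot.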

Below we show two videos of the robot performing the task. In the first video, the robot has not yet learned an optimal controller, and we see it selecting inappropriate actions. In the second video, the behavior is much more fluent, indeed optimal: the sequence of actions that drives the robot toward the reward as fast as possible has been correctly learned.