This robot (iCub) taught itself to hit the center of the target in eight tries after being given some basic instructions. This is the latest demonstration of a new type of learning algorithm:
… called ARCHER (Augmented Reward Chained Regression) [this] algorithm, was developed and optimized specifically for problems like the archery training, which have a smooth solution space and prior knowledge about the goal to be achieved. In the case of archery, we know that hitting the center corresponds to the maximum reward we can get. Using this prior information about the task, we can view the position of the arrow’s tip as an augmented reward. ARCHER uses a chained local regression process that iteratively estimates new policy parameters which have a greater probability of leading to the achievement of the goal of the task, based on the experience so far. An advantage of ARCHER over other learning algorithms is that it makes use of richer feedback information about the result of a rollout.
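To make the "chained local regression" idea concrete, here is a minimal Python sketch of that kind of update on a toy 2-D problem. It is not the authors' implementation. The simulator `shoot`, the choice to use the three best rollouts, and the exact 2x2 solve are all assumptions made for illustration. The arrow tip's offset from the target center plays the role of the augmented reward: the update looks for weights that would move the best observed offset to zero, then applies the same weights to the policy parameters.

```python
import math


def shoot(theta):
    """Hypothetical toy simulator (an assumption, not the robot's dynamics):
    maps two policy parameters to the arrow tip's (x, y) offset from the
    target center. Smooth and only mildly nonlinear."""
    a, b = theta
    x = 1.5 * a - 0.4 * b - 0.8 + 0.05 * a * a
    y = 0.3 * a + 1.2 * b + 0.5 + 0.05 * math.sin(b)
    return (x, y)


def dist(r):
    """Distance of the arrow tip from the target center."""
    return math.hypot(r[0], r[1])


def chained_regression_step(rollouts):
    """Propose new parameters from the three best rollouts so far.

    rollouts is a list of (theta, r) pairs, where r is the 2-D offset of the
    arrow tip from the center. We solve for weights w2, w3 such that
    r1 + w2*(r2 - r1) + w3*(r3 - r1) = 0, then apply the same weights to theta.
    """
    best = sorted(rollouts, key=lambda tr: dist(tr[1]))[:3]
    (t1, r1), (t2, r2), (t3, r3) = best
    d2 = (r2[0] - r1[0], r2[1] - r1[1])
    d3 = (r3[0] - r1[0], r3[1] - r1[1])
    det = d2[0] * d3[1] - d3[0] * d2[1]
    if abs(det) < 1e-12:
        raise ValueError("rollouts are degenerate; try different parameters")
    # Cramer's rule for w2*d2 + w3*d3 = -r1
    bx, by = -r1[0], -r1[1]
    w2 = (bx * d3[1] - d3[0] * by) / det
    w3 = (d2[0] * by - bx * d2[1]) / det
    return (
        t1[0] + w2 * (t2[0] - t1[0]) + w3 * (t3[0] - t1[0]),
        t1[1] + w2 * (t2[1] - t1[1]) + w3 * (t3[1] - t1[1]),
    )


def learn(simulate, initial_thetas, tol=1e-3, max_rollouts=20):
    """Chain the update: each new rollout joins the pool used by the next step."""
    rollouts = [(t, simulate(t)) for t in initial_thetas]
    while len(rollouts) < max_rollouts:
        if min(dist(r) for _, r in rollouts) < tol:
            break
        theta = chained_regression_step(rollouts)
        rollouts.append((theta, simulate(theta)))
    return rollouts


if __name__ == "__main__":
    history = learn(shoot, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
    best_theta, best_r = min(history, key=lambda tr: dist(tr[1]))
    print(f"rollouts used: {len(history)}")
    print(f"best parameters: {best_theta}, miss distance: {dist(best_r):.6f}")
```

The point of the sketch is the one the quote makes: a scalar reward would only say how far off the shot was, while the 2-D offset also says in which direction, so each update can aim straight at the center instead of exploring blindly. When the simulator happens to be exactly linear, a single step of this toy update lands on the center; the smoother the real problem, the closer it gets to that ideal.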
Now can we train one to do dishes or laundry? And we don’t even mind losing a few dishes in the process.