Prudence Pitch

17:15 12/06/2019 | 7newstar.com


Researchers from Intel’s AI Lab and Oregon State’s Collaborative Robotics and Intelligent Systems Institute make a 3D humanoid agent walk upright in OpenAI’s Humanoid benchmark

(Tech) Researchers from Intel’s AI Lab and the Collaborative Robotics and Intelligent Systems Institute at Oregon State University have combined several methods to build better-performing reinforcement learning systems that can be applied to problems like robotic control, systems governing autonomous vehicles, and other complex AI tasks.


Collaborative Evolutionary Reinforcement Learning (CERL) can achieve better performance on OpenAI Gym benchmarks like Humanoid, Hopper, Swimmer, HalfCheetah, and Walker2D than gradient-based or evolutionary algorithms for reinforcement learning can achieve on their own. Using the CERL approach, researchers were able to make a 3D humanoid agent walk upright in OpenAI’s Humanoid benchmark.
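For readers unfamiliar with these benchmarks, they are continuous-control tasks exposed through the OpenAI Gym API. The sketch below shows the basic interaction loop with the Humanoid environment; the "Humanoid-v2" environment ID, the classic Gym step signature, and the random-action policy are assumptions for illustration, not details from the paper.

```python
# Minimal sketch of the OpenAI Gym Humanoid benchmark loop. The
# "Humanoid-v2" ID and the MuJoCo dependency are assumptions based on
# the Gym releases available around the time of the paper.
import gym

env = gym.make("Humanoid-v2")  # 3D humanoid locomotion task
obs = env.reset()
total_reward = 0.0

for _ in range(1000):
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, done, info = env.step(action)
    total_reward += reward              # reward grows with stable forward motion
    if done:
        break

print(f"episode return: {total_reward:.1f}")
env.close()
```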

Those results are achieved in part by training systems that explore more of the reinforcement learning environment as they seek rewards and learn to complete a specific task.

Environment exploration is important to ensure that a diverse range of experiences is collected and a wide set of courses of action is considered. Issues related to environment exploration have become more prominent with the rise in popularity of deep reinforcement learning for challenging real-world tasks, the researchers said in a paper explaining how CERL works.

CERL combines policy gradient-based reinforcement learning with evolutionary algorithms, selecting the top-performing neural nets from each batch, or generation, of trained systems. That way, researchers can use the strongest neural nets to seed new generations of systems and can direct compute resources to the algorithms that achieve the best performance.
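As a rough illustration of that hybrid loop, the sketch below evolves a small population of policies by fitness, keeps the top performers as elites, and folds in a policy improved by a crude gradient step each generation. The toy fitness function, the finite-difference gradient, and all hyperparameters here are stand-in assumptions, not the authors' implementation.

```python
# Simplified hybrid of evolutionary selection and gradient learning,
# in the spirit of the approach described above (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, POP_SIZE, ELITES = 8, 2, 10, 3

def fitness(policy):
    # Stand-in for an episode rollout; a real system would run the
    # environment and return the episode's cumulative reward.
    target = np.ones((OBS_DIM, ACT_DIM))
    return -np.sum((policy - target) ** 2)

population = [rng.normal(size=(OBS_DIM, ACT_DIM)) for _ in range(POP_SIZE)]

for generation in range(50):
    scores = [fitness(p) for p in population]
    order = np.argsort(scores)[::-1]                  # best first
    elites = [population[i] for i in order[:ELITES]]  # keep top performers

    # Breed the next generation from the strongest policies, adding
    # Gaussian mutation noise for exploration.
    children = [
        elites[rng.integers(ELITES)] + 0.1 * rng.normal(size=(OBS_DIM, ACT_DIM))
        for _ in range(POP_SIZE - ELITES - 1)
    ]

    # Stand-in for a policy-gradient learner: nudge an elite toward
    # higher fitness with a crude finite-difference gradient step.
    learner = elites[0].copy()
    eps = 1e-3
    grad = np.zeros_like(learner)
    for idx in np.ndindex(learner.shape):
        bumped = learner.copy()
        bumped[idx] += eps
        grad[idx] = (fitness(bumped) - fitness(learner)) / eps
    learner += 0.05 * grad

    # The gradient-trained policy competes in the next generation too.
    population = elites + children + [learner]

print("best fitness:", max(fitness(p) for p in population))
```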

CERL also merges the replay buffers that store each learner’s experience in an environment into a single shared replay buffer, so experiences are shared between systems and the approach achieves higher sample efficiency than prior methods.
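A minimal sketch of what such a shared replay buffer might look like follows; the class and method names are illustrative, not drawn from the paper. The key idea is that every learner writes its transitions into one structure that all learners sample from.

```python
# Illustrative shared replay buffer, assuming the standard
# (state, action, reward, next_state, done) transition tuple.
import random
from collections import deque

class SharedReplayBuffer:
    """One buffer that every learner writes to and samples from."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop off

    def add(self, state, action, reward, next_state, done):
        # Any learner, gradient-based or evolutionary, pushes its
        # experience here, so all learners benefit from every rollout.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

# Usage: multiple actors feed one buffer; learners sample shared batches.
buf = SharedReplayBuffer()
for t in range(256):
    buf.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buf.sample(32)
```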
