Hacker News

>> The exploration problem can largely be bypassed in Montezuma’s Revenge by starting each RL episode by resetting from a state in a demonstration. By starting from demonstration states, the agent needs to perform much less exploration to learn to play the game compared to when it starts from the beginning of the game at every episode. Doing so enables us to disentangle exploration and learning.

Or, in other words: use the Domain Knowledge, Luke. Quit trying to learn everything from scratch. Because that's just dumb.



Interesting, though, that they seem to use the demonstration only for initial states and not for action choices. It's like using a worked solution to a maze just to get a set of places to start exploring from, without actually trying to copy the solver's "turn right at every corner" strategy. In that sense the use of domain knowledge is actually pretty limited.
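Here is a minimal sketch of that distinction. It is not OpenAI's code: the corridor environment, `demo`, `train`, and the other names are all hypothetical. It runs tabular Q-learning where the demonstration is used only to pick reset states. The demonstrator's actions are stored but never read, so everything the agent learns about what to do still comes from its own exploration.

```python
import random

# Hypothetical stand-in for a hard-exploration game: a 1-D corridor whose
# only reward sits at the far end. Not Montezuma's Revenge, just the same shape.
N_STATES = 60
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left / step right


def step(state, action):
    """Move one cell (clamped at 0); reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), GOAL)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


# Hypothetical demonstration as (state, action) pairs. Only the states are
# used below; the demonstrator's actions are deliberately ignored.
demo = [(s, +1) for s in range(GOAL)]
demo_states = [s for s, _ in demo]


def choose(q, state, eps, rng):
    """Epsilon-greedy action choice with random tie-breaking."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(start_states, episodes=2000, max_steps=N_STATES,
          alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning where every episode resets to a sampled start state."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(start_states)  # the only place the demo is used
        for _ in range(max_steps):
            action = choose(q, state, eps, rng)
            nxt, reward, done = step(state, action)
            future = 0.0 if done else max(q[(nxt, b)] for b in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_reaches_goal(q, start=0):
    """Follow the learned greedy policy from the true start of the 'game'."""
    state = start
    for _ in range(2 * N_STATES):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, done = step(state, action)
        if done:
            return True
    return False


from_scratch = train(start_states=[0])      # every episode starts at the beginning
from_demo = train(start_states=demo_states)  # episodes reset from demonstration states
print(greedy_reaches_goal(from_scratch), greedy_reaches_goal(from_demo))
```

With the reward 59 steps away and 60-step episodes, the from-scratch agent would need an almost perfect run of right moves to ever see a reward. Resets spread along the demonstration let value estimates creep back from the goal one state at a time. Sampling reset states uniformly is a simplifying assumption here. Moving the reset point gradually backward from the end of the demonstration is another option.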