Reinforcement learning and POMDPs
- by Betamoo
I am trying to use a multi-layer NN to implement the transition probability function in a Partially Observable Markov Decision Process (POMDP).
I thought the inputs to the NN would be: current state, selected action, result state;
The output is a probability in [0,1] (the probability that performing the selected action in the current state will lead to the result state).
In training, I fed the inputs stated above into the NN and taught it output = 1.0 for each case that actually occurred.
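To make the setup concrete, here is a minimal sketch of what I understand the training scheme to be (toy one-hot encodings, a single hidden layer, and hypothetical dimensions of my own choosing; the actual network and state/action encoding may differ). It trains only on observed transitions with target 1.0, which is exactly the positives-only regime described:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: one-hot state, action, and result-state vectors.
N_STATES, N_ACTIONS = 4, 2
IN_DIM = N_STATES + N_ACTIONS + N_STATES
HIDDEN = 8

# One-hidden-layer MLP with a sigmoid output in [0, 1].
W1 = rng.normal(0, 0.1, (IN_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return sigmoid(h @ W2 + b2), h

def encode(s, a, s2):
    # Concatenated one-hot encoding of (current state, action, result state).
    x = np.zeros(IN_DIM)
    x[s] = 1.0
    x[N_STATES + a] = 1.0
    x[N_STATES + N_ACTIONS + s2] = 1.0
    return x

# Training as described in the question: every transition that actually
# occurred is taught with target output 1.0 -- positive examples only.
lr = 0.5
occurred = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
for _ in range(2000):
    for s, a, s2 in occurred:
        x = encode(s, a, s2)
        y, h = forward(x)
        err = y - 1.0  # cross-entropy gradient w.r.t. pre-sigmoid input
        gW2 = np.outer(h, err)
        gh = (err * W2[:, 0]) * (1 - h ** 2)
        gW1 = np.outer(x, gh)
        W2 -= lr * gW2; b2 -= lr * err
        W1 -= lr * gW1; b1 -= lr * gh

# A transition that was never seen in training tends to score high as well,
# since nothing ever pushed the output toward 0.
p_unseen, _ = forward(encode(3, 1, 0))
print(float(p_unseen[0]))
```

With only target-1.0 examples, the output bias is pushed up on every update, so the network can satisfy all training cases by saturating near 1 for everything, including inputs it never saw.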
The problem:
For nearly every test case the output probability is near 0.95; no output was under 0.9!
Even for nearly impossible results, it gives that high a probability.
PS: I think this is because I trained it only on cases that happened, not on cases that didn't.
But I cannot, at each step of the episode, teach it output = 0.0 for every action that did not happen!
Any suggestions on how to overcome this problem? Or maybe another way to use a NN, or another way to implement the probability function?
Thanks