Reinforcement learning and POMDP

Posted by Betamoo on Stack Overflow, 2010-05-01

  • I am trying to use a multi-layer NN to implement the transition probability function in a Partially Observable Markov Decision Process (POMDP).
  • The inputs to the NN are: current state, selected action, result state. The output is a probability in [0, 1]: the probability that performing the selected action in the current state will lead to the result state.
  • In training, I fed these inputs into the NN and taught it output = 1.0 for each case that actually occurred (see the sketch after this list).
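
For concreteness, here is a minimal sketch of the setup described above. It assumes discrete states and actions fed in as one-hot vectors, and uses PyTorch purely for illustration; the sizes (N_STATES, N_ACTIONS, HIDDEN) and all names are made up, since the post does not specify them.

```python
# A minimal sketch of the setup above, not the original code.
# Assumption: discrete states/actions, encoded one-hot; sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_STATES, N_ACTIONS, HIDDEN = 10, 4, 32  # made-up sizes

class TransitionNet(nn.Module):
    """Estimates P(result state | current state, action) as one sigmoid output."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * N_STATES + N_ACTIONS, HIDDEN),
            nn.Tanh(),
            nn.Linear(HIDDEN, 1),
            nn.Sigmoid(),  # output constrained to [0, 1]
        )

    def forward(self, s, a, s_next):
        # Concatenate one-hot encodings of (current state, action, result state)
        x = torch.cat([
            F.one_hot(s, N_STATES).float(),
            F.one_hot(a, N_ACTIONS).float(),
            F.one_hot(s_next, N_STATES).float(),
        ], dim=-1)
        return self.net(x).squeeze(-1)

# Example: estimated probability that action 2 in state 3 leads to state 7
model = TransitionNet()
p = model(torch.tensor([3]), torch.tensor([2]), torch.tensor([7]))
```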

The problem:
For nearly all test cases the output probability is near 0.95; no output was under 0.9! Even for nearly impossible result states, it gave that high a probability.

PS: I think this is because I taught it the cases that happened, but not the ones that did not happen. But I cannot, at each step of the episode, teach it output = 0.0 for every un-happened action!
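
To make that difficulty concrete: rather than labelling every un-happened (state, action, result) triple with 0.0, one common workaround (not something the post itself does) is to sample just a few random non-occurring result states per observed transition as negatives. A rough sketch, with all names and the k_negatives value illustrative:

```python
# Sketch of sampled negatives: a few random "un-happened" result states
# per observed transition, instead of enumerating all of them.
# Not from the original post; names and k_negatives are made up.
import random

def make_batch(observed, n_states, k_negatives=3):
    """observed: list of (s, a, s_next) transitions that actually occurred."""
    batch = []
    for s, a, s_next in observed:
        batch.append((s, a, s_next, 1.0))           # happened case -> target 1.0
        for _ in range(k_negatives):
            s_fake = random.randrange(n_states)     # random candidate result state
            if s_fake != s_next:
                batch.append((s, a, s_fake, 0.0))   # un-happened case -> target 0.0
    return batch
```

With mixed 0.0/1.0 targets, a constant output near 1.0 no longer minimises the training error, whereas all-positive data lets the network satisfy the loss with exactly that constant.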

Any suggestions on how to overcome this problem? Or maybe another way to use an NN, or another way to implement the probability function?

Thanks
