How to retrieve policy after Dynamic Programming?

Question

I'm working on a simple resource allocation problem, that I'm solving using backward DP. The full code is at: https://codereview.stackexchange.com/questions/123641/allocating-a-resource

It works fine, but I'm a bit puzzled on how to retrieve the optimal policy once the optimisation is done. I use a memoization decorator but I'm clueless as to how to get the optimal policy out of it at the end. (The policy would be a series of 1,0 as describing when to allocate and when not to.)

def memoize(f):
    cache = {}

    def memoizedFunction(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]

    memoizedFunction.cache = cache
    return memoizedFunction

In an alternative effort I use:

def record(D,key,item):
    """Records the value of item in the dictionary d with k"""
    if key in D.keys():
        D[key].append(item)
    else:
        D[key] = [item]

    return D

And in the DP:

   if v == no_pump:
        pos = 0
    else:
        pos = 1

    record(X,i+1,pos)

My hunch is that if I can create a new dictionary (X) for each policy, and record the optimum value up to its execution, it could work. But I don't know how to do that. Also it seams quite inefficient.

To clarify: Forwards methods, or converting it into a graph aren't suitable as the whole problem will be converted into a stochastic control problem later.

Winston Ewert · Accepted Answer · 2016-04-25 05:25:17Z

1

Right now the function you memoize simply returns the score the best solution. Change is so that it returns a tuple of the first step of the best solution and the score. Something like:

@memoize
def score(...)
    _, zero_score = score(...)
    _, one_score = score(...)
    if zero_score < one_score:
       return 0, zero_score
    else:
       return 1, one_score

Now, to find out the next step, you simply have to call score and it will tell you the next step and the score.

edited Apr 25, 2016 at 5:25

answered Apr 25, 2016 at 5:19

Winston Ewert

25.1k12 gold badges76 silver badges104 bronze badges

Add a comment |

verisimilidude · Accepted Answer · 2016-03-25 19:14:38Z

-2

Move your memoizedFunction.cache dictionary out of the function memoize function so it is at a global level. Then you will be able to query it after the optimization has run.

answered Mar 25, 2016 at 19:14

verisimilidude

1

Add a comment |

Stack Exchange Network

How to retrieve policy after Dynamic Programming?

2 Answers 2

Your Answer

Hot Network Questions

How to retrieve policy after Dynamic Programming?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions