I'm working on a simple resource allocation problem, that I'm solving using backward DP. The full code is at: https://codereview.stackexchange.com/questions/123641/allocating-a-resource
It works fine, but I'm a bit puzzled on how to retrieve the optimal policy once the optimisation is done. I use a memoization decorator but I'm clueless as to how to get the optimal policy out of it at the end. (The policy would be a series of 1,0 as describing when to allocate and when not to.)
def memoize(f):
cache = {}
def memoizedFunction(*args):
if args not in cache:
cache[args] = f(*args)
return cache[args]
memoizedFunction.cache = cache
return memoizedFunction
In an alternative effort I use:
def record(D,key,item):
"""Records the value of item in the dictionary d with k"""
if key in D.keys():
D[key].append(item)
else:
D[key] = [item]
return D
And in the DP:
if v == no_pump:
pos = 0
else:
pos = 1
record(X,i+1,pos)
My hunch is that if I can create a new dictionary (X) for each policy, and record the optimum value up to its execution, it could work. But I don't know how to do that. Also it seams quite inefficient.
To clarify: Forwards methods, or converting it into a graph aren't suitable as the whole problem will be converted into a stochastic control problem later.