I am implementing policy iteration for a Gridworld task in Python.

My idea was to keep two arrays for the Gridworld: one holding the results of the previous iteration and one holding the results of the current iteration. Once I wrote the code, though, I noticed that my results were off because the array holding the previous iteration was also being modified.

def policyIteration():
    # Init arrays
    arr1 = [[0 for x in range(5)] for y in range(5)]
    arr2 = [[0 for x in range(5)] for y in range(5)]

    # Set entire array to -1
    for idx1, val1 in enumerate(arr1):
        for idx2, val2 in enumerate(val1):
            arr1[idx1][idx2] = -1

    # Set termination states to 0
    arr1[0][0] = 0; arr1[4][4] = 0

    while ( not checkConverge( arr1, arr2 ) ):
        for i in range(5):
            for j in range(5):
                if ( arr2[i][j] != 0 ): # Don't modify the termination states
                    arr1[i][j] = piFunc( arr2, i, j )

Now this function depends on two other sub-functions: piFunc (which calculates the updated value for the cell on the current iteration) and checkConverge (which just returns whether the values are the same).

My piFunc is a horrific mess, but as far as I can tell, it's logically sound.

def piFunc( arr, idx1, idx2 ):
    if ( idx1 == 0 ):
        vUp = -1
    else:
        vUp = arr[idx1-1][idx2]
    
    if ( idx1 == 4 ):
        vDown = -1
    else:
        vDown = arr[idx1+1][idx2]

    if ( idx2 == 0 ):
        vLeft = -1
    else:
        vLeft = arr[idx1][idx2-1]
    
    if ( idx2 == 4 ):
        vRight = -1 
    else:
        vRight = arr[idx1][idx2+1]

    val = -1 + ( vUp * 0.25 ) + ( vDown * 0.25 ) + ( vLeft * 0.25 ) + ( vRight * 0.25 )
    return val

In all of these, I never once try and assign anything to arr2 except at the very beginning of the while loop when I go to make the arrays the same. In fact, arr2 only appears in code 4 times! But when I go to check the arrays before and after I end up with something like this:

Before:

arr1:
   0  1  2  3  4
0  0 -1 -1 -1 -1
1 -1 -1 -1 -1 -1
2 -1 -1 -1 -1 -1
3 -1 -1 -1 -1 -1
4 -1 -1 -1 -1  0

arr2:
   0  1  2  3  4
0  0 -1 -1 -1 -1
1 -1 -1 -1 -1 -1
2 -1 -1 -1 -1 -1
3 -1 -1 -1 -1 -1
4 -1 -1 -1 -1  0

After:

arr1:
          0         1         2         3         4
0  0.000000 -1.750000 -2.187500 -2.296875 -2.324219
1 -1.750000 -2.375000 -2.640625 -2.734375 -2.764648
2 -2.187500 -2.640625 -2.820312 -2.888672 -2.913330
3 -2.296875 -2.734375 -2.888672 -2.944336 -2.714417
4 -2.324219 -2.764648 -2.913330 -2.714417  0.000000
arr2:
          0         1         2         3         4
0  0.000000 -1.750000 -2.187500 -2.296875 -2.324219
1 -1.750000 -2.375000 -2.640625 -2.734375 -2.764648
2 -2.187500 -2.640625 -2.820312 -2.888672 -2.913330
3 -2.296875 -2.734375 -2.888672 -2.944336 -2.714417
4 -2.324219 -2.764648 -2.913330 -2.714417  0.000000

Why are the values in arr2 changing at all?

  • If that's really your code, then what you say cannot happen. But if your code actually initializes arr2 using arr2 = arr1, then this is exactly what you would expect. Commented Nov 23, 2021 at 5:38
  • @TimRoberts is that because the values of arr1 and arr2 are now linked together? I'm more used to C languages, and am taking an excursion into the Python world. Commented Nov 23, 2021 at 5:45
  • It's important to understand the distinction between names and objects. If you do arr2 = arr1, then both names contain a reference to a single list. Changing through either name changes the one list. It is similar to the situation in C if both of these were float *. Commented Nov 23, 2021 at 5:58
  • @TimRoberts Would the method for assigning one array to another, without establishing a pointer-style relationship, be just the iterative method? Commented Nov 23, 2021 at 5:59

1 Answer

I believe the comment from @TimRoberts above correctly diagnoses the issue.

When you write arr2 = arr1, no copy is made: both names refer to the same list object in memory, so any update made through one name is visible through the other.
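A minimal sketch of that aliasing, using a small 2x2 grid that is made up purely for illustration (it is not the grid from the question):

```python
# Hypothetical 2x2 grid, only for illustration.
arr1 = [[0, -1], [-1, 0]]
arr2 = arr1             # no copy: arr2 is a second name for the same list

print(arr2 is arr1)     # True, both names refer to one object
arr1[0][1] = -1.75      # write through arr1 ...
print(arr2[0][1])       # -1.75, ... and the change shows up through arr2
```

The `is` operator compares object identity, so it is a quick way to check whether two names share one list.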

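To create a copy without this pointer-style reference, you can use a slice for a flat list:

arr2 = arr1[:]

Note that a slice is a shallow copy: it duplicates only the outer list. For a nested list such as the 5x5 grid in the question, the rows would still be shared, so each row needs copying too, for example arr2 = [row[:] for row in arr1] or arr2 = copy.deepcopy(arr1).

A flat-list example: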

arr1 = [1, 2, 3]
arr2 = arr1[:]
arr2.append(4)
print(arr1) # prints [1, 2, 3]
print(arr2) # prints [1, 2, 3, 4]
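The example above uses a flat list. For a list of lists, like the grid in the question, the difference between a shallow and a deep copy matters. The following sketch compares them on a hypothetical 2x2 grid; the names `grid`, `shallow`, `deep`, and `rows` are illustrative only:

```python
import copy

grid = [[0, -1], [-1, 0]]          # hypothetical small grid

shallow = grid[:]                  # new outer list, but the same row objects
deep = copy.deepcopy(grid)         # new outer list and new rows
rows = [row[:] for row in grid]    # also copies each row; enough when rows hold only numbers

grid[0][1] = -1.75                 # modify a cell in the original

print(shallow[0][1])  # -1.75 (row is still shared with grid)
print(deep[0][1])     # -1    (independent copy)
print(rows[0][1])     # -1    (independent copy)
```

Either `copy.deepcopy` or the per-row slice should work for a grid of plain numbers; `deepcopy` is the more general choice if the cells ever hold mutable objects.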


Related older Stack Overflow post on this concept: python list by value not by reference
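Putting this together, here is one possible sketch of the two-array loop with an independent snapshot taken at the start of every sweep. It reuses the update rule from piFunc in the question, but `update_cell`, `evaluate`, `TERMINALS`, `tol`, and `max_sweeps` are names and parameters assumed here for illustration, and the tolerance-based stopping test stands in for the checkConverge function, which the question does not show:

```python
import copy

SIZE = 5
TERMINALS = {(0, 0), (4, 4)}   # assumed: the two terminal cells from the question


def update_cell(prev, i, j):
    """Same rule as piFunc in the question; off-grid neighbours count as -1."""
    up = prev[i - 1][j] if i > 0 else -1
    down = prev[i + 1][j] if i < SIZE - 1 else -1
    left = prev[i][j - 1] if j > 0 else -1
    right = prev[i][j + 1] if j < SIZE - 1 else -1
    return -1 + 0.25 * (up + down + left + right)


def evaluate(tol=1e-9, max_sweeps=10_000):
    # Start with -1 everywhere and 0 in the terminal cells.
    current = [[-1.0] * SIZE for _ in range(SIZE)]
    for i, j in TERMINALS:
        current[i][j] = 0.0

    for _ in range(max_sweeps):
        previous = copy.deepcopy(current)   # independent snapshot of the last sweep
        for i in range(SIZE):
            for j in range(SIZE):
                if (i, j) not in TERMINALS:
                    current[i][j] = update_cell(previous, i, j)
        # Stop once no cell changed by more than tol during this sweep.
        change = max(abs(current[i][j] - previous[i][j])
                     for i in range(SIZE) for j in range(SIZE))
        if change < tol:
            break
    return current


values = evaluate()
```

Because `previous` is a deep copy, writes to `current` during a sweep never leak into the values being read, which is the behaviour the two-array design was aiming for.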

1 Comment

This is exactly right, thank you for the response.
