Numpy - function updates global variable when it shouldn't

Question

I am implementing a machine learning algorithm which approximates a matrix as the product of two other matrices: V ~= WH. W and H are randomly initialised, and are updated iteratively so that WH more closely resembles V.

In my code, at each iteration, I want to (i) Update W and H, and (ii) Calculate a score based on the new values of W and H.

My problem is this: the function I am using to score should ONLY calculate a score - it should not affect V, W or H - but it appears to do so! I don't know why the function is affecting the global variables - I thought this could only happen if you made a declaration of the form global foo etc. The result is that there are small differences in the calculated W and H depending on whether or not a score is calculated at each iteration - which does not make sense.

Below is some code which I have stripped down as much as possible - it does not implement my algorithm or do anything meaningful, it just reproduces the problem, which is that there are small differences in the calculated W based on whether you comment out the line that calculates the score.

Can anyone see why this changes the result?

import numpy as np

# TRUE, GLOBAL VALUE OF V - should remain the same throughout
V = np.array([[0.0, 4.0, 0.0, 4.0],
              [0.0, 0.0, 1.0, 0.0],
              [4.0, 0.0, 0.0, 3.0]]).astype(float)

# RANDOM INITIALIZATIONS for two matrices, which are then updated by later steps
W = np.array([[ 1.03796229,  1.29098839],
              [ 0.49131664,  0.79759996],
              [ 0.66055735,  0.48055734]]).astype(float)
H = np.array([[ 0.06923306,  0.53105902,  1.1715193,   0.58126684],
              [ 1.71226543,  0.54797385,  0.70978869,  1.58761463]]).astype(float)

# A small number, which is added at some steps to prevent zero division errors/overflows
min_no = np.finfo(np.float32).eps

# A function which calculates SOME SCORE based on V_input - below is the simplest example that reproduces the error
# This function should ONLY calculate and return a score - IT SHOULD NOT UPDATE GLOBAL VARIABLES!
def score(V_input):

    V_input[V_input == 0] = min_no # I believe that THIS LINE may be UPDATING GLOBAL V - but I don't understand why
    scr = np.sum(V_input)

    return scr

# This function UPDATES the W matrix
def W_update(Vw, Ww, Hw):

    WHw = np.matmul(Ww, Hw)
    WHw[WHw == 0] = min_no
    ratio = np.matmul(np.divide(Vw, WHw), np.transpose(Hw))

    return np.multiply(Ww, ratio)

# Repeated update steps
for it in range(10):

    # Update step
    W = W_update(V, W, H)

    # SCORING STEP - A SCORE IS CALCULATED - SHOULD NOT UPDATE GLOBAL VARIABLES
    # HOWEVER, IT APPEARS TO DO SO - SMALL DIFFERENCES BETWEEN FINAL W WHEN COMMENTED OUT/NOT COMMENTED OUT
    score_after_iteration = score(V)

# THE OUTPUT PRINTED HERE IS DIFFERENT DEPENDING ON WHETHER OR NOT THE SCORING STEP IS COMMENTED OUT - WHY?
print(W[:2,:2]) # Just a sample from W after last iteration

Because you pass a reference to V and hence edit that exact matrix.
– willeM_ Van Onsem, Commented Jul 10, 2018 at 12:41
V_input is just a local name for the passed-in array, so it's the same array, not a copy. If you need a copy you have to create one.
– PM 2Ring, Commented Jul 10, 2018 at 12:42

willeM_ Van Onsem · Accepted Answer · 2018-07-10 12:44:09Z

3

When you pass a variable to a function, you pass a reference to the object it names, not a copy. So if you call score(V), V_input refers to the same matrix as V, and an in-place update such as V_input[V_input == 0] = min_no edits that very object. A global declaration is only needed to rebind a global name; it is not needed to mutate the object the name refers to. Lists behave the same way: if a function modifies a list it was given, it has not edited a copy but the list itself, so the changes are visible outside the call.
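As a small illustration of that difference, here is a sketch that contrasts in-place item assignment with rebinding a local name. The function names mutate_in_place and rebind_local are made up for this example and are not part of the question's code.

```python
import numpy as np

def mutate_in_place(arr):
    # Item assignment writes into the array object the caller passed in.
    arr[arr == 0] = -1.0

def rebind_local(arr):
    # "arr + 1.0" builds a new array; rebinding the local name "arr"
    # does not touch the caller's array.
    arr = arr + 1.0
    return arr

a = np.array([0.0, 2.0, 0.0])

rebind_local(a)
after_rebind = a.copy()   # a is still [0.0, 2.0, 0.0]

mutate_in_place(a)
after_mutate = a.copy()   # a is now [-1.0, 2.0, -1.0]

print(after_rebind, after_mutate)
```

No global statement appears anywhere, yet the second call changes a, because the function and the caller share one array object.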

You can however make a copy, like:

for it in range(10):

    # Update step
    W = W_update(V, W, H)

    score_after_iteration = score(V.copy())

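The same holds for W_update, by the way: it also receives references to V, W and H. There it is probably not an issue, since the only array it modifies in place is WHw, which is a new array created by np.matmul.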
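If you want to catch this kind of accidental in-place write while debugging, one option (not part of the original code, just a sketch) is to mark the array read-only with setflags(write=False). The score function below is the one from the question; the small V is a shortened stand-in.

```python
import numpy as np

min_no = np.finfo(np.float32).eps

def score(V_input):
    # In-place write, as in the question's version of score
    V_input[V_input == 0] = min_no
    return np.sum(V_input)

V = np.array([[0.0, 4.0],
              [1.0, 0.0]])
V.setflags(write=False)  # any in-place write to V now raises ValueError

try:
    score(V)
    raised = False
except ValueError:
    raised = True        # the write inside score() was rejected

# A copy is a separate, writeable array, so scoring the copy still works
s = score(V.copy())
print(raised, s)
```

With the flag set, the offending line fails loudly instead of silently changing V.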


1 Comment

hannes whittingham Over a year ago

Thanks very much - this has helped me a great deal!

Eric · Accepted Answer · 2018-07-10 13:29:03Z

2

Alternatively, change your score function to not update any of its inputs:

def score(V_input):
    return np.sum(np.where(V_input == 0, min_no, V_input))
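A quick way to check that this version leaves its argument alone is to compare V before and after the call. The sketch below reuses V and min_no from the question; the names V_before and unchanged are only for the check.

```python
import numpy as np

min_no = np.finfo(np.float32).eps

def score(V_input):
    # np.where returns a new array, so V_input itself is never written to
    return np.sum(np.where(V_input == 0, min_no, V_input))

V = np.array([[0.0, 4.0, 0.0, 4.0],
              [0.0, 0.0, 1.0, 0.0],
              [4.0, 0.0, 0.0, 3.0]])

V_before = V.copy()
s = score(V)
unchanged = np.array_equal(V, V_before)  # True: the zeros in V are still zeros

print(s, unchanged)
```

Compared with passing V.copy(), this keeps the guarantee inside the function, so callers cannot forget the copy.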
