
I am implementing a machine learning algorithm which approximates a matrix as the product of two other matrices: V ~= WH. W and H are randomly initialised, and are updated iteratively so that WH more closely resembles V.

In my code, at each iteration, I want to (i) Update W and H, and (ii) Calculate a score based on the new values of W and H.

My problem is this: the function I am using to score should ONLY calculate a score - it should not affect V, W or H - but it appears to do so! I don't know why the function is affecting the global variables - I thought this could only happen if you made a declaration of the form global foo etc. The result is that there are small differences in the calculated W and H depending on whether or not a score is calculated at each iteration - which does not make sense.

Below is some code which I have stripped down as much as possible - it does not implement my algorithm or do anything meaningful, it just reproduces the problem, which is that there are small differences in the calculated W based on whether you comment out the line that calculates the score.

Can anyone see why this changes the result?

import numpy as np

# TRUE, GLOBAL VALUE OF V - should remain the same throughout
V = np.array([[0.0, 4.0, 0.0, 4.0],
              [0.0, 0.0, 1.0, 0.0],
              [4.0, 0.0, 0.0, 3.0]]).astype(float)

# RANDOM INITIALIZATIONS for two matrices, which are then updated by later steps
W = np.array([[ 1.03796229,  1.29098839],
              [ 0.49131664,  0.79759996],
              [ 0.66055735,  0.48055734]]).astype(float)
H = np.array([[ 0.06923306,  0.53105902,  1.1715193,   0.58126684],
              [ 1.71226543,  0.54797385,  0.70978869,  1.58761463]]).astype(float)

# A small number, which is added at some steps to prevent zero division errors/overflows
min_no = np.finfo(np.float32).eps

# A function which calculates SOME SCORE based on V_input - below is the simplest example that reproduces the error
# This function should ONLY calculate and return a score - IT SHOULD NOT UPDATE GLOBAL VARIABLES!
def score(V_input):

    V_input[V_input == 0] = min_no # I believe that THIS LINE may be UPDATING GLOBAL V - but I don't understand why
    scr = np.sum(V_input)

    return scr

# This function UPDATES the W matrix
def W_update(Vw, Ww, Hw):

    WHw = np.matmul(Ww, Hw)
    WHw[WHw == 0] = min_no
    ratio = np.matmul(np.divide(Vw, WHw), np.transpose(Hw))

    return np.multiply(Ww, ratio)

# Repeated update steps
for it in range(10):

    # Update step
    W = W_update(V, W, H)

    # SCORING STEP - A SCORE IS CALCULATED - SHOULD NOT UPDATE GLOBAL VARIABLES
    # HOWEVER, IT APPEARS TO DO SO - SMALL DIFFERENCES BETWEEN FINAL W WHEN COMMENTED OUT/NOT COMMENTED OUT
    score_after_iteration = score(V)

# THE OUTPUT PRINTED HERE IS DIFFERENT DEPENDING ON WHETHER OR NOT THE SCORING STEP IS COMMENTED OUT - WHY?
print(W[:2,:2]) # Just a sample from W after last iteration
  • Because you pass a reference to V, and hence edit that exact matrix. Commented Jul 10, 2018 at 12:41
  • V_input is just a local name for the passed-in array, so it's the same array, not a copy. If you need a copy you have to create one. Commented Jul 10, 2018 at 12:42

2 Answers


When you pass a variable to a function, you pass a reference to the object it names, not a copy. So when you call score(V), V_input refers to the same matrix as V, and any in-place update to V_input is an update to that object. The same happens with a list: if a function modifies a list it was passed, it modifies the list itself rather than a copy, so the changes are visible outside the call.
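As a minimal sketch of that distinction (the function names `zero_first_in_place` and `rebind_only` are made up for illustration and are not part of the question's code): assigning into the array changes the caller's object, whereas rebinding the local name does not. This is also why no `global` declaration is involved.

```python
import numpy as np

def zero_first_in_place(arr):
    # Item assignment mutates the very object the caller passed in.
    arr[0] = 0.0

def rebind_only(arr):
    # arr * 0.0 builds a new array; rebinding the local name "arr"
    # leaves the caller's array untouched.
    arr = arr * 0.0
    return arr

a = np.array([1.0, 2.0, 3.0])

rebind_only(a)
after_rebind = a.copy()   # snapshot: still the original values

zero_first_in_place(a)    # now the caller's array really changes

print(after_rebind)
print(a)
```

Only the second call alters `a`, which mirrors what `V_input[V_input == 0] = min_no` does to `V` in the question.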

You can however make a copy, like:

for it in range(10):

    # Update step
    W = W_update(V, W, H)

    score_after_iteration = score(V.copy())

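The same holds for W_update, by the way, but there it is probably not an issue: the only in-place assignment is to WHw, which is a new array returned by np.matmul rather than one of the arguments.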
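One way to check this, sketched here with small made-up matrices rather than the question's data, is to snapshot the inputs, call W_update, and compare afterwards. `np.shares_memory` is used only to confirm that the result is a separate array.

```python
import numpy as np

min_no = np.finfo(np.float32).eps

def W_update(Vw, Ww, Hw):
    WHw = np.matmul(Ww, Hw)   # fresh array, not a view of Ww or Hw
    WHw[WHw == 0] = min_no    # mutates only this local product
    ratio = np.matmul(np.divide(Vw, WHw), np.transpose(Hw))
    return np.multiply(Ww, ratio)

# Toy inputs, chosen so that the product W @ H contains zeros.
V = np.array([[0.0, 4.0], [1.0, 0.0]])
W = np.array([[1.0, 2.0], [0.5, 0.0]])
H = np.array([[0.0, 1.0], [0.0, 3.0]])

V_before, W_before, H_before = V.copy(), W.copy(), H.copy()
W_new = W_update(V, W, H)

inputs_unchanged = (np.array_equal(V, V_before)
                    and np.array_equal(W, W_before)
                    and np.array_equal(H, H_before))
print(inputs_unchanged)
print(np.shares_memory(W_new, W))
```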


1 Comment

Thanks very much - this has helped me a great deal!

Alternatively, change your score function to not update any of its inputs:

def score(V_input):
    return np.sum(np.where(V_input == 0, min_no, V_input))
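A quick check of this version, using a small made-up V rather than the question's matrix: `np.where` builds a new array for the sum, so the zeros in the caller's V should still be there after scoring.

```python
import numpy as np

min_no = np.finfo(np.float32).eps

def score(V_input):
    # np.where returns a new array; V_input itself is never assigned to.
    return np.sum(np.where(V_input == 0, min_no, V_input))

V = np.array([[0.0, 4.0], [1.0, 0.0]])   # toy data for illustration
V_before = V.copy()

s = score(V)

unchanged = np.array_equal(V, V_before)
zeros_left = int(np.count_nonzero(V == 0))
print(s, unchanged, zeros_left)
```

Compared with passing `V.copy()`, this keeps the safety inside the function, so callers do not have to remember to copy.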
