I am implementing a machine learning algorithm which approximates a matrix as the product of two other matrices: V ~= WH. W and H are randomly initialised, and are updated iteratively so that WH more closely resembles V.
In my code, at each iteration, I want to (i) Update W and H, and (ii) Calculate a score based on the new values of W and H.
My problem is this: the function I am using to score should ONLY calculate a score - it should not affect V, W or H - but it appears to do so! I don't know why the function is affecting the global variables - I thought this could only happen if you made a declaration of the form global foo etc. The result is that there are small differences in the calculated W and H depending on whether or not a score is calculated at each iteration - which does not make sense.
Below is some code which I have stripped down as much as possible - it does not implement my algorithm or do anything meaningful, it just reproduces the problem, which is that there are small differences in the calculated W based on whether you comment out the line that calculates the score.
Can anyone see why this changes the result?
import numpy as np

# TRUE, GLOBAL VALUE OF V - should remain the same throughout
V = np.array([[0.0, 4.0, 0.0, 4.0],
              [0.0, 0.0, 1.0, 0.0],
              [4.0, 0.0, 0.0, 3.0]]).astype(float)

# RANDOM INITIALIZATIONS for two matrices, which are then updated by later steps
W = np.array([[ 1.03796229, 1.29098839],
              [ 0.49131664, 0.79759996],
              [ 0.66055735, 0.48055734]]).astype(float)
H = np.array([[ 0.06923306, 0.53105902, 1.1715193, 0.58126684],
              [ 1.71226543, 0.54797385, 0.70978869, 1.58761463]]).astype(float)

# A small number, which is added at some steps to prevent zero division errors/overflows
min_no = np.finfo(np.float32).eps

# A function which calculates SOME SCORE based on V_input - below is the simplest example that reproduces the error
# This function should ONLY calculate and return a score - IT SHOULD NOT UPDATE GLOBAL VARIABLES!
def score(V_input):
    V_input[V_input == 0] = min_no  # I believe that THIS LINE may be UPDATING GLOBAL V - but I don't understand why
    scr = np.sum(V_input)
    return scr

# This function UPDATES the W matrix
def W_update(Vw, Ww, Hw):
    WHw = np.matmul(Ww, Hw)
    WHw[WHw == 0] = min_no
    ratio = np.matmul(np.divide(Vw, WHw), np.transpose(Hw))
    return np.multiply(Ww, ratio)

# Repeated update steps
for it in range(10):
    # Update step
    W = W_update(V, W, H)
    # SCORING STEP - A SCORE IS CALCULATED - SHOULD NOT UPDATE GLOBAL VARIABLES
    # HOWEVER, IT APPEARS TO DO SO - SMALL DIFFERENCES BETWEEN FINAL W WHEN COMMENTED OUT/NOT COMMENTED OUT
    score_after_iteration = score(V)

# THE OUTPUT PRINTED HERE IS DIFFERENT DEPENDING ON WHETHER OR NOT THE SCORING STEP IS COMMENTED OUT - WHY?
print(W[:2,:2])  # Just a sample from W after last iteration
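You pass in V and hence edit that exact matrix. V_input is just a local name for the passed-in array, so it's the same array, not a copy. The global statement is only needed to rebind a global name; assigning into an array through an index or mask modifies the existing object in place, whatever name it is reached by. If you need a copy you have to create one.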
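As a minimal sketch of one way to avoid the side effect, the scoring function can work on its own copy, or build a new array with np.where instead of assigning in place. The names score_copy and score_where, and the small 2x2 V used here, are made up for illustration and are not from the question's code.

```python
import numpy as np

# Same small constant as in the question
min_no = np.finfo(np.float32).eps

def score_copy(V_input):
    # Work on an explicit copy so the caller's array is left untouched
    V_local = V_input.copy()
    V_local[V_local == 0] = min_no
    return np.sum(V_local)

def score_where(V_input):
    # np.where returns a new array; V_input itself is never written to
    return np.sum(np.where(V_input == 0, min_no, V_input))

# Hypothetical small example matrix (not the V from the question)
V = np.array([[0.0, 4.0],
              [1.0, 0.0]])
V_before = V.copy()

scr_a = score_copy(V)
scr_b = score_where(V)

# V still contains its exact zeros after both scoring calls
print(np.array_equal(V, V_before))
```

Either variant should give the same W whether or not the scoring step is run, since V is no longer changed between updates.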