In the following code, I am trying to compute both the frequency and the sum of a set of vectors (NumPy arrays), grouped by label:

def calculate_means_on(the_labels, the_data):
    freq = dict()
    sums = dict()
    means = dict()
    total = 0
    for index, a_label in enumerate(the_labels):
        this_data = the_data[index]
        if a_label not in freq:
            freq[a_label] = 1
            sums[a_label] = this_data
        else:
            freq[a_label] += 1
            sums[a_label] += this_data

Suppose the_data (a NumPy 'matrix') is originally:

[[ 1.  2.  4.]
 [ 1.  2.  4.]
 [ 2.  1.  1.]
 [ 2.  1.  1.]
 [ 1.  1.  1.]]

After running the above code, the_data becomes:

[[  3.   6.  12.]
 [  1.   2.   4.]
 [  7.   4.   4.]
 [  2.   1.   1.]
 [  1.   1.   1.]]

Why is this? I've narrowed it down to the line sums[a_label] += this_data; when I change it to sums[a_label] = sums[a_label] + this_data, it behaves as expected, i.e., the_data is not modified.

Comment (Apr 14, 2017 at 23:28): See here

1 Answer

This line:

this_data = the_data[index]

takes a view, not a copy, of a row of the_data. The view is backed by the original array, and mutating the view will write through to the original array.
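The view behaviour can be checked directly. The following is a minimal sketch, assuming a small two-row array made up for illustration (the names row_view and row_copy are hypothetical, not from the question); np.shares_memory reports whether two arrays overlap in memory.

```python
import numpy as np

# Small illustrative array (not the asker's full data).
the_data = np.array([[1., 2., 4.],
                     [2., 1., 1.]])

row_view = the_data[0]          # basic indexing returns a view
row_copy = the_data[0].copy()   # an explicit copy owns its own memory

print(np.shares_memory(row_view, the_data))  # True
print(np.shares_memory(row_copy, the_data))  # False

row_view[0] = 99.0              # writes through to the_data
print(the_data[0, 0])           # 99.0
```

The copy is unaffected by the write, which is why copying the row before storing it avoids the problem.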

This line:

sums[a_label] = this_data

inserts that view into the sums dict, and this line:

sums[a_label] += this_data

mutates the original array through the view, since += requests that the operation be performed by mutation instead of by creating a new object, when the object is mutable.
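To see the difference between the two spellings side by side, here is a small sketch under assumed inputs: two identical rows and a single hypothetical label "a". With +=, the stored view is updated in place, so row 0 of the source array changes; with = ... + ..., a new array is created and the dict entry is rebound to it.

```python
import numpy as np

# Case 1: in-place addition through a stored view.
the_data = np.array([[1., 2., 4.],
                     [1., 2., 4.]])
sums = {"a": the_data[0]}        # dict entry is a view of row 0
sums["a"] += the_data[1]         # in-place add: row 0 of the_data changes
print(the_data[0])               # row 0 now holds the running sum

# Case 2: out-of-place addition with rebinding.
fresh = np.array([[1., 2., 4.],
                  [1., 2., 4.]])
sums2 = {"a": fresh[0]}          # still a view at this point
sums2["a"] = sums2["a"] + fresh[1]   # new array; dict entry rebound to it
print(fresh[0])                  # row 0 is unchanged
```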

1 Comment

Awesome. sums[a_label] = np.copy(this_data) it is. Will accept as soon as it lets me.
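For reference, the np.copy fix might look like the sketch below. The original function is shown only in part, so the name calculate_sums_on, the return value, and the label list are assumptions made for illustration; only the np.copy line comes from the comment above.

```python
import numpy as np

def calculate_sums_on(the_labels, the_data):
    """Hypothetical variant: per-label counts and sums, leaving the_data intact."""
    freq = {}
    sums = {}
    for index, a_label in enumerate(the_labels):
        this_data = the_data[index]
        if a_label not in freq:
            freq[a_label] = 1
            # Copy the row so later += updates do not touch the_data.
            sums[a_label] = np.copy(this_data)
        else:
            freq[a_label] += 1
            sums[a_label] += this_data   # in place, but on the copy
    return freq, sums

# Assumed labels; the question does not show the_labels.
labels = ["a", "a", "b", "b", "b"]
data = np.array([[1., 2., 4.],
                 [1., 2., 4.],
                 [2., 1., 1.],
                 [2., 1., 1.],
                 [1., 1., 1.]])
original = data.copy()

freq, sums = calculate_sums_on(labels, data)
print(np.array_equal(data, original))  # True
```

Keeping += on the copied array still avoids allocating a new array on every iteration, which the out-of-place form sums[a_label] = sums[a_label] + this_data would do.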
