Normalization of values in a NumPy array gone wrong?

Question

I have a matrix of floats with shape (3000, 9). Each row is one "simulation", and the columns of a given row hold the contents of that simulation.

For each simulation, I want the first 8 columns normalized by the sum of those 8 columns. That is, each of the first 8 entries in a row should become its previous value divided by the sum of the first 8 columns of that same row.
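To make the intended operation concrete, here is a minimal sketch on a small made-up array. The values and the helper name `normalize_rows` are assumptions for illustration only and are not taken from the real dataset:

```python
import numpy as np

# Hypothetical 2x9 stand-in for the real (3000, 9) array; the last column is x.
data = np.array([
    [1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1e12],
    [2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1e13],
])

def normalize_rows(arr):
    """Divide the first 8 columns of each row by that row's sum over those columns."""
    out = arr.copy()                                    # last column (x) is kept as-is
    row_sums = arr[:, :-1].sum(axis=1, keepdims=True)   # shape (n_rows, 1)
    out[:, :-1] = arr[:, :-1] / row_sums                # broadcasts across the 8 columns
    return out

normalized = normalize_rows(data)
print(normalized[:, :-1].sum(axis=1))  # each row's first 8 columns should now sum to 1
```

Using `keepdims=True` keeps the row sums as a column vector, so the division broadcasts row by row without explicit loops.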

It should be a trivial task, but starting from a nice, correct (non-normalized) graph, I get something totally unphysical when plotting the normalized data with plt.scatter.

The last column of each row is used as the x-axis value for plotting the first 8 columns (the y values). So one row gives 8 data points at one fixed value of x.

The non-normalized graph: https://ibb.co/Msr8RVB

The normalized graph: https://ibb.co/tJp7bZn

The datasets: non-normalized: https://easyupload.io/oat9kq

My code:

import numpy as np
from matplotlib import pyplot as plt


non_norm = np.loadtxt("integration_results_3000samples_10_20_10_25_Wcm2_BenSimulationFromSlack.txt")

plt.figure()
for i in range(non_norm.shape[1]-1):
    plt.scatter(non_norm[:, -1], non_norm[:, i], label="c_{}".format(i+47))
plt.xscale("log")
plt.savefig("non-norm_Ben3000samples.pdf", bbox_inches='tight')

norm = np.empty( (non_norm.shape[0], non_norm.shape[1]) )
norm[:, -1] = non_norm[:, -1]

for i in range(norm.shape[1]-1):
    for j in range(norm.shape[0]):
        norm[j, i] = np.true_divide(non_norm[j, i] , np.sum(non_norm[j, :-1]))

plt.figure()
for i in range(norm.shape[1]-1):
    plt.scatter(norm[:, -1], norm[:, i], label="c_{}".format(i+47))
plt.xscale("log")
plt.savefig("norm_Ben3000samples.pdf", bbox_inches='tight')

Do you see what went wrong? Thank you

@not_speshal, what do you mean by a sample of non_norm? Thanks. non_norm is extracted from the .txt I uploaded at: file.io/deleted. Edit: the file has been deleted for unknown reasons.
– velenos14, Commented Jul 7, 2021 at 15:51
Can you check the output of print(non_norm[:10]) before plotting? When I run your code, I get a whole lot of np.nan values.
– not_speshal, Commented Jul 7, 2021 at 16:12
You realise that when you're normalising a row that has just one nonzero value and 7 zeroes, the value becomes 1 and the rest of the row is 0? This is likely why your plot is messing up. Plot each column one by one (normalized and non-normalized) and you'll see what I mean.
– not_speshal, Commented Jul 7, 2021 at 16:38

not_speshal · Accepted Answer · 2021-07-07 17:06:07Z


When you normalise a row that has just one nonzero value and 7 zeroes, that value becomes 1 and the rest of the row stays 0, however small the original value was. This is likely why your plot is messing up.
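A small sketch of this effect, using made-up rows rather than the actual data. It also shows one possible way to handle all-zero rows, where a plain division gives 0/0 = NaN; the guarded `np.divide` call is an assumption about what you might want, not something the original code does:

```python
import numpy as np

# Made-up rows (first 8 columns only), not taken from the actual dataset.
rows = np.array([
    [3e-5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],    # one tiny nonzero entry
    [0.2, 0.3, 0.1, 0.1, 0.1, 0.1, 0.05, 0.05],   # fully populated row
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],     # all zeros: the row sum is 0
])

sums = rows.sum(axis=1, keepdims=True)

# Divide only where the row sum is nonzero; all-zero rows are left at 0
# instead of becoming NaN.
safe = np.divide(rows, sums, out=np.zeros_like(rows), where=sums != 0)

print(safe[0])  # the tiny value in the first row is blown up to 1
print(safe[2])  # the all-zero row stays all zeros
```

The first row illustrates the jump: a value of 3e-5 that sits near zero in the non-normalized plot lands at 1 after normalization, simply because it is the only nonzero entry in its row.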

For example, the plot for the first column looks like this before and after normalization:

edited Jul 7, 2021 at 17:06

answered Jul 7, 2021 at 15:53

not_speshal
