0

I’m trying to normalize some values in a numpy array with the following shape:

import numpy as np
X = np.random.rand(100, 20, 3)

This data represents 100 time stamps for each of 20 observations, where each observation has 3 dimensional attributes (x, y, z). I want to normalize the x, y, z attributes as follows: for each dimension, subtract its min, then divide by the resulting max, so that each dimension's values end up in the range [0, 1] (what I'm loosely calling "centering" the values).

I attempted to do this with the following:

# center all features
for i in range(3):
  X[:][:][i] -= np.min(X[:][:][i])
  X[:][:][i] /= np.max(X[:][:][i])

This does not mutate all values of the ith dimension, however.

How can I center my features in this way? Any help others can offer would be greatly appreciated!

3
  • I'm not sure X[:][:][i] is multi-dimensional indexing in numpy. Shouldn't it be X[:, :, i]? I can't test atm. Commented Oct 16, 2018 at 13:39
  • "This does not mutate all values of the ith dimension" What do you mean by that? Could you please clarify? Commented Oct 16, 2018 at 13:43
  • @roganjosh you nailed it. If you make your comment an answer I'll accept it. @AGNGazer I mean if you run the code above, not all values of the ith index in the last dimension will be mutated. Here i is an index into the 3 dimensional space (the third of the values provided by X.shape). Sorry my language here is kind of clunky. Commented Oct 16, 2018 at 13:48

2 Answers

2

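X[:] is Python syntax for taking a full slice. On a list that makes a shallow copy, but on a NumPy array it returns a view of the whole array, so X[:][:][i] is the same as X[i]: you end up indexing the first axis by i rather than the last. You need X[:, :, i]. See the NumPy indexing documentation for more about multidimensional indexing of arrays.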
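To make the difference concrete, here is a minimal sketch (the seeded generator and the loop bound are my own choices for illustration, not from the question). It checks that the chained slices select along the first axis, then reruns the question's loop with comma-separated indexing:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
X = rng.random((100, 20, 3))

# Each X[:] hands back the whole array, so the chain indexes the FIRST axis.
assert X[:][:][1].shape == (20, 3)
assert np.shares_memory(X[:][:][1], X[1])
assert np.array_equal(X[:][:][1], X[1])

# A comma-separated index selects along the LAST axis instead.
assert X[:, :, 1].shape == (100, 20)

# The loop from the question, rewritten with multidimensional indexing.
for i in range(X.shape[-1]):
    X[:, :, i] -= X[:, :, i].min()
    X[:, :, i] /= X[:, :, i].max()
```

After the loop, every value of each of the three attributes has been rescaled, not just the first three blocks along the time axis.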


1
X -= np.amin(X, axis=(0, 1))
X /= np.amax(X, axis=(0, 1))

NOTE: According to numpy.amin() documentation (similar for amax()):

Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the minimum is selected over multiple axes, instead of a single axis or all the axes as before.

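By specifying axis=(0, 1), I ask numpy.amin() to find the minimum over all rows and columns for each "depth" (3rd axis) element. The result has one value per attribute (shape (3,)), and broadcasting applies it to every row and column of X.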
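As a rough illustration of what the axis argument does, the sketch below (the names xmin and shifted are just for this example) prints the shape of the reduced result and checks that the broadcast subtraction matches handling each attribute separately:

```python
import numpy as np

X = np.random.rand(100, 20, 3)

# Reducing over axes 0 and 1 leaves one value per entry of the last axis.
xmin = np.amin(X, axis=(0, 1))
print(xmin.shape)     # (3,)

# Broadcasting aligns trailing axes, so the (3,) array is subtracted
# from every (time stamp, observation) pair of the (100, 20, 3) array.
shifted = X - xmin
print(shifted.shape)  # (100, 20, 3)

# Same result as subtracting each attribute's own minimum in a loop.
for i in range(3):
    assert np.allclose(shifted[:, :, i], X[:, :, i] - X[:, :, i].min())
```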


Step-by-step illustration:

In [1]: import numpy as np
   ...: np.random.seed(0)
   ...: X = np.random.rand(2, 4, 3)
   ...: print("\nOriginal X:\n%s" % X)
   ...: xmin = np.amin(X, axis=(0, 1))
   ...: print("\nxmin = %s" % xmin)
   ...: X -= xmin
   ...: print("\nSubtracted X:\n%s" % X)
   ...: xmax = np.amax(X, axis=(0, 1))
   ...: X /= xmax
   ...: print("\nDivided X:\n%s" % X)
   ...: 
   ...: 

Original X:
[[[0.5488135  0.71518937 0.60276338]
  [0.54488318 0.4236548  0.64589411]
  [0.43758721 0.891773   0.96366276]
  [0.38344152 0.79172504 0.52889492]]

 [[0.56804456 0.92559664 0.07103606]
  [0.0871293  0.0202184  0.83261985]
  [0.77815675 0.87001215 0.97861834]
  [0.79915856 0.46147936 0.78052918]]]

xmin = [0.0871293  0.0202184  0.07103606]

Subtracted X:
[[[0.4616842  0.69497097 0.53172732]
  [0.45775388 0.4034364  0.57485805]
  [0.35045791 0.8715546  0.8926267 ]
  [0.29631222 0.77150664 0.45785886]]

 [[0.48091526 0.90537824 0.        ]
  [0.         0.         0.76158379]
  [0.69102745 0.84979375 0.90758228]
  [0.71202926 0.44126096 0.70949312]]]

xmax = [0.71202926 0.90537824 0.90758228]

Divided X:
[[[0.64840622 0.76760291 0.5858723 ]
  [0.64288633 0.44559984 0.63339497]
  [0.49219594 0.96264143 0.98352151]
  [0.41615174 0.85213738 0.50448193]]

 [[0.67541502 1.         0.        ]
  [0.         0.         0.8391347 ]
  [0.97050428 0.93860633 1.        ]
  [1.         0.48737748 0.78173972]]]
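If you would rather not modify X in place, the same two steps can be wrapped in a function that returns a new array. This is only a sketch: min_max_scale is a hypothetical helper name, not a NumPy function, and it assumes no attribute is constant (otherwise the division would be by zero).

```python
import numpy as np

def min_max_scale(X):
    """Return a rescaled copy of X; hypothetical helper, not part of NumPy."""
    shifted = X - X.min(axis=(0, 1))           # per-attribute minimum becomes 0
    return shifted / shifted.max(axis=(0, 1))  # per-attribute maximum becomes 1

np.random.seed(0)
X = np.random.rand(2, 4, 3)
scaled = min_max_scale(X)

# Each attribute now spans exactly [0, 1]; X itself is left untouched.
assert np.allclose(scaled.min(axis=(0, 1)), 0.0)
assert np.allclose(scaled.max(axis=(0, 1)), 1.0)
```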

6 Comments

Thanks for your note. Can you please give a quick gloss on what the axis args are doing here? I'd be grateful if you could!
I accepted roganjosh's answer only because he got me going first and his answer is more legible to me, but this may be a better answer performance-wise.
@duhaime No problem at all. Good luck
@duhaime I suspect it probably is but I can't check to confirm the dimensions it works on are the same
@duhaime You'll get the best results if you work on entire arrays instead of looping over indices.