Python: Centering Features in Numpy Array

Question

I’m trying to normalize some values in a numpy array with the following shape:

import numpy as np
X = np.random.rand(100, 20, 3)

This data says there are 100 time stamps for each of 20 observations, where each observation has 3 dimensional attributes (x, y, z). I want to normalize the x, y, z dimensional attributes in the following way. For each dimension, I want to subtract the min then divide by the resulting max (to “center” the dimension’s values).

I attempted to do this with the following:

# center all features
for i in range(3):
  X[:][:][i] -= np.min(X[:][:][i])
  X[:][:][i] /= np.max(X[:][:][i])

This does not mutate all values of the ith dimension, however.

How can I center my features in this way? Any help others can offer would be greatly appreciated!

I'm not sure X[:][:][i] is multi-dimensional indexing in numpy. Shouldn't it be X[:, :, i]? I can't test atm.
– roganjosh, Commented Oct 16, 2018 at 13:39
"This does not mutate all values of the ith dimension" What do you mean by that? Could you please clarify?
– AGN Gazer, Commented Oct 16, 2018 at 13:43
@roganjosh you nailed it. If you make your comment an answer I'll accept it. @AGNGazer I mean if you run the code above, not all values of the ith index in the last dimension will be mutated. Here i is an index into the 3 dimensional space (the third of the values provided by X.shape). Sorry my language here is kind of clunky.
– duhaime, Commented Oct 16, 2018 at 13:48

roganjosh · Accepted Answer · 2018-10-16 13:53:22Z

2

On a Python list, X[:] makes a shallow copy of the list; on a NumPy array it returns a view of the whole array. Either way, X[:][:][i] ends up being the same as X[i], which indexes the first axis rather than the last. You need X[:, :, i]. See the NumPy indexing documentation for more about multidimensional indexing of arrays.
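A minimal sketch of the difference, assuming an array with the same (100, 20, 3) shape as in the question. On a NumPy array each X[:] is a view of the whole array, so X[:][:][i] selects the same data as X[i] on the first axis, while X[:, :, i] selects feature i on the last axis. The seeded generator and the names feature_min and feature_max are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
X = rng.random((100, 20, 3))

# Chained slicing collapses to indexing the first axis.
assert X[:][:][1].shape == (20, 3)
assert np.shares_memory(X[:][:][1], X[1])

# A comma-separated index addresses all three axes at once.
assert X[:, :, 1].shape == (100, 20)

# Corrected version of the loop from the question.
for i in range(3):
    X[:, :, i] -= np.min(X[:, :, i])
    X[:, :, i] /= np.max(X[:, :, i])

# Per-feature extremes after scaling.
feature_min = X.min(axis=(0, 1))
feature_max = X.max(axis=(0, 1))
```

After the loop, every feature on the last axis spans the range from 0 to 1.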

answered Oct 16, 2018 at 13:53

roganjosh


AGN Gazer · Answer · 2018-10-16 14:08:14Z

1

X -= np.amin(X, axis=(0, 1))
X /= np.amax(X, axis=(0, 1))

NOTE: According to numpy.amin() documentation (similar for amax()):

Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the minimum is selected over multiple axes, instead of a single axis or all the axes as before.

By specifying axis=(0, 1), I ask numpy.amin() to find the minimum over all rows and columns (axes 0 and 1), separately for each "depth" (third-axis) element. The result has shape (3,), so it broadcasts against the last axis of X.
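To make the axis argument concrete, here is a small sketch on a toy array (the array contents and the names xmin_kept, shifted and same are illustrative, not from the answer). Reducing over axes 0 and 1 leaves one value per entry of the last axis, and broadcasting then aligns that length-3 result with the last axis of X.

```python
import numpy as np

# Toy array where X[i, j, k] == 12*i + 3*j + k.
X = np.arange(24, dtype=float).reshape(2, 4, 3)

# Reduce over the first two axes: one minimum per last-axis feature.
xmin = np.amin(X, axis=(0, 1))
assert xmin.shape == (3,)

# keepdims=True keeps the reduced axes with length 1; it broadcasts the same way.
xmin_kept = np.amin(X, axis=(0, 1), keepdims=True)
assert xmin_kept.shape == (1, 1, 3)

# Broadcasting subtracts xmin[k] from every element of X[:, :, k].
shifted = X - xmin
same = all(
    np.array_equal(shifted[:, :, k], X[:, :, k] - xmin[k]) for k in range(3)
)
```

Passing keepdims=True is optional here because a shape of (3,) already lines up with the trailing axis; it becomes necessary when the reduced axes are not the leading ones.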

Step-by-step illustration:

In [1]: import numpy as np
   ...: np.random.seed(0)
   ...: X = np.random.rand(2, 4, 3)
   ...: print("\nOriginal X:\n%s" % X)
   ...: xmin = np.amin(X, axis=(0, 1))
   ...: print("\nxmin = %s" % xmin)
   ...: X -= xmin
   ...: print("\nSubtracted X:\n%s" % X)
   ...: xmax = np.amax(X, axis=(0, 1))
   ...: X /= xmax
   ...: print("\nDivided X:\n%s" % X)
   ...: 
   ...: 

Original X:
[[[0.5488135  0.71518937 0.60276338]
  [0.54488318 0.4236548  0.64589411]
  [0.43758721 0.891773   0.96366276]
  [0.38344152 0.79172504 0.52889492]]

 [[0.56804456 0.92559664 0.07103606]
  [0.0871293  0.0202184  0.83261985]
  [0.77815675 0.87001215 0.97861834]
  [0.79915856 0.46147936 0.78052918]]]

xmin = [0.0871293  0.0202184  0.07103606]

Subtracted X:
[[[0.4616842  0.69497097 0.53172732]
  [0.45775388 0.4034364  0.57485805]
  [0.35045791 0.8715546  0.8926267 ]
  [0.29631222 0.77150664 0.45785886]]

 [[0.48091526 0.90537824 0.        ]
  [0.         0.         0.76158379]
  [0.69102745 0.84979375 0.90758228]
  [0.71202926 0.44126096 0.70949312]]]

xmax = [0.71202926 0.90537824 0.90758228]

Divided X:
[[[0.64840622 0.76760291 0.5858723 ]
  [0.64288633 0.44559984 0.63339497]
  [0.49219594 0.96264143 0.98352151]
  [0.41615174 0.85213738 0.50448193]]

 [[0.67541502 1.         0.        ]
  [0.         0.         0.8391347 ]
  [0.97050428 0.93860633 1.        ]
  [1.         0.48737748 0.78173972]]]
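The same steps can be wrapped in a non-mutating helper and checked, sketched here under the assumption that no feature is constant (a constant feature would cause a division by zero). The name min_max_scale is hypothetical and not part of NumPy or of the answer above.

```python
import numpy as np

def min_max_scale(X):
    """Return a copy of X with each last-axis feature scaled to [0, 1]."""
    shifted = X - np.amin(X, axis=(0, 1))
    return shifted / np.amax(shifted, axis=(0, 1))

np.random.seed(0)
X = np.random.rand(2, 4, 3)
scaled = min_max_scale(X)

# Each feature should now have minimum 0 and maximum 1.
assert np.allclose(scaled.min(axis=(0, 1)), 0.0)
assert np.allclose(scaled.max(axis=(0, 1)), 1.0)
```

Unlike the in-place -= and /= version, this leaves the original X untouched.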

edited Oct 16, 2018 at 14:08

answered Oct 16, 2018 at 13:49

AGN Gazer


6 Comments

duhaime Over a year ago

thanks for your note. Can you please give a quick gloss on what the axis args are doing here? I'd be grateful if you could!

duhaime Over a year ago

I accepted roganjosh's answer only because he got me going first and his answer is more legible to me, but this may be a better answer performance-wise

AGN Gazer Over a year ago

@duhaime No problem at all. Good luck

roganjosh Over a year ago

@duhaime I suspect it probably is but I can't check to confirm the dimensions it works on are the same

AGN Gazer Over a year ago

@duhaime You'll get the best results if you try to work on entire arrays instead of looping over indices.
