Replace values of a numpy array by values from another numpy array

Question

I have a 1000 x 1000 NumPy array with 1 million values, created as follows:

>>> import numpy as np
>>> data = np.loadtxt('space_data.txt')
>>> print(data)
[[ 13.  15.  15. ...,  15.  15.  16.]
 [ 14.  13.  14. ...,  13.  15.  16.]
 [ 16.  13.  13. ...,  13.  15.  17.]
 ...,
 [ 14.  15.  14. ...,  14.  14.  13.]
 [ 15.  15.  16. ...,  16.  15.  14.]
 [ 14.  13.  16. ...,  16.  16.  16.]]

I have another NumPy array, which has 2 columns, as follows:

>>> print(key)
[[ 10.,   S],
 [ 11.,   S],
 [ 12.,   S],
 [ 13.,   M],
 [ 14.,   L],
 [ 15.,   S],
 [ 16.,   S],
 ...,
 [ 92.,   XL],
 [ 93.,   M],
 [ 94.,   XL],
 [ 95.,   S]]

What I want is to replace each element of the data array with the corresponding element in the second column of the key array, like this:

>>> print(data)
[[ M  S  S ...,  S  S  S]
 [ L  M  L ...,  M  S  S]
 [ S  M  M ...,  M  S  XL]
 ...,
 [ L  S  L ...,  L  L  M]
 [ S  S  S ...,  S  S  L]
 [ L  M  S ...,  S  S  S]]

Please correct your code snippet for data as it's wrong (missing commas). This can confuse other users of the data type. (ha9u63a7, commented Mar 28, 2015 at 18:32)
Well, if data is a NumPy array of floats, you cannot replace its elements with strings in place, so you need to create another list. (Julien Spronck, commented Mar 28, 2015 at 18:37)
@ha9u63ar I copied this straight from the terminal. I printed the array and there were no commas. (Amistad, commented Mar 28, 2015 at 18:37)
@Amistad Also post the repr versions of NumPy arrays: print(repr(data)) (Ashwini Chaudhary, commented Mar 28, 2015 at 18:39)
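Both points from these comments can be seen on a tiny array. The sketch below uses a small made-up stand-in for the question's data: print() shows no commas while repr() does, and assigning a string into a float array raises ValueError, so the labels need a separate string array. The variable names string_stored and labels are illustrative only.

```python
import numpy as np

# A small illustrative stand-in for the question's data.
data = np.array([[13.0, 15.0], [14.0, 16.0]])

# print()/str() omit commas between elements; repr() includes them,
# which makes the output copy-pasteable as Python code.
print(str(data))
print(repr(data))

# A float array cannot store strings: NumPy tries to convert 'S' to a
# float and raises ValueError, leaving the array unchanged.
try:
    data[0, 0] = 'S'
    string_stored = True
except ValueError as exc:
    string_stored = False
    print("ValueError:", exc)

# The labels therefore need a separate array with a string dtype.
labels = np.full(data.shape, '', dtype='U2')
labels[0, 0] = 'S'
print(labels)
```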

unutbu · Accepted Answer · 2015-03-28 19:07:09Z

In Python, dicts are the natural choice for mapping keys to values. NumPy has no direct equivalent of a dict, but it does have arrays, which support fast integer indexing. For example:

In [153]: keyarray = np.array(['S','M','L','XL'])

In [158]: data = np.array([[0,2,1], [1,3,2]])

In [159]: keyarray[data]
Out[159]: 
array([['S', 'L', 'M'],
       ['M', 'XL', 'L']], 
      dtype='|S2')
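The same lookup can be checked under Python 3, where NumPy reports string arrays with a Unicode dtype such as '<U2' instead of the Python 2 byte-string '|S2' shown above. This is a minimal sketch reusing the toy keyarray and data from the session above; the key property is that the result always has the shape of the index array, which is why the trick works for a 2-D data array.

```python
import numpy as np

# Position i of keyarray holds the label for integer code i.
keyarray = np.array(['S', 'M', 'L', 'XL'])
data = np.array([[0, 2, 1], [1, 3, 2]])

# Integer ("fancy") indexing: each element of data selects one label,
# and the output has the same shape as data.
result = keyarray[data]
print(result)
print(result.shape, result.dtype)
```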

So if we could massage your key array into one that looked like this:

In [161]: keyarray
Out[161]: 
array(['', '', '', '', '', '', '', '', '', '', 'S', 'S', 'S', 'M', 'L',
       'S', 'S', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', 'XL', 'M', 'XL', 'S'], 
      dtype='|S32')

That is, 10 maps to 'S' in the sense that keyarray[10] equals 'S', and so forth:

In [162]: keyarray[10]
Out[162]: 'S'

Then we could produce the desired result with keyarray[data].

import numpy as np

data = np.array( [[ 13.,   15.,  15.,  15.,  15.,  16.],
                  [ 14.,   13.,  14.,  13.,  15.,  16.],
                  [ 16.,   13.,  13.,  13.,  15.,  17.],
                  [ 14.,   15.,  14.,  14.,  14.,  13.],
                  [ 15.,   15.,  16.,  16.,  15.,  14.],
                  [ 14.,   13.,  16.,  16.,  16.,  16.]])

key = np.array([[ 10., 'S'],
                [ 11., 'S'],
                [ 12., 'S'],
                [ 13., 'M'],
                [ 14., 'L'],
                [ 15., 'S'],
                [ 16., 'S'],
                [ 17., 'XL'],
                [ 92., 'XL'],
                [ 93., 'M'],
                [ 94., 'XL'],
                [ 95., 'S']])

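# Integer codes from the first column (the mixed key array stores everything as strings).
idx = np.array(key[:,0], dtype=float).astype(int)
n = idx.max()+1
# Lookup table: position i holds the label for code i; unused codes get ''.
keyarray = np.empty(n, dtype=key[:,1].dtype)
keyarray[:] = ''
keyarray[idx] = key[:,1]

# Cast data to int so its values can be used as indices into keyarray.
data = data.astype('int')
print(keyarray[data])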

yields

[['M' 'S' 'S' 'S' 'S' 'S']
 ['L' 'M' 'L' 'M' 'S' 'S']
 ['S' 'M' 'M' 'M' 'S' 'XL']
 ['L' 'S' 'L' 'L' 'L' 'M']
 ['S' 'S' 'S' 'S' 'S' 'L']
 ['L' 'M' 'S' 'S' 'S' 'S']]

Note that data = data.astype('int') assumes that the floats in data can be mapped uniquely to ints. That appears to be the case with your data, but it is not true for arbitrary floats. For example, astype('int') maps both 1.0 and 1.5 to 1.

In [167]: np.array([1.0, 1.5]).astype('int')
Out[167]: array([1, 1])
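One way to guard against that truncation, and against codes that fall outside keyarray, is to validate before indexing. This is a minimal sketch assuming keyarray is built as in the answer, with '' marking codes that have no label; the helper name lookup_labels and the small keyarray are hypothetical.

```python
import numpy as np

def lookup_labels(data, keyarray):
    """Hypothetical helper: map float codes in data to labels in keyarray."""
    # Reject values such as 1.5 that astype(int) would silently truncate.
    if not np.all(data == np.round(data)):
        raise ValueError("data contains non-integer values")
    codes = data.astype(int)
    # Reject negative codes (which would wrap around) and codes past the end.
    if codes.min() < 0 or codes.max() >= len(keyarray):
        raise ValueError("data contains codes outside the key range")
    labels = keyarray[codes]
    # Empty strings mark codes that are in range but have no key entry.
    if np.any(labels == ''):
        raise ValueError("data contains codes with no label")
    return labels

# Illustrative lookup table: code 0 is unused, 1 -> 'S', 2 -> 'M', 3 -> 'XL'.
keyarray = np.array(['', 'S', 'M', 'XL'])
print(lookup_labels(np.array([[1.0, 2.0], [3.0, 1.0]]), keyarray))

# A value like 1.5 is rejected instead of being truncated to 1.
try:
    lookup_labels(np.array([1.0, 1.5]), keyarray)
except ValueError as exc:
    print("rejected:", exc)
```

The checks cost a few extra passes over data, but each is vectorized, so they stay cheap compared with a Python-level loop.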

Ashwini Chaudhary · Answer · 2015-03-28 18:40:09Z

3

An unvectorized, linear-time approach is to use a dictionary:

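import numpy as np

# Map each numeric code to its label; float() makes the codes match the
# float values in data even if key is a NumPy string array.
dct = {float(k): v for k, v in key}
# A new array is required because the dtype differs (float -> string);
# dtype=object avoids dtype=str, which would truncate 'XL' to 'X'.
new_array = np.empty(data.shape, dtype=object)
for index in np.ndindex(data.shape):
    new_array[index] = dct[data[index]]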
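The same dictionary lookup can be written more compactly with np.vectorize. It still calls the Python function once per element, so it is no faster than the loop, but it removes the index bookkeeping. This is a sketch assuming key is a list of [code, label] pairs as in the other answers; the sample data (including the unmapped 99.0) is made up, and codes missing from the dictionary come back as None.

```python
import numpy as np

data = np.array([[13.0, 15.0, 16.0],
                 [14.0, 13.0, 99.0]])
key = [[10.0, 'S'], [13.0, 'M'], [14.0, 'L'], [15.0, 'S'], [16.0, 'S']]

# Float keys match the float values stored in data.
dct = {float(code): label for code, label in key}

# np.vectorize loops in Python; otypes=[object] avoids truncating labels
# and lets missing codes come back as None from dict.get.
lookup = np.vectorize(dct.get, otypes=[object])
labels = lookup(data)
print(labels)
```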


Julien Spronck · Answer · 2015-03-28 18:41:00Z

2

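import numpy as np

data = np.array([[ 13.,  15.,  15.],
                 [ 14.,  13.,  14.],
                 [ 16.,  13.,  13.]])

key = [[ 10.,   'S'],
       [ 11.,   'S'],
       [ 12.,   'S'],
       [ 13.,   'M'],
       [ 14.,   'L'],
       [ 15.,   'S'],
       [ 16.,   'S']]

# 'U2' holds labels of up to two characters such as 'XL';
# dtype=str would give 'U1' and truncate 'XL' to 'X'.
data2 = np.zeros(data.shape, dtype='U2')

# One boolean mask per key entry: fill every matching cell with its label.
for k in key:
    data2[data == k[0]] = k[1]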


This should be fast enough if the number of items in the key array is not huge, but it will be quadratic if key's size increases. (Ashwini Chaudhary, over a year ago)
Indeed, it looks like there should be about 86 entries in key ... otherwise this is probably the fastest. (Julien Spronck, over a year ago)
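When the key is large, or its codes are sparse or non-integer, a sorted lookup with np.searchsorted avoids both the per-key loop and the padded keyarray, at roughly O(n log k) cost for n data values and k key entries. This is a different technique from the answers above, sketched under the assumption that every value in data appears among the key codes; the names codes, labels, order and pos are illustrative.

```python
import numpy as np

data = np.array([[13.0, 15.0, 15.0],
                 [14.0, 13.0, 14.0],
                 [16.0, 13.0, 95.0]])
key = [[10.0, 'S'], [11.0, 'S'], [12.0, 'S'], [13.0, 'M'], [14.0, 'L'],
       [15.0, 'S'], [16.0, 'S'], [92.0, 'XL'], [93.0, 'M'], [94.0, 'XL'],
       [95.0, 'S']]

codes = np.array([k[0] for k in key], dtype=float)
labels = np.array([k[1] for k in key])

# searchsorted needs ascending codes, so sort codes and labels together.
order = np.argsort(codes)
codes, labels = codes[order], labels[order]

# For each value in data, find its insertion point in the sorted codes.
pos = np.searchsorted(codes, data)
# Clip so values above the largest code do not index past the end,
# then require an exact match so unknown values are not mislabeled.
pos = np.clip(pos, 0, len(codes) - 1)
if not np.all(codes[pos] == data):
    raise KeyError("data contains values with no entry in key")

result = labels[pos]
print(result)
```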

Megha Sehgal · Answer · 2020-07-25 11:00:05Z

0

# Create a DataFrame from your 'data' array and a dictionary from your 'key' array.
import numpy as np
import pandas as pd

data = np.array([[ 13.,  15.,  15.],
                 [ 14.,  13.,  14.],
                 [ 16.,  13.,  13.]])
data_df = pd.DataFrame(data)
key = {10: 'S', 11: 'S', 12: 'S', 13: 'M', 14: 'L', 15: 'S', 16: 'S'}
# Replace the values in the new DataFrame, then convert it back to an array.
data_df.replace(key, inplace=True)

data = data_df.to_numpy()
print(data)

This will be the output:

[['M' 'S' 'S']
 ['L' 'M' 'L']
 ['S' 'M' 'M']]
