4

I have a 1000 x 1000 NumPy array (1 million values), which was created as follows:

>>import numpy as np
>>data = np.loadtxt('space_data.txt')
>> print (data)
>>[[ 13.  15.  15. ...,  15.  15.  16.]
   [ 14.  13.  14. ...,  13.  15.  16.]
   [ 16.  13.  13. ...,  13.  15.  17.]
   ..., 
   [ 14.   15.  14. ...,  14.  14.  13.]
   [ 15.   15.  16. ...,  16.  15.  14.]
   [ 14.   13.  16. ...,  16.  16.  16.]]

I have another NumPy array, which has 2 columns, as follows:

>> print(key)
>>[[ 10.,   S],
   [ 11.,   S],
   [ 12.,   S],
   [ 13.,   M],
   [ 14.,   L],
   [ 15.,   S],
   [ 16.,   S],
   ...,
   [ 92.,   XL],
   [ 93.,   M],
   [ 94.,   XL],
   [ 95.,   S]]

What I want is to replace each element of the data array with the corresponding element in the second column of the key array, like this:

>> print(data)
>>[[ M  S  S ...,  S  S  S]
   [ L   M  L ...,  M  S  S]
   [ S   M  M ...,  M  S  XL]
   ..., 
   [ L   S  L ...,  L  L  M]
   [ S   S  S ...,  S  S  L]
   [ L   M  S ...,  S  S  S]]
6
  • Please correct your code snippet for data as it's wrong (missing commas). This can confuse other users of the data type. Commented Mar 28, 2015 at 18:32
  • Are S, M, L ... variable names or strings? Commented Mar 28, 2015 at 18:34
  • well, if data is a numpy array of floats, you cannot replace in place its elements by strings, so you need to create another list Commented Mar 28, 2015 at 18:37
  • @ha9u63ar..i copied this straight from the terminal..i printed the array and there were no commas.. Commented Mar 28, 2015 at 18:37
  • @Amistad Also post the repr versions of NumPy arrays: print(repr(data)) Commented Mar 28, 2015 at 18:39

4 Answers

9

In Python, dicts are a natural choice for mapping from keys to values. NumPy has no direct equivalent of a dict, but it does have arrays, which support fast integer indexing. For example,

In [153]: keyarray = np.array(['S','M','L','XL'])

In [158]: data = np.array([[0,2,1], [1,3,2]])

In [159]: keyarray[data]
Out[159]: 
array([['S', 'L', 'M'],
       ['M', 'XL', 'L']], 
      dtype='|S2')

So if we could massage your key array into one that looked like this:

In [161]: keyarray
Out[161]: 
array(['', '', '', '', '', '', '', '', '', '', 'S', 'S', 'S', 'M', 'L',
       'S', 'S', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', 'XL', 'M', 'XL', 'S'], 
      dtype='|S32')

so that 10 maps to 'S' in the sense that keyarray[10] equals 'S', and so forth:

In [162]: keyarray[10]
Out[162]: 'S'

then we could produce the desired result with keyarray[data].


import numpy as np

data = np.array( [[ 13.,   15.,  15.,  15.,  15.,  16.],
                  [ 14.,   13.,  14.,  13.,  15.,  16.],
                  [ 16.,   13.,  13.,  13.,  15.,  17.],
                  [ 14.,   15.,  14.,  14.,  14.,  13.],
                  [ 15.,   15 ,  16.,  16.,  15.,  14.],
                  [ 14.,   13.,  16.,  16.,  16.,  16.]])

key = np.array([[ 10., 'S'],
                [ 11., 'S'],
                [ 12., 'S'],
                [ 13., 'M'],
                [ 14., 'L'],
                [ 15., 'S'],
                [ 16., 'S'],
                [ 17., 'XL'],
                [ 92., 'XL'],
                [ 93., 'M'],
                [ 94., 'XL'],
                [ 95., 'S']])

idx = np.array(key[:,0], dtype=float).astype(int)
n = idx.max()+1
keyarray = np.empty(n, dtype=key[:,1].dtype)
keyarray[:] = ''
keyarray[idx] = key[:,1]

data = data.astype('int')
print(keyarray[data])

yields

[['M' 'S' 'S' 'S' 'S' 'S']
 ['L' 'M' 'L' 'M' 'S' 'S']
 ['S' 'M' 'M' 'M' 'S' 'XL']
 ['L' 'S' 'L' 'L' 'L' 'M']
 ['S' 'S' 'S' 'S' 'S' 'L']
 ['L' 'M' 'S' 'S' 'S' 'S']]

Note that data = data.astype('int') assumes that the floats in data can be uniquely mapped to ints. That appears to be the case with your data, but it is not true for arbitrary floats. For example, astype('int') maps both 1.0 and 1.5 to 1.

In [167]: np.array([1.0, 1.5]).astype('int')
Out[167]: array([1, 1])
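
If you want to be sure no information is lost before indexing, one option is to check that every value is a whole number first. This is only a sketch; the helper name to_index_array is made up for illustration and is not part of NumPy:

```python
import numpy as np

def to_index_array(values):
    """Convert floats to ints, refusing values that are not whole numbers.

    Hypothetical helper, not a NumPy function.
    """
    values = np.asarray(values, dtype=float)
    rounded = np.rint(values)
    # array_equal is False if any element differs (NaN never compares equal)
    if not np.array_equal(values, rounded):
        raise ValueError("non-integer values present; astype(int) would truncate them")
    return rounded.astype(int)

data = np.array([[13., 15.], [14., 16.]])
idx = to_index_array(data)
print(idx)
```

Calling to_index_array([1.0, 1.5]) raises ValueError instead of silently mapping both values to 1.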

3

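An un-vectorized, linear-time approach is to use a dictionary here:

import numpy as np

# key rows are (number, label); build a float -> label dict
# (assumes data and key are defined as in the question)
dct = {float(k): v for k, v in key}
# a new array is required if the dtype is different or cannot be cast;
# dtype=object avoids truncating multi-character labels such as 'XL'
new_array = np.empty(data.shape, dtype=object)
for index in np.ndindex(data.shape):
    new_array[index] = dct[data[index]]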
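
The same dictionary lookup can also be written without an explicit index loop by wrapping dict.get in np.vectorize. Note that np.vectorize is a convenience and still loops at Python level internally, so this is shorter rather than faster. The sample data and the dictionary below are assumptions made up for illustration:

```python
import numpy as np

# Hypothetical sample data and mapping, for illustration only.
data = np.array([[13., 15., 17.],
                 [14., 13., 16.]])
dct = {13.0: 'M', 14.0: 'L', 15.0: 'S', 16.0: 'S', 17.0: 'XL'}

# otypes=[object] keeps full-length strings such as 'XL'.
# dict.get returns None for values that have no entry in dct.
lookup = np.vectorize(dct.get, otypes=[object])
new_array = lookup(data)
print(new_array)
```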


2
import numpy as np

data = np.array([[ 13.,  15.,  15.],
   [ 14.,  13.,  14. ],
   [ 16.,  13.,  13. ]])

key = [[ 10.,   'S'],
   [ 11.,   'S'],
   [ 12.,   'S'],
   [ 13.,   'M'],
   [ 14.,   'L'],
   [ 15.,   'S'],
   [ 16.,   'S']]

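# dtype=str gives 1-character strings, which would truncate 'XL' to 'X';
# 'U2' leaves room for two-character labels
data2 = np.zeros(data.shape, dtype='U2')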

for k in key:
    data2[data == k[0]] = k[1]
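
The loop above makes one full pass over data for every row of key. If key is large, a possible alternative is to sort the key values once and locate each data value with np.searchsorted. This is a sketch rather than a drop-in replacement: it assumes key is split into two parallel arrays (key_values and key_labels, names chosen here for illustration), and it raises an error if any value in data has no entry in key:

```python
import numpy as np

# Hypothetical sample data; key is split into parallel value/label arrays.
data = np.array([[13., 15., 15.],
                 [14., 13., 17.]])
key_values = np.array([10., 11., 12., 13., 14., 15., 16., 17.])
key_labels = np.array(['S', 'S', 'S', 'M', 'L', 'S', 'S', 'XL'])

# Sort the keys once so that binary search can be used.
order = np.argsort(key_values)
sorted_values = key_values[order]
sorted_labels = key_labels[order]

# Position of each data value within the sorted keys (same shape as data).
pos = np.searchsorted(sorted_values, data)
pos = np.clip(pos, 0, len(sorted_values) - 1)

# Guard against values in data that are missing from key.
if not np.array_equal(sorted_values[pos], data):
    raise KeyError("some values in data have no entry in key")

data2 = sorted_labels[pos]
print(data2)
```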

2 Comments

This should be fast enough if the number of items in the key array is not huge, but it will be quadratic if key's size increases.
indeed, looks like there should be 86 entries in key maybe ... otherwise is probably the fastest
0
# Create a dataframe out of your 'data' array and make a dictionary out of your 'key' array. 
import numpy as np
import pandas as pd

data = np.array([[ 13.,  15.,  15.],
               [ 14.,  13.,  14. ],
               [ 16.,  13.,  13. ]])
data_df = pd.DataFrame(data)
key  = dict({10 : 'S',11 : 'S', 12 : 'S', 13 : 'M',14:'L',15:'S',16:'S'})
# Replace the values in newly created dataframe and convert that into array.
data_df.replace(key,inplace = True)

data = np.array(data_df)
print(data)

This will be the output:

[['M' 'S' 'S']
['L' 'M' 'L']
['S' 'M' 'M']]
