How do I keep the original data types when converting a list of lists into a NumPy array?

I used np.array and np.matrix to convert a list of lists into a NumPy array, but all of the ints became strings. The Python version is 3.7.x.

import numpy as np

X = [[3, 'aa', 10],
     [1, 'bb', 22],
     [2, 'cc', 28],
     [5, 'bb', 32],
     [4, 'cc', 32]]
# X is a list of lists
X = np.array(X)
return X

# X becomes
[['3' 'aa' '10']
 ['1' 'bb' '22']
 ['2' 'cc' '28']
 ['5' 'bb' '32']
 ['4' 'cc' '32']]
  • What are you going to do with this array? Commented Mar 25, 2019 at 3:52

3 Answers

Use X = np.array(X, dtype="O") instead. Every item is then stored as a Python object, so the ints stay ints and the strings stay strings.
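
As a minimal sketch of the difference (using a shortened copy of the question's data; the names as_str and as_obj are only illustrative), the default conversion can be compared with the object-dtype conversion like this:

```python
import numpy as np

X = [[3, 'aa', 10],
     [1, 'bb', 22],
     [2, 'cc', 28]]

# Default: NumPy picks a single common dtype, which is a string type here
as_str = np.array(X)

# dtype="O": each cell holds a reference to the original Python object
as_obj = np.array(X, dtype="O")

print(as_str.dtype.kind)   # 'U' means a unicode string dtype
print(as_obj.dtype)        # object
print(type(as_obj[0, 0]), type(as_obj[0, 1]))
```

The object array keeps the 2-D shape, so positional indexing such as as_obj[:, 0] still works.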

You can use any of these:

  • X = np.array(X,dtype='object')

  • X = np.array(X,dtype=object)

  • X = np.array(X, dtype='O')

They all work, so the whole code is:

X = [[3, 'aa', 10],
     [1, 'bb', 22],
     [2, 'cc', 28],
     [5, 'bb', 32],
     [4, 'cc', 32]]
# X is a list of lists
X = np.array(X, dtype=object)  # or whichever of the three forms you picked
return X

P.S. return only works inside a function; outside a function, use print(X) instead.
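
To make the P.S. concrete, here is a small sketch that wraps the conversion in a function so that return is valid, then prints the result at module level. The helper name to_object_array is hypothetical and only for illustration:

```python
import numpy as np

def to_object_array(rows):
    """Hypothetical helper: convert a list of lists, keeping each cell's type."""
    return np.array(rows, dtype=object)  # return is valid inside a function

X = [[3, 'aa', 10],
     [1, 'bb', 22]]

result = to_object_array(X)
print(result)  # outside a function, print the value instead of returning it

# The three spellings from the list above name the same dtype
same = np.dtype('object') == np.dtype(object) == np.dtype('O')
print(same)
```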


Another option is to make a structured array, with a mix of integer and string fields.

In [252]: import numpy.lib.recfunctions as rf 

In [258]: X = [[3, 'aa', 10],                  
     ...:      [1, 'bb', 22],                       
     ...:      [2, 'cc', 28],                       
     ...:      [5, 'bb', 32],                       
     ...:      [4, 'cc', 32]]                                                   
In [259]: dt = np.dtype('i,U10,i')                                              
In [260]: dt                                                                    
Out[260]: dtype([('f0', '<i4'), ('f1', '<U10'), ('f2', '<i4')])

Recent NumPy (1.16 and later) has a function that converts an unstructured array (here, the string-dtype array that np.array(X) produces) to a structured one:

In [261]: Y = rf.unstructured_to_structured(np.array(X), dt)                    
In [262]: Y                                                                     
Out[262]: 
array([(3, 'aa', 10), (1, 'bb', 22), (2, 'cc', 28), (5, 'bb', 32),
       (4, 'cc', 32)],
      dtype=[('f0', '<i4'), ('f1', '<U10'), ('f2', '<i4')])

Fields are accessed by name:

In [264]: Y['f0']                                                               
Out[264]: array([3, 1, 2, 5, 4], dtype=int32)
In [265]: Y['f1']                                                               
Out[265]: array(['aa', 'bb', 'cc', 'bb', 'cc'], dtype='<U10')

Converting X to a list of tuples works just as well, because NumPy expects each record of a structured array to be given as a tuple:

In [266]: np.array([tuple(row) for row in X], dtype=dt)                         
Out[266]: 
array([(3, 'aa', 10), (1, 'bb', 22), (2, 'cc', 28), (5, 'bb', 32),
       (4, 'cc', 32)],
      dtype=[('f0', '<i4'), ('f1', '<U10'), ('f2', '<i4')])

The object array and the structured array each have advantages and disadvantages, so which is better depends on what you intend to do with the array. For that matter, the original list may be just as good for many purposes. None of the three matches the processing speed (for math operations) of a 2D numeric array.
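
A rough sketch of how the two layouts differ in practice, using a shortened copy of the data (the variable names obj, rec, col_obj and col_rec are illustrative, not from the answers above):

```python
import numpy as np

X = [[3, 'aa', 10],
     [1, 'bb', 22],
     [2, 'cc', 28]]

obj = np.array(X, dtype=object)                       # 2-D object array
dt = np.dtype('i,U10,i')
rec = np.array([tuple(row) for row in X], dtype=dt)   # 1-D structured array

print(obj.shape, rec.shape)

# Object array: columns are selected by position and hold Python objects
col_obj = obj[:, 0]
print(col_obj.dtype)

# Structured array: fields are selected by name and are native integer arrays
col_rec = rec['f0']
print(col_rec.dtype)

# For numeric work, an object column can be converted explicitly first
total_obj = col_obj.astype(int).sum()
total_rec = col_rec.sum()
print(total_obj, total_rec)
```

The object array keeps row and column indexing, while the structured array gives typed columns at the cost of being one-dimensional.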
