Easy way to convert list of string to numpy array

Question

I am working with data from the World Ocean Database (WOD), and somehow I ended up with a list that looks like this one:

     idata = 
     ['         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
      '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
      '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
      '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
      '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']

Is there an easy way to convert this structure into a numpy array or some other friendlier format? I just want to put the column data into a pandas DataFrame.

norok2 · Accepted Answer · 2019-12-03 21:54:09Z

You could use a combination of string manipulation (i.e. strip() and split()) and list comprehensions:

import numpy as np


idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
    '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
    '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
    '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']

ll = [[float(x.strip()) for x in s.split(',') if x.strip()] for s in idata]
print(np.array(ll))
# [[ 1.      0.      0.      6.2386  0.     33.2166  0.    ]
#  [ 2.      5.      0.      6.2385  0.     33.2166  0.    ]
#  [ 3.     10.      0.      6.2306  0.     33.2175  0.    ]
#  [ 4.     15.      0.      6.2359  0.     33.2176  0.    ]
#  [ 5.     20.      0.      6.2387  0.     33.2175  0.    ]]

which can also be fed to the pandas DataFrame constructor:

import pandas as pd


df = pd.DataFrame(ll)
print(df)
#      0     1    2       3    4        5    6
# 0  1.0   0.0  0.0  6.2386  0.0  33.2166  0.0
# 1  2.0   5.0  0.0  6.2385  0.0  33.2166  0.0
# 2  3.0  10.0  0.0  6.2306  0.0  33.2175  0.0
# 3  4.0  15.0  0.0  6.2359  0.0  33.2176  0.0
# 4  5.0  20.0  0.0  6.2387  0.0  33.2175  0.0
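
One caveat with the filtering approach above: dropping blank fields keeps the columns aligned only when every line has its blanks in the same positions, as in this sample. A minimal sketch of a variant that keeps each field in place and maps blanks to NaN is shown below. Here parse_line is a hypothetical helper name, not part of the original answer, and the trailing-comma handling assumes every line ends with a comma as in the sample.

```python
import math

import numpy as np

idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
    '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
    '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
    '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']


def parse_line(line):
    """Split one comma-separated line into floats; blank fields become NaN."""
    fields = line.rstrip('\n').split(',')
    # Assumption: each line ends with a trailing comma, so the last field
    # is empty and carries no data; drop it.
    if fields and not fields[-1].strip():
        fields = fields[:-1]
    # float() tolerates surrounding whitespace, so only blanks need special care.
    return [float(f) if f.strip() else math.nan for f in fields]


arr = np.array([parse_line(s) for s in idata])
print(arr.shape)
print(arr)
```

The blank columns can then be removed explicitly by position if they are not needed, which makes the choice visible rather than implicit in the parsing step.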

ldz · Accepted Answer · 2019-12-04 13:12:33Z

You could split each line on commas, strip whitespace from the parts, and pass the resulting list of lists to a DataFrame as follows:

import pandas as pd

data = [[item.strip() for item in line.split(',')] for line in idata]
df = pd.DataFrame(data)

To convert the DataFrame columns to numeric values, pd.to_numeric can be applied to each column:

df = df.apply(pd.to_numeric)
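
As a quick check of the conversion, the sketch below builds the DataFrame as above and inspects the resulting dtypes. Passing errors='coerce' is an addition that is not in the original answer: it asks pd.to_numeric to turn anything it cannot parse into NaN rather than raising.

```python
import pandas as pd

idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
    '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
    '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
    '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']

data = [[item.strip() for item in line.split(',')] for line in idata]
df = pd.DataFrame(data)  # every cell is still a string at this point

# Convert column by column; cells that cannot be parsed become NaN.
df = df.apply(pd.to_numeric, errors='coerce')

print(df.dtypes)  # expect only integer and float dtypes
print(df.shape)   # (5, 11): the trailing comma yields an empty last column
```

Without the conversion step the columns hold strings, which is why checking df.dtypes before doing arithmetic on the data can be worthwhile.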

edited Dec 4, 2019 at 13:12

answered Dec 3, 2019 at 21:59


norok2 · Comment

Note that the dtype of the columns would be object. Based on the content of the input, I am not sure that this is a desirable behavior in this case.

jez · Accepted Answer · 2019-12-03 22:00:34Z

try:
    from io import StringIO  # Python 3
except ImportError:
    from StringIO import StringIO  # Python 2

import pandas as pd

df = pd.read_csv(StringIO(''.join(idata)), index_col=0, header=None, sep=r',\s*', engine='python')

print(df)

# prints:
#       1   2   3       4   5   6        7   8   9  10
# 0                                                   
# 1   0.0   0 NaN  6.2386   0 NaN  33.2166   0 NaN NaN
# 2   5.0   0 NaN  6.2385   0 NaN  33.2166   0 NaN NaN
# 3  10.0   0 NaN  6.2306   0 NaN  33.2175   0 NaN NaN
# 4  15.0   0 NaN  6.2359   0 NaN  33.2176   0 NaN NaN
# 5  20.0   0 NaN  6.2387   0 NaN  33.2175   0 NaN NaN

Remove header=None if you can prepend a row to idata that specifies helpful column labels. Remove sep=r',\s*', engine='python' if you're happy for the blank columns to contain blank string objects instead of NaN.
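
If the all-NaN columns are not needed, one option is to drop them after parsing. This is a sketch layered on top of the answer above, not part of it; it assumes Python 3 and that the blank columns carry no information worth keeping.

```python
from io import StringIO

import pandas as pd

idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
    '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
    '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
    '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']

df = pd.read_csv(StringIO(''.join(idata)), index_col=0, header=None,
                 sep=r',\s*', engine='python')

# Drop every column in which all values are NaN (the blank fields).
df = df.dropna(axis=1, how='all')
print(df)
```

The surviving columns keep their original integer labels, so the gaps in the numbering show where blank columns were removed.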
