
I am working with data from the World Ocean Database (WOD), and I ended up with a list that looks like this:

     idata = 
     ['         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
      '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
      '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
      '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
      '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']

Is there an easy way to convert this structure into a NumPy array or some other friendlier format? I just want to put the column data into a pandas DataFrame.

3 Answers


You could use a combination of string manipulation (i.e. strip() and split()) and list comprehensions:

import numpy as np


idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
    '         3,      10.0,0, ,    6.2306,0, ,   33.2175,0, ,\n',
    '         4,      15.0,0, ,    6.2359,0, ,   33.2176,0, ,\n',
    '         5,      20.0,0, ,    6.2387,0, ,   33.2175,0, ,\n']

ll = [[float(x.strip()) for x in s.split(',') if x.strip()] for s in idata]
print(np.array(ll))
# [[ 1.      0.      0.      6.2386  0.     33.2166  0.    ]
#  [ 2.      5.      0.      6.2385  0.     33.2166  0.    ]
#  [ 3.     10.      0.      6.2306  0.     33.2175  0.    ]
#  [ 4.     15.      0.      6.2359  0.     33.2176  0.    ]
#  [ 5.     20.      0.      6.2387  0.     33.2175  0.    ]]

which can also be fed to the pandas DataFrame constructor:

import pandas as pd


df = pd.DataFrame(ll)
print(df)
#      0     1    2       3    4        5    6
# 0  1.0   0.0  0.0  6.2386  0.0  33.2166  0.0
# 1  2.0   5.0  0.0  6.2385  0.0  33.2166  0.0
# 2  3.0  10.0  0.0  6.2306  0.0  33.2175  0.0
# 3  4.0  15.0  0.0  6.2359  0.0  33.2176  0.0
# 4  5.0  20.0  0.0  6.2387  0.0  33.2175  0.0
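
One thing to watch with the comprehension above: the `if x.strip()` filter discards blank fields, so if a field were blank in only some rows, those rows would come out shorter and the columns would shift. A minimal sketch that keeps every position by mapping blanks to NaN instead (the helper name `parse_line` is made up for illustration, and only two of the sample rows are repeated here):

```python
import numpy as np

idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
]


def parse_line(line):
    """Split one comma-separated line, turning blank fields into NaN."""
    fields = line.rstrip('\n').split(',')
    # float() tolerates surrounding whitespace, so only blanks need special care
    return [float(f) if f.strip() else np.nan for f in fields]


arr = np.array([parse_line(s) for s in idata])
print(arr.shape)
```

Every row then has the same length as the number of comma-separated fields, and the all-NaN columns can be dropped afterwards if they are not needed.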

You can split each line on commas, strip the parts, and pass the resulting nested list to a DataFrame as follows:

import pandas as pd

data = [[item.strip() for item in line.split(',')] for line in idata]
df = pd.DataFrame(data)

To convert the DataFrame to numeric values, pd.to_numeric can be applied to each column:

df = df.apply(pd.to_numeric)
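
A small sketch of how this conversion can be combined with a cleanup step on the sample data, assuming the goal is to keep only the columns that actually hold numbers. The `errors='coerce'` argument and the `dropna` call are additions that are not part of the answer above, and only two sample rows are repeated here:

```python
import pandas as pd

idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
]

# Every cell is still a string at this point, including the blank ones.
data = [[item.strip() for item in line.split(',')] for line in idata]
df = pd.DataFrame(data)

# errors='coerce' turns anything that cannot be parsed into NaN instead of raising.
numeric = df.apply(pd.to_numeric, errors='coerce')

# Drop the columns that were blank in every row.
numeric = numeric.dropna(axis=1, how='all')
print(numeric.dtypes)
```

The remaining columns keep their original positional labels, so they can be renamed afterwards if needed.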

1 Comment

Note that the dtype of the columns would be object. Based on the content of the input, I am not sure that this is a desirable behavior in this case.
try:
    from io import StringIO  # Python 3
except ImportError:
    from StringIO import StringIO  # Python 2

import pandas as pd

df = pd.read_csv(StringIO(''.join(idata)), index_col=0, header=None, sep=r',\s*', engine='python')

print(df)

# prints:
#       1   2   3       4   5   6        7   8   9  10
# 0                                                   
# 1   0.0   0 NaN  6.2386   0 NaN  33.2166   0 NaN NaN
# 2   5.0   0 NaN  6.2385   0 NaN  33.2166   0 NaN NaN
# 3  10.0   0 NaN  6.2306   0 NaN  33.2175   0 NaN NaN
# 4  15.0   0 NaN  6.2359   0 NaN  33.2176   0 NaN NaN
# 5  20.0   0 NaN  6.2387   0 NaN  33.2175   0 NaN NaN

Remove the header=None if you can include an initial row of idata that actually specifies helpful column labels. Remove sep=r',\s*', engine='python' if you're happy for the blank columns to contain blank string objects instead of NaN.
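
As a possible follow-up, the all-NaN columns can be dropped and the rest labelled. This is only a sketch: the column names below are guesses chosen for illustration and are not taken from the question or from the WOD format, and only two sample rows are repeated here.

```python
from io import StringIO

import pandas as pd

idata = [
    '         1,       0.0,0, ,    6.2386,0, ,   33.2166,0, ,\n',
    '         2,       5.0,0, ,    6.2385,0, ,   33.2166,0, ,\n',
]

df = pd.read_csv(StringIO(''.join(idata)), index_col=0, header=None,
                 sep=r',\s*', engine='python')

# Drop the columns that are NaN in every row (the blank fields).
clean = df.dropna(axis=1, how='all')

# Placeholder labels: assumptions for illustration only.
clean.columns = ['depth', 'depth_flag',
                 'temperature', 'temperature_flag',
                 'salinity', 'salinity_flag']
print(clean)
```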
