1

I have some csv files including array columns. For example:

a,b,c
1,1|2|3,4.5|5.5|6.5
2,7|8|9,10.5|11.5|12.5

Delimter 1 is , to sepperate fields a, b and c. Delimiter 2 is | in this case, but could be changed.

Is there a possibility in python to read this as a pandas dataframe directly? Field b and c should be an array/series inside the dataframe.

What I do now is reading the csv as strings:

df = pd.read_csv('data.csv', dtype='str')

Then use np.fromstring to convert all strings to numpy arrays:

type_dict = {
  "a": "int",
  "b": "int",
  "c": "float"
}

def make_split(text, dt):
    return np.fromstring(text, sep="|", dtype=dt)

df = df.apply(lambda x: x.apply(make_split, dt=type_dict[x.name]))

But this takes several minutes for my files. Is there a faster option?

2 Answers 2

4

You can use converters parameter of .read_csv() to parse columns:

import pandas as pd
import numpy as np


df = pd.read_csv('data.csv', converters={
    'b': lambda x: np.array(x.split('|'), dtype='int'),
    'c': lambda x: np.array(x.split('|'), dtype='float')
    })
print(df)

Output:

   a          b                   c
0  1  [1, 2, 3]     [4.5, 5.5, 6.5]
1  2  [7, 8, 9]  [10.5, 11.5, 12.5]
Sign up to request clarification or add additional context in comments.

2 Comments

You might want to define a named function and use that in the dictionary
This is slightly faster than my version, thank you!
1
df=pd.read_csv(r'csv2.txt', sep=',')


df['b']=df['b'].str.split('|').apply(lambda x: [int(i) for i in x])
df['c']=df['c'].str.split('|').apply(lambda x: [float(i) for i in x])
df
a   b   c
0   1   [1, 2, 3]   [4.5, 5.5, 6.5]
1   2   [7, 8, 9]   [10.5, 11.5, 12.5]

2 Comments

This way I still get strings which aren't numpy lists. The conversion to the correct types takes the most time.
I updated it to use a numpy array and work for a generic dataframe: df = df.apply(lambda x: x.str.split("|").apply(lambda y: np.array(y, dtype=type_dict[x.name]))) This is the fastest solution at the moment.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.