I have some CSV files that contain array columns. For example:

```
a,b,c
1,1|2|3,4.5|5.5|6.5
2,7|8|9,10.5|11.5|12.5
```

Delimiter 1 is `,` and separates the fields a, b and c. Delimiter 2 is `|` in this case, but it can vary between files.
Is there a way in Python to read this directly into a pandas DataFrame? Fields b and c should each become an array/Series inside the DataFrame.
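Ideally something like this would work; a sketch using the `converters` parameter of `read_csv`, with the example data inlined via `io.StringIO` instead of my real files:

```python
import io

import numpy as np
import pandas as pd

csv_text = "a,b,c\n1,1|2|3,4.5|5.5|6.5\n2,7|8|9,10.5|11.5|12.5"

# Split the |-delimited cells into numpy arrays while parsing.
converters = {
    "b": lambda s: np.array(s.split("|"), dtype=int),
    "c": lambda s: np.array(s.split("|"), dtype=float),
}
df = pd.read_csv(io.StringIO(csv_text), converters=converters)
```

This gives column a as plain integers and columns b and c as numpy arrays, but the converter is still called once per cell, so I am unsure whether it would be any faster on large files.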
What I do now is read the CSV as strings:

```python
import numpy as np
import pandas as pd

df = pd.read_csv('data.csv', dtype='str')
```

Then I use np.fromstring to convert each string to a numpy array:
```python
type_dict = {
    "a": "int",
    "b": "int",
    "c": "float",
}

def make_split(text, dt):
    return np.fromstring(text, sep="|", dtype=dt)

df = df.apply(lambda x: x.apply(make_split, dt=type_dict[x.name]))
```
But this takes several minutes for my files. Is there a faster option?
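One idea I am considering, as a sketch: it assumes every cell in a column holds the same number of |-separated values, so a whole column can be split into a single 2-D array at once instead of calling np.fromstring per cell (the hypothetical helper `split_column` and the inline DataFrame are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "b": ["1|2|3", "7|8|9"],
    "c": ["4.5|5.5|6.5", "10.5|11.5|12.5"],
})

def split_column(col, dtype):
    # Split every cell, then build one 2-D array for the whole column
    # (requires each cell to contain the same number of values).
    mat = np.array(col.str.split("|").tolist(), dtype=dtype)
    # Store one row-array per cell.
    return pd.Series(list(mat), index=col.index)

df["b"] = split_column(df["b"], int)
df["c"] = split_column(df["c"], float)
```

Would this kind of column-at-a-time split be expected to beat the per-cell apply, or is there a better built-in option?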