Read a csv with numpy array using pandas

Question

I have a CSV file with three columns (emotion, pixels, Usage) and 35000 rows, e.g. 0,70 23 45 178 455,Training.

I read the file with pandas.read_csv as pd.read_csv(filename, dtype={'emotion':np.int32, 'pixels':np.int32, 'Usage':str}).

This fails with ValueError: invalid literal for long() with base 10: '70 23 45 178 455'. How do I read the pixels column as a numpy array?

Anand S Kumar · Accepted Answer · 2015-06-19 07:20:52Z

12

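Each pixels cell holds several space-separated numbers, so it cannot be parsed as a single np.int32. Read the column as a string and convert each value to an array afterwards: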

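import numpy as np
import pandas as pd

# Read pixels as plain strings; each cell holds several space-separated numbers
df = pd.read_csv(filename, dtype={'emotion':np.int32, 'pixels':str, 'Usage':str})

def makeArray(text):
    # Parse a space-separated string into a 1-D array (float64 by default)
    return np.fromstring(text,sep=' ')

df['pixels'] = df['pixels'].apply(makeArray)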
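
For reference, here is a self-contained sketch of the approach above, reading from an in-memory string rather than a file. The second sample row and the names `sample`, `parse_pixels` and `pixel_matrix` are made up for illustration. It swaps `np.fromstring` for a plain `str.split` so that the arrays come out as integers, and it assumes every row has the same number of pixels when stacking them into one 2-D array.

```python
import io

import numpy as np
import pandas as pd

# Rows in the question's format; the second row is invented for illustration.
sample = """emotion,pixels,Usage
0,70 23 45 178 455,Training
1,12 34 56 78 90,Training"""

df = pd.read_csv(
    io.StringIO(sample),
    dtype={"emotion": np.int32, "pixels": str, "Usage": str},
)


def parse_pixels(text):
    # Turn "70 23 45" into a 1-D int32 array.
    return np.array([int(tok) for tok in text.split()], dtype=np.int32)


# Each cell of the pixels column now holds its own NumPy array.
df["pixels"] = df["pixels"].apply(parse_pixels)

# Stack the per-row arrays into one (n_rows, n_pixels) array.
# This only works when all rows have the same length.
pixel_matrix = np.stack(df["pixels"].tolist())
print(pixel_matrix.shape)
```

Keeping one array per cell is convenient for row-wise access, while the stacked matrix is usually the form numerical code expects.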

edited Jun 19, 2015 at 7:20

answered Jun 19, 2015 at 5:58

Anand S Kumar


1 Comment

VeilEclipse Over a year ago

Hi, thanks for your help. It now says TypeError: data type not understood. What could be the probable mistake?

EdChum · Accepted Answer · 2015-06-19 08:23:04Z

I believe it will be faster to use the vectorised str methods: split the string into new pixel columns and concat those columns onto the original df:

In [175]:
# load the data
import pandas as pd
import io
t="""emotion,pixels,Usage
0,70 23 45 178 455,Training"""
df = pd.read_csv(io.StringIO(t))
df

Out[175]:
   emotion            pixels     Usage
0        0  70 23 45 178 455  Training

In [177]:
# now split the string and concat column-wise with the orig df
df = pd.concat([df, df['pixels'].str.split(expand=True).astype(int)], axis=1)
df
Out[177]:
   emotion            pixels     Usage   0   1   2    3    4
0        0  70 23 45 178 455  Training  70  23  45  178  455

If you specifically want a NumPy array, call the .values attribute on the split result; this gives a 2-D array with one row per CSV row:

In [181]:
df['pixels'].str.split(expand=True).astype(int).values

Out[181]:
array([[ 70,  23,  45, 178, 455]])
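
Outside an interactive session, the same steps can be sketched as a plain script. This version assumes every row has the same number of pixel values (otherwise `str.split(expand=True)` pads the short rows with missing values and the integer cast fails). The second sample row, the `px_` column prefix and the names `wide` and `pixel_matrix` are made up for illustration.

```python
import io

import pandas as pd

# Rows in the question's format; the second row is invented for illustration.
sample = """emotion,pixels,Usage
0,70 23 45 178 455,Training
1,12 34 56 78 90,Training"""

df = pd.read_csv(io.StringIO(sample))

# Vectorised split on whitespace: one new integer column per pixel value.
pixels = df["pixels"].str.split(expand=True).astype(int)
pixels = pixels.add_prefix("px_")  # hypothetical prefix, avoids bare 0..4 labels

# Swap the original string column for the numeric columns.
wide = pd.concat([df.drop(columns="pixels"), pixels], axis=1)

# 2-D array with one row per CSV row.
pixel_matrix = pixels.to_numpy()
print(wide.columns.tolist())
print(pixel_matrix.shape)
```

Dropping the string column after the split avoids carrying the same data twice in a 35000-row frame.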

Gabizon · Accepted Answer · 2019-08-27 14:13:35Z

1

I encountered the same problem and figured out a hack. Save your dataframe as a .npy file. When you load it, it comes back as an ndarray, which you can then pass to pandas.DataFrame to rebuild a dataframe for your use. I found this solution to be easier than converting from string fields. Sample code below:

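import numpy as np
import pandas as pd

# dataframe_to_be_saved is your existing dataframe; it is written to
# 'file_name.npy' in your current working directory
np.save('file_name.npy',dataframe_to_be_saved)

# load the saved file into an ndarray; a mixed-type dataframe is stored as a
# pickled object array, which recent NumPy versions only load with allow_pickle=True
arr=np.load('file_name.npy',allow_pickle=True)
# column_names: names for every column except the first, which becomes the index
df=pd.DataFrame(data=arr[:,1:],index=arr[:,0],columns=column_names)

# df now holds the data; check df.dtypes, since columns rebuilt from an
# object array may need converting back with astype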
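
A self-contained round trip of this idea, as a sketch only. It assumes a recent NumPy in which loading an object array requires `allow_pickle=True`, and it writes to a temporary directory. The frame contents and the names `original`, `restored` and `path` are made up for illustration. Columns rebuilt from an object array generally do not keep their original dtypes, so the sketch restores the integer column explicitly.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# A small frame whose pixels column already holds NumPy arrays (invented data).
original = pd.DataFrame({
    "emotion": [0, 1],
    "pixels": [np.array([70, 23, 45]), np.array([12, 34, 56])],
    "Usage": ["Training", "Training"],
})

path = os.path.join(tempfile.mkdtemp(), "frame.npy")

# A mixed-type frame becomes a 2-D object array, which np.save pickles.
np.save(path, original.to_numpy())

# Pickled object arrays must be opted into explicitly when loading.
arr = np.load(path, allow_pickle=True)

# Rebuild the frame and restore the dtype that was lost in the round trip.
restored = pd.DataFrame(arr, columns=original.columns)
restored["emotion"] = restored["emotion"].astype(np.int64)
print(restored.loc[0, "pixels"])
```

Because the file is a pickle under the hood, only load `.npy` files of this kind from sources you trust.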

edited Aug 27, 2019 at 14:13

Gabizon


answered Jun 22, 2017 at 9:03

Sanchari Dan

