4

I have a CSV file with three columns (emotion, pixels, Usage) and 35000 rows, e.g. 0,70 23 45 178 455,Training.

I read the file with pandas.read_csv as pd.read_csv(filename, dtype={'emotion':np.int32, 'pixels':np.int32, 'Usage':str}).

When I try the above, I get ValueError: invalid literal for long() with base 10: '70 23 45 178 455'. How do I read the pixels column as a numpy array?

3 Answers

12

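Each pixels field holds several space-separated numbers, so it cannot be parsed as a single np.int32. Read the column as a string and convert it afterwards:

import numpy as np
import pandas as pd

df = pd.read_csv(filename, dtype={'emotion':np.int32, 'pixels':str, 'Usage':str})

def makeArray(text):
    # parse the space-separated pixel values into an integer array
    return np.fromstring(text, dtype=int, sep=' ')

df['pixels'] = df['pixels'].apply(makeArray)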
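A self-contained sketch of the same idea, using the sample row from the question in place of a real file. It swaps np.fromstring for a plain split followed by np.asarray, which is an alternative and not the code from this answer; the names csv_text and make_array are made up for the illustration.

```python
import io

import numpy as np
import pandas as pd

# Sample row copied from the question; with a real file, pass its path
# to read_csv instead of the StringIO object.
csv_text = "emotion,pixels,Usage\n0,70 23 45 178 455,Training\n"

# Read pixels as plain text so pandas does not try to parse it as one integer.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"emotion": np.int32, "pixels": str, "Usage": str},
)


def make_array(text):
    # Split on whitespace and convert the tokens to 32-bit integers.
    return np.asarray(text.split(), dtype=np.int32)


# Each cell of the pixels column now holds its own 1-D integer array.
df["pixels"] = df["pixels"].apply(make_array)
first = df["pixels"].iloc[0]
```

Here first is a one-dimensional array for the first row, and the column itself stays an object column whose cells are arrays.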

1 Comment

Hi, thanks for your help. It now says TypeError: data type not understood. What could be the probable mistake?
2

I believe it will be faster to use the vectorised str.split method to split the string into new pixel columns and then concat those columns onto the original df:

In [175]:
# load the data
import pandas as pd
import io
t="""emotion,pixels,Usage
0,70 23 45 178 455,Training"""
df = pd.read_csv(io.StringIO(t))
df

Out[175]:
   emotion            pixels     Usage
0        0  70 23 45 178 455  Training

In [177]:
# now split the string and concat column-wise with the orig df
df = pd.concat([df, df['pixels'].str.split(expand=True).astype(int)], axis=1)
df
Out[177]:
   emotion            pixels     Usage   0   1   2    3    4
0        0  70 23 45 178 455  Training  70  23  45  178  455

If you specifically want a plain np array, you can just call the .values attribute. The result is 2-D, with one row per row of the csv:

In [181]:
df['pixels'].str.split(expand=True).astype(int).values

Out[181]:
array([[ 70,  23,  45, 178, 455]])
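As a rough sketch of how this extends to more than one row, the snippet below builds a 2-D pixel array and a matching label array. The second data row is invented for the illustration, and the approach assumes every row has the same number of pixel values; with ragged rows, expand=True would leave missing cells and the integer cast would fail.

```python
import io

import numpy as np
import pandas as pd

# First row is from the question; the second row is made up for illustration.
csv_text = """emotion,pixels,Usage
0,70 23 45 178 455,Training
1,1 2 3 4 5,Training
"""

df = pd.read_csv(io.StringIO(csv_text))

# Split every pixels string into columns, cast to int32, and take the
# underlying 2-D array: one row per CSV row, one column per pixel.
pixels = df["pixels"].str.split(expand=True).astype(np.int32).to_numpy()

# Labels as a 1-D array aligned with the rows of pixels.
labels = df["emotion"].to_numpy()
```

to_numpy() is used here in place of .values; both return the underlying array.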

1

I encountered the same problem and figured out a hack. Save your dataframe as a .npy file. When you load it back, it is loaded as an ndarray. You can then use pandas.DataFrame to convert the ndarray to a dataframe for your use. I found this solution to be easier than converting from string fields. Sample code below:

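import numpy as np
import pandas as pd

# dataframe_to_be_saved is your existing dataframe; np.save stores only its values
np.save('file_name.npy', dataframe_to_be_saved)
# the data is saved in 'file_name.npy' in your current working directory

# load the saved file into an ndarray; allow_pickle=True is needed when the
# dataframe has mixed column types and was therefore stored as an object array
arr = np.load('file_name.npy', allow_pickle=True)
# the first column becomes the index; column_names lists the remaining column names
df = pd.DataFrame(data=arr[:, 1:], index=arr[:, 0], columns=column_names)

# df now holds the data again; the .npy file does not store column names,
# and columns loaded from an object array may come back with object dtype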
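A small round-trip sketch of this approach, assuming a dataframe with mixed column types like the one in the question. It writes to a temporary directory and keeps every column instead of moving the first one into the index; the names path and restored are made up for the illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# One-row frame built from the sample row in the question.
df = pd.DataFrame(
    {"emotion": [0], "pixels": ["70 23 45 178 455"], "Usage": ["Training"]}
)

# Temporary location so the sketch does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), "file_name.npy")

# Mixed column types turn into a single object array, which np.save pickles.
np.save(path, df.to_numpy())

# Object arrays can only be loaded with allow_pickle=True.
arr = np.load(path, allow_pickle=True)

# Column names are not stored in the .npy file, so they are passed in again.
restored = pd.DataFrame(arr, columns=df.columns)
```

Note that the pixels values are still strings after the round trip, so they would still need to be split and converted as in the other answers.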
