
With the pandas read_csv() function, I read an ISO-8859-1 file as follows:

import pandas as pd

df = pd.read_csv('path/file', sep='|', names=['A', 'B'], encoding='iso-8859-1')
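To see what this call produces without the original file, here is a minimal sketch that feeds read_csv an in-memory ISO-8859-1 byte buffer instead of 'path/file'. The two sample rows are made up for illustration. In Python 3 the decoded values are plain str; the `<type 'unicode'>` in the traceback below indicates the original code ran on Python 2, where decoded text is unicode.

```python
import io

import pandas as pd

# Hypothetical sample data, encoded as ISO-8859-1 (Latin-1) bytes
# to mimic the file on disk.
raw = "café|1\nnaïve|2\n".encode("iso-8859-1")

df = pd.read_csv(io.BytesIO(raw), sep="|", names=["A", "B"], encoding="iso-8859-1")

# The bytes are decoded into ordinary Python text strings.
print(df["A"].tolist())  # ['café', 'naïve']

# Selecting one column with a scalar label gives a Series, not a DataFrame.
print(type(df["A"]).__name__)  # Series
```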

Then, I would like to use MLlib's Word2Vec. However, it only accepts RDDs as input. So I tried to transform the pandas DataFrame to an RDD as follows:

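from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext.getOrCreate()  # reuse the running SparkContext
sqlContext = SQLContext(sc)
spDF = sqlContext.createDataFrame(df['A'])
spDF.show()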

However, I got the following exception:

TypeError: Can not infer schema for type: <type 'unicode'>

I checked PySpark's documentation to see if there is something like an encoding parameter, but I did not find anything. Any idea how to transform a specific pandas DataFrame column to a PySpark RDD?
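For reference, the direct route asked about here can be sketched as follows, assuming an existing SparkContext named sc and that whitespace splitting is an acceptable tokenizer (both assumptions). MLlib's Word2Vec.fit takes an RDD whose elements are lists of tokens, so the column can be tokenized in pandas and passed to sc.parallelize without building a Spark DataFrame at all. The helper names tokenize_column and fit_word2vec are hypothetical.

```python
import pandas as pd


def tokenize_column(series):
    """Turn a pandas Series of sentences into a list of token lists.

    Whitespace splitting is an assumption; swap in a real tokenizer if needed.
    Missing values (NaN/None) become empty token lists.
    """
    return [str(text).split() if pd.notna(text) else [] for text in series]


def fit_word2vec(sc, series, min_count=1):
    """Hypothetical helper: train MLlib Word2Vec on one pandas column.

    `sc` is assumed to be an existing pyspark.SparkContext.
    """
    from pyspark.mllib.feature import Word2Vec  # lazy import; needs PySpark

    rdd = sc.parallelize(tokenize_column(series))  # RDD of lists of strings
    # MLlib's default minCount is 5, which drops every word in a tiny corpus.
    return Word2Vec().setMinCount(min_count).fit(rdd)


sentences = pd.Series(["the quick brown fox", "jumps over the lazy dog"])
print(tokenize_column(sentences))
# [['the', 'quick', 'brown', 'fox'], ['jumps', 'over', 'the', 'lazy', 'dog']]
```

With PySpark available, the call would be fit_word2vec(sc, df['A']); the pandas side never needs to become a Spark DataFrame.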

Update:

Following the answer by @zeros, I tried saving the column as a DataFrame, like this:

new_dataframe = df_3.loc[:,'A']
new_dataframe.head()

Then:

from pyspark.sql import SQLContext
spDF = sqlContext.createDataFrame(new_dataframe)
spDF.show()

And I got the same exception:

TypeError: Can not infer schema for type: <type 'unicode'>

2 Answers

When you use df['A'], the result is not a pandas.DataFrame but a pandas.Series. Hence, when you pass it to SQLContext.createDataFrame, it is treated like any other iterable, and PySpark doesn't support converting an iterable of simple types (such as strings) to a DataFrame.
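To make the difference concrete: iterating a Series yields bare values (here, strings), while createDataFrame expects each element to be row-like, such as a tuple, list, dict, or Row. A minimal sketch, assuming an existing SQLContext (or SparkSession) for the Spark part, that wraps each value in a one-element tuple and passes the column name explicitly; series_to_rows and series_to_spark_df are hypothetical helpers.

```python
import pandas as pd

series = pd.Series(["café", "naïve"], name="A")

# Iterating a Series yields plain scalars, which Spark cannot treat as rows.
print([type(v).__name__ for v in series])  # ['str', 'str']


def series_to_rows(s):
    """Wrap every value in a 1-tuple so each element looks like a row."""
    return [(v,) for v in s]


def series_to_spark_df(sql_context, s):
    """Hypothetical helper: build a one-column Spark DataFrame from a Series.

    `sql_context` is assumed to be an existing pyspark.sql.SQLContext
    (or SparkSession); both provide createDataFrame(data, schema).
    """
    return sql_context.createDataFrame(series_to_rows(s), [s.name])


print(series_to_rows(series))  # [('café',), ('naïve',)]
```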

If you want to keep the data as a pandas DataFrame, use the loc method with a list of column labels:

df.loc[:, ['A']]
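One caveat worth checking in pandas itself: loc with a scalar column label still returns a Series, while passing the label inside a list (or using df[['A']] or df['A'].to_frame()) returns a one-column DataFrame. A quick check on made-up data:

```python
import pandas as pd

# Made-up data standing in for the CSV contents.
df = pd.DataFrame({"A": ["café", "naïve"], "B": [1, 2]})

scalar_sel = df.loc[:, "A"]        # scalar label   -> Series
list_sel = df.loc[:, ["A"]]        # list of labels -> DataFrame
via_to_frame = df["A"].to_frame()  # Series         -> one-column DataFrame

print(type(scalar_sel).__name__)      # Series
print(type(list_sel).__name__)        # DataFrame
print(list_sel.equals(via_to_frame))  # True
```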

Following @zeros323's answer, I noticed that the column actually was not a pandas DataFrame. I consulted the pandas documentation and found that to_frame() converts that specific column (a Series) into a pandas DataFrame. So I did the following:

new_dataframe = df['A'].to_frame()
new_dataframe.head()
from pyspark.sql import SQLContext
spDF = sqlContext.createDataFrame(new_dataframe)
spDF.show()
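Since the original goal was an RDD for Word2Vec, the resulting Spark DataFrame still needs one more step: spDF.rdd is an RDD of Row objects, so each row has to be mapped to a list of tokens. A sketch under the same whitespace-tokenization assumption; row_to_tokens and spark_df_to_token_rdd are hypothetical helpers, and the Word2Vec lines are left as comments because they need a live Spark session.

```python
def row_to_tokens(row):
    """Map one row (a pyspark Row or any indexable) to a list of tokens.

    Assumes the text sits in the first column and whitespace tokenization.
    """
    text = row[0]
    return text.split() if text is not None else []


def spark_df_to_token_rdd(spark_df):
    """Hypothetical helper: RDD of token lists, the input MLlib Word2Vec.fit expects."""
    return spark_df.rdd.map(row_to_tokens)


# Local check with a plain tuple standing in for a pyspark Row
# (Row is a tuple subclass, so row[0] works the same way).
print(row_to_tokens(("the quick brown fox",)))  # ['the', 'quick', 'brown', 'fox']

# With PySpark available (assumption: spDF built as above):
# from pyspark.mllib.feature import Word2Vec
# model = Word2Vec().setMinCount(1).fit(spark_df_to_token_rdd(spDF))
```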
