Dataframe TypeError cannot accept object

Question

I have a list of strings in Python as follows:

['start_column=column123;to_3=2020-09-07 10:29:24;to_1=2020-09-07 10:31:08;to_0=2020-09-07 10:31:13;',
'start_column=column475;to_3=2020-09-07 10:29:34;']

I am trying to convert it into a DataFrame in the following way:

schema = StructType([
    StructField('Rows', ArrayType(StringType()), True)
])

rdd = sc.parallelize(test_list)
query_data = spark.createDataFrame(rdd,schema)
print(query_data.schema)
query_data.show()

I am getting the following error:

TypeError: StructType can not accept object

Ideally, the keywords would become column names, with their values in the corresponding columns. Something like this: stackoverflow.com/questions/47552045/…
– Arshanvit, Commented Nov 5, 2020 at 14:54

dsk · Accepted Answer · 2020-11-05 14:46:15Z

You just need to wrap the list in another list when creating the DataFrame, so that it is treated as a single row with two columns:

a_list = ['start_column=column123;to_3=2020-09-07 10:29:24;to_1=2020-09-07 10:31:08;to_0=2020-09-07 10:31:13;',
'start_column=column475;to_3=2020-09-07 10:29:34;']
sparkdf = spark.createDataFrame([a_list],["col1", "col2"])
sparkdf.show(truncate=False)

+--------------------------------------------------------------------------------------------------+------------------------------------------------+
|col1                                                                                              |col2                                            |
+--------------------------------------------------------------------------------------------------+------------------------------------------------+
|start_column=column123;to_3=2020-09-07 10:29:24;to_1=2020-09-07 10:31:08;to_0=2020-09-07 10:31:13;|start_column=column475;to_3=2020-09-07 10:29:34;|
+--------------------------------------------------------------------------------------------------+------------------------------------------------+

mck · Accepted Answer · 2020-11-05 13:55:52Z

You should use schema = StringType() because your rows contain plain strings rather than structs of strings.
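If you would rather keep a StructType schema, the other way to resolve the mismatch is to make each element row-like (a tuple, list, dict, or Row) instead of a bare string. The snippet below is a minimal sketch of that reshaping in plain Python. The Spark calls are left as comments and assume an active SparkSession named `spark`, as in the question. Note that the field type is StringType here, not the ArrayType(StringType()) from the question.

```python
test_list = [
    'start_column=column123;to_3=2020-09-07 10:29:24;to_1=2020-09-07 10:31:08;to_0=2020-09-07 10:31:13;',
    'start_column=column475;to_3=2020-09-07 10:29:34;',
]

# Each string becomes a one-element tuple, i.e. a row with a single field.
rows = [(s,) for s in test_list]
print(len(rows), type(rows[0]).__name__)

# Assuming `spark` is an active SparkSession (not created in this sketch):
# from pyspark.sql.types import StructType, StructField, StringType
# schema = StructType([StructField('Rows', StringType(), True)])
# spark.createDataFrame(rows, schema).show(truncate=False)
```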

Dharman · Accepted Answer · 2020-11-05 19:05:32Z

I have two possible solutions for you.

SOLUTION 1: Assuming you wanted a dataframe with just one row

I was able to make it work by wrapping the values in test_list in parentheses, so that they form a single tuple (one row), and using StringType for each field.

from pyspark.sql.types import StructType, StructField, StringType

# One tuple is one row; each string in the tuple fills one column.
v = [('start_column=column123;to_3=2020-09-07 10:29:24;to_1=2020-09-07 10:31:08;to_0=2020-09-07 10:31:13;',
      'start_column=column475;to_3=2020-09-07 10:29:34;')]

schema = StructType([
    StructField('col_1', StringType(), True),
    StructField('col_2', StringType(), True),
])

rdd = sc.parallelize(v)
query_data = spark.createDataFrame(rdd, schema)
print(query_data.schema)
query_data.show(truncate=False)

SOLUTION 2: Assuming you wanted a dataframe with just one column

from pyspark.sql.types import StringType

v = ['start_column=column123;to_3=2020-09-07 10:29:24;to_1=2020-09-07 10:31:08;to_0=2020-09-07 10:31:13;',
     'start_column=column475;to_3=2020-09-07 10:29:34;']

# Each string becomes one row in a single string column.
df = spark.createDataFrame(v, StringType())

df.show(truncate=False)
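On the follow-up comment under the question, about turning the keywords into column names: one possible approach is to parse each string into key/value pairs first and only then build the DataFrame. The sketch below does the parsing in plain Python. It assumes every non-empty segment has the form key=value, and the helper name parse_record is made up for illustration. The final Spark call is left as a comment and assumes an active SparkSession named `spark`.

```python
test_list = [
    'start_column=column123;to_3=2020-09-07 10:29:24;to_1=2020-09-07 10:31:08;to_0=2020-09-07 10:31:13;',
    'start_column=column475;to_3=2020-09-07 10:29:34;',
]


def parse_record(record):
    """Split 'key=value;key=value;' into a dict, skipping empty segments."""
    pairs = (segment.split('=', 1) for segment in record.split(';') if segment)
    return {key: value for key, value in pairs}


records = [parse_record(s) for s in test_list]

# Collect column names in first-seen order so every row has the same shape.
columns = []
for record in records:
    for key in record:
        if key not in columns:
            columns.append(key)

# Keys missing from a record become None.
rows = [tuple(record.get(col) for col in columns) for record in records]

print(columns)
print(rows)

# Assuming `spark` is an active SparkSession (not created in this sketch):
# spark.createDataFrame(rows, columns).show(truncate=False)
```

The None values in the second row should show up as nulls for the keys that string does not contain.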
