Getting an unusual error when creating a DataFrame with a string-typed schema

Question

I am creating a simple data frame.

df=spark.createDataFrame(data=[('11s1 ab')],schema=['str'])

I get this error:

TypeError: Can not infer schema for type: <class 'str'>

However, if I change the statement to:

df=spark.createDataFrame(data=[('11s1 ab',)],schema=['str'])

my DataFrame is created successfully.

I want to understand why that comma matters in the data tuple passed to spark.createDataFrame.

You need to pass a tuple, not a string. Use data=[('11s1 ab',)].
– samkart, Commented Nov 30, 2022 at 10:23
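To see what the comment means without starting Spark, here is a plain-Python sketch that inspects the element each list literal actually contains. Each element of the data list is treated as one row, so its type is what matters; the variable names are illustrative only.

```python
# Parentheses alone only group an expression; the trailing comma builds a tuple.
without_comma = [('11s1 ab')]   # same as ['11s1 ab']: the row is a bare str
with_comma = [('11s1 ab',)]     # the row is a one-element tuple

# Print the type of the single "row" in each list.
print(type(without_comma[0]).__name__)
print(type(with_comma[0]).__name__)

# The tuple row has exactly one field, matching the one column in schema=['str'].
print(len(with_comma[0]))
```

The first line prints `str` and the second prints `tuple`, which is why only the second list gives Spark a row it can map onto the one-column schema.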

Amir Hossein Shahdaei · Accepted Answer · 2022-11-30 10:40:56Z


In the documentation of createDataFrame, you can see that the data parameter must be:

data: Union[pyspark.rdd.RDD[Any], Iterable[Any], ForwardRef('PandasDataFrameLike')]

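(1,) and [1] are iterable, but (1) is just the integer 1, which is not iterable: the parentheses only group the expression, and it is the trailing comma that makes a tuple. In the same way, ('11s1 ab') is just the string '11s1 ab', so each element of data is a bare str rather than a row, and Spark cannot infer a schema from it.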
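The sketch below reproduces the error message from the question. It uses a hypothetical helper, `describe_row`, which is a simplified stand-in loosely modeled on how PySpark's schema inference dispatches on each row's type (dicts, tuples or lists, and plain objects with attributes). It is not PySpark's actual code. It shows that the list itself being iterable is not enough: every element of the list must also look like a row.

```python
def describe_row(row):
    """Hypothetical, simplified stand-in for PySpark's per-row type check
    during schema inference. Not the real implementation."""
    if isinstance(row, dict):
        return "dict: columns taken from keys"
    if isinstance(row, (tuple, list)):
        return f"sequence: {len(row)} column(s) by position"
    if hasattr(row, "__dict__"):
        return "object: columns taken from attributes"
    # str and int have no __dict__, so they fall through to here.
    raise TypeError(f"Can not infer schema for type: {type(row)}")


# Check every row in each candidate data list.
for data in ([('11s1 ab',)], [('11s1 ab')], [(1,)], [(1)]):
    for row in data:
        try:
            print(repr(row), "->", describe_row(row))
        except TypeError as exc:
            print(repr(row), "->", exc)
```

The versions with the trailing comma yield one-column rows, while the versions without it fail with the same kind of message as in the question (`<class 'str'>` or `<class 'int'>`).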


1 Comment

Ah, I get it. To make this answer clearer: (1,) is a tuple, while (1) is just an integer. Hence the tuple fulfills the iterable requirement.