I am creating a simple data frame.

df=spark.createDataFrame(data=[('11s1 ab')],schema=['str'])

I get this error:

TypeError: Can not infer schema for type: <class 'str'>

However, if I change the statement to:

df=spark.createDataFrame(data=[('11s1 ab',)],schema=['str'])

my dataframe is successfully created.

I want to understand why that trailing comma matters in the data tuple passed to spark.createDataFrame.

You need to pass a tuple, not a string. Use data=[('11s1 ab',)]. Commented Nov 30, 2022 at 10:23

1 Answer

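In the documentation for createDataFrame you can see that the data parameter must be:

data: Union[pyspark.rdd.RDD[Any], Iterable[Any], ForwardRef('PandasDataFrameLike')]

(1,) and [1] are iterable, but (1) is just the integer 1, which is not iterable: parentheses alone do not create a tuple, the trailing comma does. In the same way, ('11s1 ab') is the plain string '11s1 ab', so data becomes a list containing a str rather than a list of row tuples, and Spark cannot infer a row schema from a bare str. That is what the TypeError is reporting.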
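The distinction can be checked in plain Python, without Spark. This is a minimal sketch; the variable names are made up for illustration and are not part of the PySpark API.

```python
# Parentheses alone only group an expression; the trailing comma makes a tuple.
not_a_tuple = ('11s1 ab')   # identical to the plain string '11s1 ab'
one_tuple = ('11s1 ab',)    # a tuple holding one element

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple

# The two `data` arguments from the question, inspected element by element.
rows_bad = [('11s1 ab')]    # a list whose only element is a str
rows_good = [('11s1 ab',)]  # a list whose only element is a 1-tuple

print(type(rows_bad[0]).__name__)  # str
print(len(rows_good[0]))           # 1
```

With rows_good, each list element is a tuple whose single field lines up with the single column name in schema=['str'], which is why the second call in the question succeeds.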

1 Comment

Ah, I get it. To make this answer clearer: (1,) is a tuple, while (1) is an integer, so only (1,) fulfills the iterable requirement.
