
I want to output an empty dataframe to a CSV file. I use this code:

df.repartition(1).write.csv(path, sep='\t', header=True)

But because there is no data in the dataframe, Spark won't write a header to the CSV file. So I modified the code to:

if df.count() == 0:
    empty_data = [f.name for f in df.schema.fields]
    df = ss.createDataFrame([empty_data], df.schema)
    df.repartition(1).write.csv(path, sep='\t')
else:
    df.repartition(1).write.csv(path, sep='\t', header=True)

It works, but I want to ask whether there is a better way that avoids the count() call.

  • Not sure why df.schema is being passed to createDataFrame. If you have anything other than strings in your schema, the method call will break. Commented Nov 13, 2019 at 13:27

2 Answers


df.count() == 0 makes your driver program compute the count over all of your dataframe's partitions on the executors, scanning everything just to learn that the dataframe is empty.

In your case I would use len(df.take(1)) == 0 (or df.rdd.isEmpty()), which fetches at most one row instead of counting everything. Still not free, but preferable to a raw count().
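As a minimal sketch, the check could slot into the original write logic like this, reusing the ss, df, and path names from the question; the empty branch builds a one-row dataframe with string columns named after df.columns, which also sidesteps the type issue raised in the comment under the question:

# Fetch at most one row instead of counting every partition
if len(df.take(1)) == 0:
    # Single row holding the column names; passing the column-name list as
    # the schema gives string columns, so non-string schemas don't break
    header_df = ss.createDataFrame([tuple(df.columns)], df.columns)
    header_df.repartition(1).write.csv(path, sep='\t')
else:
    df.repartition(1).write.csv(path, sep='\t', header=True)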




If all you need is the header line, you can write it yourself:

cols = '\t'.join(df.columns)
with open('./cols.csv', 'w') as f:
    f.write(cols)

1 Comment

The file may not be on the local filesystem. I use Azure HDInsight and blob storage.
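If the output must land on the cluster's storage (e.g. an Azure blob container) rather than the local disk, a rough alternative is to let Spark write the single header line itself; a sketch, assuming the same ss, df, and path as in the question:

# Write one text row with the tab-joined column names, so the file ends up
# wherever `path` points (e.g. a wasbs:// or abfss:// location)
header_line = '\t'.join(df.columns)
ss.createDataFrame([(header_line,)], ['value']).repartition(1).write.text(path)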
