
I want to output an empty dataframe to a CSV file. I use this code:

df.repartition(1).write.csv(path, sep='\t', header=True)

But because there is no data in the dataframe, Spark won't write a header to the CSV file. So I modified the code to:

if df.count() == 0:
    empty_data = [f.name for f in df.schema.fields]
    df = ss.createDataFrame([empty_data], df.schema)
    df.repartition(1).write.csv(path, sep='\t')
else:
    df.repartition(1).write.csv(path, sep='\t', header=True)

It works, but I want to ask whether there is a better way that avoids the count() call.

  • Not sure why df.schema is being passed to createDataFrame. If you have anything other than strings in your schema, the method call will break. Commented Nov 13, 2019 at 13:27

2 Answers


df.count() == 0 makes your driver program compute the count over all of your dataframe's partitions on the executors, scanning everything just to learn that the dataframe is empty.

In your case I would use len(df.take(1)) == 0 (or df.rdd.isEmpty()), which fetches at most one row instead of counting everything. Still not free, but preferable to a raw count().
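As a minimal sketch, the check could slot into the original write logic like this, reusing the ss, df, and path names from the question; the empty branch builds a one-row dataframe with string columns named after df.columns, which also sidesteps the type issue raised in the comment under the question:

# Fetch at most one row instead of counting every partition
if len(df.take(1)) == 0:
    # Single row holding the column names; passing the column-name list as
    # the schema gives string columns, so non-string schemas don't break
    header_df = ss.createDataFrame([tuple(df.columns)], df.columns)
    header_df.repartition(1).write.csv(path, sep='\t')
else:
    df.repartition(1).write.csv(path, sep='\t', header=True)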




If all you need is the header line, you can write it yourself:

cols = '\t'.join(df.columns)
with open('./cols.csv', 'w') as f:
    f.write(cols)

1 Comment

The file may not be on the local filesystem. I use Azure HDInsight and blob storage.
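If the output must land on the cluster's storage (e.g. an Azure blob container) rather than the local disk, a rough alternative is to let Spark write the single header line itself; a sketch, assuming the same ss, df, and path as in the question:

# Write one text row with the tab-joined column names, so the file ends up
# wherever `path` points (e.g. a wasbs:// or abfss:// location)
header_line = '\t'.join(df.columns)
ss.createDataFrame([(header_line,)], ['value']).repartition(1).write.text(path)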
