I have a DataFrame df that is the result of some pre-processing. It has around 10,000 rows. I save this DataFrame as CSV as follows:

df.coalesce(1).write.option("sep",";").option("header","true").csv("output/path")

Now I want to save this DataFrame as a txt file in which each row is a JSON string. The column names should become the attribute names in the JSON strings.

For example:

df =
  col1   col2   col3
  aa     34     55
  bb     13     77

json_txt =
{"col1": "aa", "col2": "34", "col3": "55"}
{"col1": "bb", "col2": "13", "col3": "77"}

What is the best way to do it?

  • you can just use df.write.json(path to output) Commented Jan 26, 2018 at 17:05
  • see this instead of .toDF() use .createDataFrame() Commented Jan 26, 2018 at 17:06
  • @RameshMaharjan: Will it write each row as I showed? Commented Jan 26, 2018 at 17:07
  • yes of course . try it , test it and if failed then let me know Commented Jan 26, 2018 at 17:08
  • @RameshMaharjan: Let me test it to check if I get what I want simply using df.coalesce(1).write.json("path") Commented Jan 26, 2018 at 18:17

1 Answer

You can use the write.json API to save a DataFrame in JSON format:

df.coalesce(1).write.json("output path of json file")

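The code above creates an output directory containing a single part file, in which each line is one row written as a JSON object. If you want plain text output (one JSON string per line) instead, you can use the toJSON API: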

df.toJSON.rdd.coalesce(1).saveAsTextFile("output path to text file")
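
As a rough sketch of how the two approaches fit together, the Scala snippet below builds the example DataFrame from the question and writes it both ways. The object name, the local SparkSession, and the output paths are assumptions made for illustration, not part of the original code. Note also that the expected output in the question quotes the numbers ("34"), while Spark writes numeric columns as JSON numbers, so the sketch casts every column to string first; skip that step if numeric values are acceptable.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical object name, used only for this illustration.
object JsonLinesExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for testing; on a cluster, reuse the existing one.
    val spark = SparkSession.builder()
      .appName("json-lines-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The sample data from the question.
    val df = Seq(("aa", 34, 55), ("bb", 13, 77)).toDF("col1", "col2", "col3")

    // Cast every column to string so that all values are quoted in the JSON.
    val asStrings = df.select(df.columns.map(c => col(c).cast("string").as(c)): _*)

    // Option 1: the DataFrame JSON writer, one JSON object per line.
    asStrings.coalesce(1).write.mode("overwrite").json("output/json")

    // Option 2: convert each row to a JSON string and save as plain text.
    asStrings.toJSON.rdd.coalesce(1).saveAsTextFile("output/text")

    spark.stop()
  }
}
```

Both calls write a directory containing a part file rather than a single named file. saveAsTextFile fails if the target directory already exists, so remove it first or pick a fresh path.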

I hope the answer is helpful
