
I am trying to create a CSV file from the values stored in a table:

 | col1   | col2   | col3  |
 | "one"  | null   | "one" |
 | "two"  | "two"  | "two" |

hive > select * from table where col2 is null;
 one   null    one 

I am generating the CSV with the code below:

df.repartition(1)
  .write.option("header",true)
  .option("delimiter", ",")
  .option("quoteAll", true)
  .option("nullValue", "")
  .csv(S3Destination)

The CSV I get:

"col1","col2","col3"
"one","","one"
"two","two","two"

Expected CSV (with no double quotes for the null value):

"col1","col2","col3"
"one",,"one"
"two","two","two"
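To make the target format concrete, here is a minimal plain-Java sketch of the rule the expected output follows. It does not use Spark. Every non-null value is quoted, with embedded quotes doubled in RFC 4180 style, and a null becomes an empty field with no quotes. The class NullAwareCsv and its methods are hypothetical names for illustration only. They are not part of the DataFrame writer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Spark): formats one CSV row so that
// non-null values are always quoted and null values are left completely empty.
class NullAwareCsv {

    // Wrap a value in double quotes, doubling any embedded quotes (RFC 4180 style).
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Null -> empty field with no quotes; every other value -> quoted field.
    static String formatRow(List<String> values) {
        return values.stream()
                .map(v -> v == null ? "" : quote(v))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(formatRow(Arrays.asList("col1", "col2", "col3"))); // "col1","col2","col3"
        System.out.println(formatRow(Arrays.asList("one", null, "one")));     // "one",,"one"
        System.out.println(formatRow(Arrays.asList("two", "two", "two")));    // "two","two","two"
    }
}
```

One design choice differs from the CSV above, where the null comes out as "". In this sketch, a genuine empty string is still written as "", so it stays distinguishable from a null, which is written as an empty field.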

I would appreciate any help finding out whether the DataFrame writer has an option to do this.

1 Answer


You can take a UDF approach and apply it (using withColumn on the repartitioned DataFrame above) to the columns that may contain a double-quoted empty string. See the sample code below:

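 import org.apache.spark.sql.api.java.UDF1; import org.apache.spark.sql.types.DataTypes;
 sqlContext.udf().register("convertToEmptyWithOutQuotes", (UDF1<String, String>) abc -> (abc == null || abc.trim().length() > 0) ? abc : abc.replace("\"", ""), DataTypes.StringType);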

The String class has a replace method that does the job:

val a = Array("'x'", "", "z")
println(a.mkString(",").replace("\"", ""))

This prints 'x',,z
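If post-processing the written file is acceptable, calling replace on a whole line is risky. It removes every double quote, including the ones around non-empty values. The sketch below takes a safer, field-aware approach. It assumes that every field was written quoted (quoteAll set to true) and that no value contains a line break. It is plain Java, not a Spark option, and the class name UnquoteEmptyFields is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing helper (not a Spark option): rewrites one line
// written with quoteAll=true so that empty quoted fields ("") become truly empty.
// Assumes fields never contain embedded line breaks.
class UnquoteEmptyFields {

    static String rewriteLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                // An escaped quote ("") flips the flag twice, so it never ends a field.
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) {
                // A comma outside quotes separates fields; keep each raw field verbatim.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());

        // A raw field that is exactly two quotes is an empty value; drop the quotes.
        List<String> out = new ArrayList<>();
        for (String f : fields) {
            out.add(f.equals("\"\"") ? "" : f);
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        System.out.println(rewriteLine("\"one\",\"\",\"one\""));   // "one",,"one"
        System.out.println(rewriteLine("\"a,\"\",b\",\"\""));      // "a,"",b",
    }
}
```

The question sets nullValue to an empty string, so after writing, a null and a genuine empty string both appear as "". This rewrite therefore unquotes both, because it cannot tell them apart.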


3 Comments

Thank you for your help, I appreciate it. I'm looking for a DataFrame writer option that stops it from adding double quotes to null values, since I don't want to manipulate the CSV afterwards.
Sorry, AFAIK there is no such built-in option. The above approach should work.
If you are okay with it, please accept the answer as the question owner; it will serve as a pointer for other users as well. Thanks!
