
I have a CSV file that contains "" (empty values), "N/A", and "-" in the same file, and I want all of them to be read into the DataFrame as nulls. I know that spark-csv has a "nullValue" option, which lets me treat one single string as null. That is not sufficient for me, because my file uses three different null markers.

There is a pending issue on the spark-csv repository, https://github.com/databricks/spark-csv/issues/333, which is still open. What is the most elegant way to work around the problem?

  • Is it critical that they be "read in" as Nulls or is it acceptable to read them into the dataframe (say as strings) and then convert to Nulls? Commented Nov 7, 2017 at 1:11
  • The most elegant solution would be to use replaceAll and make your data uniform. Commented Nov 7, 2017 at 8:55
  • @combinatorist, I want to read the file against a schema and use it as a Dataset. Certain fields are integers but contain values like "N/A" or "-", and I want all of those parsed as null so they can be read into the integer fields of my schema case class. So I'd prefer to do it while the file is being read into the Dataset itself. Commented Nov 7, 2017 at 17:14
  • @philantrovert, I would do that as a last resort. Ideally, though, I want Spark to handle the whole thing rather than doing a regular in-memory replaceAll. Commented Nov 7, 2017 at 17:15
  • @VishnuPrathish, what if you read the field into a dataframe as a string, make Null replacements there, convert the field to an int, and then cast that dataframe as a dataset? Commented Nov 7, 2017 at 17:18

2 Answers


Reposted from my comment:

  • Read the field into a DataFrame as a string
  • Make the null replacements there
  • Convert the field to an int
  • Then cast that DataFrame as a Dataset
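A minimal Scala sketch of those four steps, assuming Spark 2.x or later with the built-in CSV reader (`spark.read.csv`) rather than the external spark-csv package. The `Person` case class, the `name`/`age` columns, and the `people.csv` path are hypothetical; the three null tokens are the ones from the question.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}
import org.apache.spark.sql.types.IntegerType

// Hypothetical target type: `age` may be missing, so it is an Option[Int].
case class Person(name: String, age: Option[Int])

object NullTokens {
  // Strings to treat as null (taken from the question).
  val nullTokens: Seq[String] = Seq("", "N/A", "-")

  // Step 2: replace any null token in a string column with a real null.
  def nullify(df: DataFrame, column: String): DataFrame =
    df.withColumn(
      column,
      when(col(column).isin(nullTokens: _*), lit(null).cast("string"))
        .otherwise(col(column)))

  // Steps 2-4: nullify, cast string -> int, then convert to a typed Dataset.
  def toPeople(raw: DataFrame)(implicit spark: SparkSession): Dataset[Person] = {
    import spark.implicits._
    nullify(raw, "age")
      .withColumn("age", col("age").cast(IntegerType))
      .as[Person]
  }

  def main(args: Array[String]): Unit = {
    implicit val spark: SparkSession = SparkSession.builder()
      .appName("csv-null-tokens")
      .master("local[*]")
      .getOrCreate()
    // Step 1: without inferSchema, every column is read as a string.
    // "people.csv" is a hypothetical path.
    val raw = spark.read.option("header", "true").csv("people.csv")
    toPeople(raw).show()
    spark.stop()
  }
}
```

Reading without `inferSchema` keeps "N/A" and "-" as plain strings, so they can be matched exactly before the cast. With ANSI mode off, `cast` alone would also turn unparsable strings into null, but the explicit replacement documents which markers are accepted and still works when ANSI mode is on, where an invalid cast raises an error instead.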

For those who can't get it to work in a Databricks Community Edition notebook: you probably haven't specified the filename.

  • As it's currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
