
The following code reads a CSV file into a DataFrame in Scala:

 val mDF: DataFrame = spark.read.csv("src/test/resources/knimeMerged.csv")

However, it treats the first row of the imported data as a data row, even though that row actually contains the headers, and it assigns the default DataFrame column names instead (e.g., _c0, _c1).

I assume there is an option to read the headers from a CSV file, but I cannot find it in the Scala API docs (I'm new to Scala and its documentation).

Any hints would be appreciated, both on what the option is and how to use it.

2 Answers


The option to handle it is header; setting header to true will work:

val mDF: DataFrame = spark.read.option("header", true).csv("src/test/resources/knimeMerged.csv")
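To double-check that the headers were picked up, you can print the schema or the column names; with header set to true they should be the names from the CSV's first row rather than _c0, _c1. A quick sketch, reusing mDF from above:

// Column names should now come from the file's first row
mDF.printSchema()
println(mDF.columns.mkString(", "))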



You can add the header option with the value true before calling the csv method, like this:

val df = spark.read.option("header","true").option("inferSchema","true").csv("src/test/resources/knimeMerged.csv")

I have also added another option, inferSchema.

With inferSchema set to true, Spark tries to work out each column's data type, i.e., if a column contains only Int values, that type is recorded in the CSV's schema instead of everything being read as String.

Using both options gives you better metadata about the CSV file.
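If you already know the column types, an alternative to inferSchema (which requires Spark to scan the data an extra time to guess the types) is to supply an explicit schema. A minimal sketch; the column names and types below are placeholders, not the actual columns of knimeMerged.csv:

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Placeholder schema: replace the fields with the real columns of your CSV
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = true),
  StructField("name", StringType, nullable = true)
))

val dfWithSchema = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("src/test/resources/knimeMerged.csv")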

1 Comment

Which is just what I have been trying to figure out. Thanks a million.
