Create dataframe with header using header and data file

Question

I have two files, data.csv and headers.csv. I want to create a DataFrame in Spark/Scala that takes its column names from headers.csv.

val data = spark.sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(data_path)

Can you help me customize the lines above to do this?

Read headers.csv with the header option, take the schema from the resulting DataFrame, and apply that schema when reading data.csv.
– Anahcolus, Commented Oct 26, 2017 at 3:25

Anahcolus · Accepted Answer · 2017-10-26 03:30:09Z

5

You can read headers.csv using the method above, then use the schema of the resulting headers DataFrame to read data.csv, as below:

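// Read headers.csv; with header=true its first line becomes the column names.
val headersDF = sqlContext
  .read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("path to headers.csv")

// inferSchema is not set, so every column in this schema is typed as a string.
val schema = headersDF.schema

// Apply the schema to data.csv; header defaults to false, so its first line is read as data.
val dataDF = sqlContext
  .read
  .format("com.databricks.spark.csv")
  .schema(schema)
  .load("path to data.csv")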
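
One thing to keep in mind: a schema taken from headers.csv alone types every column as a string, since there are no data rows to infer from. If you want inferred types as well, a possible variation is to read data.csv with inferSchema and then rename its columns. The following is only a sketch under some assumptions: Spark 2.0 or later (where the CSV reader is built in), a data.csv with no header line, and the same number of columns in both files. The names HeaderFromSeparateFile and readWithHeaders are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; the names are illustrative, not part of Spark.
object HeaderFromSeparateFile {

  // Reads data.csv with inferred column types, then renames the columns
  // using the names found in the first line of headers.csv.
  def readWithHeaders(spark: SparkSession,
                      headersPath: String,
                      dataPath: String): DataFrame = {
    // header=true turns the first line of headers.csv into column names.
    val columnNames: Array[String] = spark.read
      .option("header", "true")
      .csv(headersPath)
      .columns

    // header defaults to false, so every line of data.csv is treated as data.
    val data = spark.read
      .option("inferSchema", "true")
      .csv(dataPath)

    // Fail early with a readable message if the two files disagree.
    require(
      columnNames.length == data.columns.length,
      s"headers file has ${columnNames.length} columns, data file has ${data.columns.length}"
    )

    // toDF(names: _*) returns a new DataFrame with the columns renamed in order.
    data.toDF(columnNames: _*)
  }
}
```

You would call it as HeaderFromSeparateFile.readWithHeaders(spark, "path to headers.csv", "path to data.csv"). The trade-off is that inferSchema makes an extra pass over the data, so for large files an explicit schema may be preferable.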

I hope this helps.


2 Comments

Anahcolus Over a year ago

Great to hear that :) You can accept the answer too :)

desertnaut Over a year ago

@Ravikrn providing answers takes valuable time for the respondents; since the proposed solution worked, please show some courtesy and accept the answer.
