0

I have two files, data.csv and headers.csv. I want to create a DataFrame in Spark/Scala from data.csv, using the column names stored in headers.csv.

val data = spark.sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(data_path)

Can you help me customize the lines above to do this?

1
  • Read headers.csv with the header option, build a schema from it, and apply that schema when reading data.csv. Commented Oct 26, 2017 at 3:25

1 Answer

5

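You can read headers.csv with the method above and then use the schema of the resulting DataFrame to read data.csv, as shown below. Because inferSchema is not set when reading headers.csv, every column in that schema is a string.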

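// Read headers.csv; its first row supplies the column names.
val headersDF = sqlContext
  .read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("path to headers.csv")

// Reuse those column names (all typed as strings) as the schema.
val schema = headersDF.schema

// Read data.csv without the header option so its first row is kept as data.
val dataDF = sqlContext
  .read
  .format("com.databricks.spark.csv")
  .schema(schema)
  .load("path to data.csv")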
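
If you also want Spark to infer the column types in data.csv, one possible variation is to take only the column names from headers.csv and rename the inferred columns with toDF. This is a minimal sketch, not a tested drop-in: it assumes Spark 2.x or later with a SparkSession and the built-in csv source, and it assumes headers.csv holds a single header row. The object name CsvWithExternalHeaders and the function readWithHeaders are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper, for illustration only.
object CsvWithExternalHeaders {

  // Takes the column names from headerPath and applies them to the
  // header-less file at dataPath, letting Spark infer the column types.
  def readWithHeaders(spark: SparkSession, headerPath: String, dataPath: String): DataFrame = {
    // Only the column names are needed from the headers file.
    val columnNames = spark.read
      .option("header", "true")
      .csv(headerPath)
      .columns

    // data.csv has no header row, so every line is treated as data.
    val data = spark.read
      .option("header", "false")
      .option("inferSchema", "true")
      .csv(dataPath)

    // Fail early if the two files disagree on the number of columns.
    require(
      data.columns.length == columnNames.length,
      s"expected ${columnNames.length} columns but data has ${data.columns.length}"
    )

    // Replace the default names (_c0, _c1, ...) with the real ones.
    data.toDF(columnNames: _*)
  }
}
```

It could be called as CsvWithExternalHeaders.readWithHeaders(spark, "path to headers.csv", "path to data.csv"). Note that inferSchema makes Spark scan the data an extra time, so the schema-based version above may be preferable for large files when string columns are acceptable.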

I hope this helps.


2 Comments

great to hear that :) you can accept the answer too :)
@Ravikrn providing answers takes valuable time for the respondents; since the proposed solution worked, please show some courtesy and accept the answer.
