2

I want to read the headers (column names) from my CSV file and match them against my expected headers. I am using the code below:

val cc = sparksession.read.csv(filepath).take(1)

It gives me a value like:

Array([id,name,salary]) 

I have also created a static schema, which looks like this:

val ss=Array("id","name","salary")

Then I try to compare the column names using an if condition:

if(cc==ss){
  println("matched")
} else{
  println("not matched")
}

I guess that because of the [] and () mismatch it always goes to the else branch. Is there any other way to compare these values without considering the [] and ()?

2
  • Try with a.deep == b.deep for a deep comparison Commented Apr 22, 2020 at 8:04
  • Do you only want to compare column names or their values as well? Commented Apr 22, 2020 at 10:25

3 Answers

4

First, for convenience, set the header option to true when reading the file:

val df = sparksession.read.option("header", true).csv(filepath)

Get the column names and define the expected column names:

val cc = df.columns
val ss = Array("id", "name", "salary")

To check if the two match (not considering the ordering):

if (cc.toSet == ss.toSet) {
  println("matched")
} else {
  println("not matched")
}

If the order matters, compare the names as sequences instead (== on two Arrays compares references rather than contents, whereas Seq compares element by element):

cc.toSeq == ss.toSeq

or use a deep array comparison:

cc.deep == ss.deep
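
As a rough illustration of the comparisons above, here is a minimal sketch in plain Scala with no Spark dependency. The object HeaderCompare and its helpers sameOrder and sameNames are hypothetical names introduced only for this example, and the literal array stands in for what df.columns would return. It uses sameElements for the ordered check, since to my knowledge .deep is no longer available on arrays from Scala 2.13 onward.

```scala
object HeaderCompare {
  // Order-sensitive check: same names in the same positions.
  def sameOrder(actual: Array[String], expected: Array[String]): Boolean =
    actual.sameElements(expected)

  // Order-insensitive check: same set of names, positions ignored.
  def sameNames(actual: Array[String], expected: Array[String]): Boolean =
    actual.toSet == expected.toSet

  def main(args: Array[String]): Unit = {
    val cc = Array("id", "name", "salary") // stand-in for df.columns (assumption)
    val ss = Array("id", "name", "salary")

    // == on arrays compares references, so two distinct arrays are never equal.
    println(cc == ss)
    // These compare the contents instead.
    println(sameOrder(cc, ss))
    println(sameNames(Array("name", "id", "salary"), ss))
  }
}
```

With the real DataFrame, the same helpers could be called as HeaderCompare.sameOrder(df.columns, ss).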

0

First of all, I think you are trying to compare an Array[org.apache.spark.sql.Row] with an Array[String]. I believe you should change how you load the headers to something like: val cc = spark.read.format("csv").option("header", "true").load(fileName).columns (columns already returns an Array[String], so no further conversion is needed). Then you could compare using cc.deep == ss.deep.

1 Comment

val cc= spark.read.csv("filepath").take(1)(0).toString and then created ss as ("","") and after that applied the if condition. Will try your solution also. Thanks!!
0

The code below worked for me.

val cc= spark.read.csv("filepath").take(1)(0).toString

The above code gave the output as a String: [id,name,salary].

I then created a static schema as:

val ss="[id,name,salary]"

Then I wrote the if/else condition.
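
Comparing the rendered strings directly depends on the exact formatting (spacing, brackets). A possible variation, sketched here under the assumption that the row renders as a bracketed, comma-separated string and that no column name contains a comma, is to parse the string back into names before comparing. The object HeaderString and its parse helper are hypothetical names used only for this illustration.

```scala
object HeaderString {
  // Turn a rendered row such as "[id,name,salary]" into its field names.
  // Assumes no field itself contains a comma.
  def parse(rendered: String): Seq[String] =
    rendered.stripPrefix("[").stripSuffix("]").split(",").map(_.trim).toSeq

  def main(args: Array[String]): Unit = {
    val cc = "[id, name,salary]" // stand-in for take(1)(0).toString (assumption)
    val expected = Seq("id", "name", "salary")

    // Seq equality is element by element, so stray spaces no longer matter after trim.
    if (parse(cc) == expected) println("matched") else println("not matched")
  }
}
```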
