Select a column based on another column's value in Spark Dataframe using Scala

Question

I have a dataframe with 5 columns - sourceId, score_1, score_3, score_4 and score_7. The values of sourceId column can be [1, 3, 4, 7]. I want to convert this into another dataframe that has the columns sourceId and score, where score depends on the value of the sourceId column.

sourceId	score_1	score_3	score_4	score_7
1	0.3	0.7	0.45	0.21
4	0.15	0.66	0.73	0.47
7	0.34	0.41	0.78	0.16
3	0.77	0.1	0.93	0.67

So if sourceId = 1, we select the value of score_1 for that record; if sourceId = 3, we select the value of score_3, and so on.

Result would be

sourceId	score
1	0.3
4	0.73
7	0.16
3	0.1

What would be the best way to do this in Spark?

blackbishop · Accepted Answer · 2022-02-03 08:08:06Z

2

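Chain multiple when expressions over the possible sourceId values:

import org.apache.spark.sql.functions.{col, lit, when}

val ids = Seq(1, 3, 4, 7)

// Fold the ids into one nested when/otherwise expression; rows whose
// sourceId matches none of the ids fall through to null.
val scoreCol = ids.foldLeft(lit(null)) { case (acc, id) =>
  when(col("sourceId") === id, col(s"score_$id")).otherwise(acc)
}

val df2 = df.withColumn("score", scoreCol)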
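To try the when chain end to end, here is a minimal sketch against the sample data from the question. It assumes a local SparkSession; the object name WhenChainExample and the helper withScore are made up for illustration. The null seed is cast to double and the result is trimmed to sourceId and score, which are small additions to the snippet above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

// Hypothetical wrapper object, only here so the sketch compiles on its own.
object WhenChainExample {
  // Builds the nested when/otherwise expression for the given ids
  // and returns only the sourceId and score columns.
  def withScore(df: DataFrame, ids: Seq[Int]): DataFrame = {
    val scoreCol = ids.foldLeft(lit(null).cast("double")) { (acc, id) =>
      when(col("sourceId") === id, col(s"score_$id")).otherwise(acc)
    }
    df.select(col("sourceId"), scoreCol.as("score"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for a demo.
    val spark = SparkSession.builder().master("local[1]").appName("when-chain").getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val df = Seq(
      (1, 0.3, 0.7, 0.45, 0.21),
      (4, 0.15, 0.66, 0.73, 0.47),
      (7, 0.34, 0.41, 0.78, 0.16),
      (3, 0.77, 0.1, 0.93, 0.67)
    ).toDF("sourceId", "score_1", "score_3", "score_4", "score_7")

    withScore(df, Seq(1, 3, 4, 7)).show()
    spark.stop()
  }
}
```

With the sample rows this should print the four (sourceId, score) pairs listed in the question's expected result.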

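Or build a map expression from the score_* columns and use it to look up the score:

import org.apache.spark.sql.functions.{col, lit, map}

// Map each column's suffix ("1", "3", ...) to that column's value.
val scoreMap = map(
  df.columns
    .filter(_.startsWith("score_"))
    .flatMap(c => Seq(lit(c.split("_")(1)), col(c))): _*
)

// Look up the entry whose key matches sourceId.
val df2 = df.withColumn("score", scoreMap(col("sourceId")))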
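A runnable sketch of the map lookup, again using the question's sample data. One deliberate difference from the snippet above: the keys are built as integers (assuming every score_* suffix parses as an Int) so that they have the same type as sourceId and the lookup does not depend on any implicit cast between int and string. The object name ScoreMapExample and the helper withScore are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, map}

// Hypothetical wrapper object, only here so the sketch compiles on its own.
object ScoreMapExample {
  def withScore(df: DataFrame): DataFrame = {
    val scoreCols = df.columns.filter(_.startsWith("score_"))
    // Assumption: the part after "score_" is always an integer id.
    val scoreMap = map(
      scoreCols.flatMap(c => Seq(lit(c.stripPrefix("score_").toInt), col(c))): _*
    )
    // Column.apply on a map column extracts the value stored under the given key.
    df.select(col("sourceId"), scoreMap(col("sourceId")).as("score"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("score-map").getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val df = Seq(
      (1, 0.3, 0.7, 0.45, 0.21),
      (4, 0.15, 0.66, 0.73, 0.47),
      (7, 0.34, 0.41, 0.78, 0.16),
      (3, 0.77, 0.1, 0.93, 0.67)
    ).toDF("sourceId", "score_1", "score_3", "score_4", "score_7")

    withScore(df).show()
    spark.stop()
  }
}
```

Compared with the when chain, this version picks up new score_* columns automatically, at the cost of building a map for every row.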

answered Feb 3, 2022 at 8:08

blackbishop


Matan Rabi · Answer · 2022-11-28 14:58 (edited by myyk, 2022-12-08 11:15:16Z)

0

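Another way is to build the when chain dynamically (Java):

import static org.apache.spark.sql.functions.*;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Column;

List<String> scoresCols = Arrays.asList("score_1", "score_2" /* , ... */);

// sourceId holds only the numeric suffix, so prefix it with "score_"
// before comparing it to a column name.
Column sourceScoreName = concat(lit("score_"), col("sourceId").cast("string"));

// Start the chain with the first column, then add one when(...) branch per remaining column.
Column actualScoreCol = when(sourceScoreName.equalTo(scoresCols.get(0)),
    col(scoresCols.get(0)).cast("string"));

for (int i = 1; i < scoresCols.size(); i++) {
  actualScoreCol = actualScoreCol.when(sourceScoreName.equalTo(scoresCols.get(i)),
      col(scoresCols.get(i)).cast("string"));
}
ds = joinedDataset.withColumn("actual", actualScoreCol);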
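Since the question is about Scala, the same idea of appending when branches one at a time could be sketched as follows. This is only an illustration under a few assumptions: the object name ChainedWhenExample and the helper withScore are made up, the score is left as a double instead of being cast to string, and sourceId is compared after prefixing it with "score_".

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit, when}

// Hypothetical wrapper object, only here so the sketch compiles on its own.
object ChainedWhenExample {
  def withScore(df: DataFrame, scoreCols: Seq[String]): DataFrame = {
    require(scoreCols.nonEmpty, "need at least one score column")
    // sourceId holds only the suffix, so rebuild the full column name before comparing.
    val wantedName = concat(lit("score_"), col("sourceId").cast("string"))
    // First branch comes from functions.when; later branches use Column.when.
    val first = when(wantedName === scoreCols.head, col(scoreCols.head))
    val scoreCol = scoreCols.tail.foldLeft(first) { (acc, name) =>
      acc.when(wantedName === name, col(name))
    }
    // Rows that match no branch get null because there is no otherwise(...).
    df.select(col("sourceId"), scoreCol.as("score"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("chained-when").getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val df = Seq(
      (1, 0.3, 0.7, 0.45, 0.21),
      (4, 0.15, 0.66, 0.73, 0.47),
      (7, 0.34, 0.41, 0.78, 0.16),
      (3, 0.77, 0.1, 0.93, 0.67)
    ).toDF("sourceId", "score_1", "score_3", "score_4", "score_7")

    withScore(df, Seq("score_1", "score_3", "score_4", "score_7")).show()
    spark.stop()
  }
}
```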

edited Dec 8, 2022 at 11:15

myyk


answered Nov 28, 2022 at 14:58

Matan Rabi
