I have a dataframe with 5 columns: sourceId, score_1, score_3, score_4 and score_7. The sourceId column can take the values [1, 3, 4, 7]. I want to convert this into another dataframe with the columns sourceId and score, where score is taken from the score_* column that matches the value of sourceId.

sourceId  score_1  score_3  score_4  score_7
1         0.3      0.7      0.45     0.21
4         0.15     0.66     0.73     0.47
7         0.34     0.41     0.78     0.16
3         0.77     0.1      0.93     0.67

So if sourceId = 1, we select the value of score_1 for that record; if sourceId = 3, we select the value of score_3, and so on.

Result would be

sourceId  score
1         0.3
4         0.73
7         0.16
3         0.1

What would be the best way to do this in Spark?

2 Answers

Chaining multiple when expressions on id column values:

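import org.apache.spark.sql.functions.{col, lit, when}

val ids = Seq(1, 3, 4, 7)

// Wrap one when/otherwise per id; a row whose sourceId matches no id falls through to null.
val scoreCol = ids.foldLeft(lit(null)) { case (acc, id) =>
  when(col("sourceId") === id, col(s"score_$id")).otherwise(acc)
}

// Keep only the two columns asked for in the question.
val df2 = df.withColumn("score", scoreCol).select("sourceId", "score")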
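
To make it clearer what the fold builds, here is a sketch of the same expression written out by hand for the four ids in the question. The name expandedScoreCol is only illustrative. Each fold step wraps the previous result in otherwise, so the last id in the list ends up as the outermost condition and lit(null) is the innermost fallback.

```scala
import org.apache.spark.sql.functions.{col, lit, when}

// Hand-expanded equivalent of the foldLeft over Seq(1, 3, 4, 7).
// expandedScoreCol is a hypothetical name used only for this illustration.
val expandedScoreCol =
  when(col("sourceId") === 7, col("score_7")).otherwise(
    when(col("sourceId") === 4, col("score_4")).otherwise(
      when(col("sourceId") === 3, col("score_3")).otherwise(
        when(col("sourceId") === 1, col("score_1")).otherwise(lit(null)))))
```

Because the conditions are mutually exclusive, the nesting order does not change the result. A sourceId outside the list yields null.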

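Alternatively, build a map expression from the score_* columns and use it to look up the score:

import org.apache.spark.sql.functions.{col, lit, map}

// Keys are the column suffixes as strings ("1", "3", ...); values are the score columns.
val scoreMap = map(
  df.columns
    .filter(_.startsWith("score_"))
    .flatMap(c => Seq(lit(c.split("_")(1)), col(c))): _*
)

// Cast sourceId to string so the lookup key has the same type as the map keys.
val df2 = df.withColumn("score", scoreMap(col("sourceId").cast("string")))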
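
A minimal end-to-end sketch for trying both approaches on the sample rows from the question. The local SparkSession setup and the names byWhen, byMap and result are assumptions made for illustration. The map keys are converted to Int here so that they match the integer sourceId of this sample data, which is also an assumption about the real schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, map, when}

// Local session for experimentation only (assumed setup).
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("score-by-source")
  .getOrCreate()
import spark.implicits._

// Sample rows copied from the question.
val df = Seq(
  (1, 0.3, 0.7, 0.45, 0.21),
  (4, 0.15, 0.66, 0.73, 0.47),
  (7, 0.34, 0.41, 0.78, 0.16),
  (3, 0.77, 0.1, 0.93, 0.67)
).toDF("sourceId", "score_1", "score_3", "score_4", "score_7")

// Approach 1: chained when/otherwise built with a fold.
val byWhen = Seq(1, 3, 4, 7).foldLeft(lit(null)) { (acc, id) =>
  when(col("sourceId") === id, col(s"score_$id")).otherwise(acc)
}

// Approach 2: map from the numeric column suffix to the score column,
// indexed by sourceId.
val byMap = map(
  df.columns
    .filter(_.startsWith("score_"))
    .flatMap(c => Seq(lit(c.stripPrefix("score_").toInt), col(c))): _*
)(col("sourceId"))

val result = df.select(
  col("sourceId"),
  byWhen.as("score_when"),
  byMap.as("score_map")
)
result.show()
```

Both score columns should match the expected result listed in the question, which makes this a quick way to check either approach before running it on real data.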

Another way of doing it is to create a dynamic when condition:

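import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.*;

// Score columns taken from the question.
List<String> scoresCols = Arrays.asList("score_1", "score_3", "score_4", "score_7");

// sourceId holds only the number, so prefix it with "score_" before comparing it to a column name.
Column sourceCol = concat(lit("score_"), col("sourceId").cast("string"));

Column actualScoreCol = when(sourceCol.equalTo(scoresCols.get(0)), col(scoresCols.get(0)));
for (int i = 1; i < scoresCols.size(); i++) {
  actualScoreCol = actualScoreCol.when(sourceCol.equalTo(scoresCols.get(i)), col(scoresCols.get(i)));
}

// Rows whose sourceId matches none of the columns get null.
Dataset<Row> ds = joinedDataset.withColumn("actual", actualScoreCol);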
