I have a DataFrame with five columns: sourceId, score_1, score_3, score_4, and score_7. The sourceId column can take the values [1, 3, 4, 7]. I want to convert this into another DataFrame with the columns sourceId and score, where score is taken from the score_N column whose suffix matches the row's sourceId.

| sourceId | score_1 | score_3 | score_4 | score_7 |
|---|---|---|---|---|
| 1 | 0.3 | 0.7 | 0.45 | 0.21 |
| 4 | 0.15 | 0.66 | 0.73 | 0.47 |
| 7 | 0.34 | 0.41 | 0.78 | 0.16 |
| 3 | 0.77 | 0.1 | 0.93 | 0.67 |

So if sourceId = 1, we select the value of score_1 for that record; if sourceId = 3, we select the value of score_3; and so on.

The result would be:

| sourceId | score |
|---|---|
| 1 | 0.3 |
| 4 | 0.73 |
| 7 | 0.16 |
| 3 | 0.1 |

What would be the best way to do this in Spark?