
I am trying to read column names from one column and pass the columns they name as parameters to a UDF. For example, I have a DataFrame:

 | name   | array_column       | column4 | column5 |
 |--------|--------------------|---------|---------|
 | first  | column4,column5    |   V1    |    V2   |
 | test   | column4,column5    |   V1    |    V2   |
 | choose | column3,column5    |   V1    |    V2   |

df.withColumn("test", udf(array_column(0), array_column(1)))

where array_column(0) and array_column(1), which are column4 and column5 respectively, are the names of two columns in the DataFrame.

I basically want to call udf(column4, column5), but I need to get the column names from array_column and pass those columns as the parameters of my UDF.

I tried setting it, but for some reason the column is not resolved properly. It is passed as a String instead of as the elements of the array.

  • What do you mean by calculate another column? What is your exact requirement? Could you add the code that doesn't work? Thanks. Commented Nov 8, 2019 at 13:10
  • My requirement is basically to calculate some metrics. My DataFrame already has the following columns, for example: col1, col2, col3, col4, metricscol. The metrics column will hold, for example, "col2,col3". I have to use it to identify which columns I need for calculating my metrics: take the array in the metrics column and use it like dataframe.select(metriccol(0)), which should give me the values in dataframe.col2. Commented Nov 8, 2019 at 13:13
  • Dataframe($"MetricsCol"(0)) should behave as Dataframe("column2") in the example from my comment. Commented Nov 8, 2019 at 13:19
  • Sorry, I am trying to understand your issue, but it's not clear to me. Could you edit your question and distinguish between the column names, the values in the columns and the expected result? Thanks. Commented Nov 8, 2019 at 13:25
  • What I understand is: you have a DataFrame that contains 5 columns: col1, col2, col3, col4 and metrics. metrics is an array of size 2 whose value is [col4, col5]. Commented Nov 8, 2019 at 13:28

1 Answer

You can try this code.

Start by creating two case classes to model your DataFrames:

    case class ResultArray(metric1: Double, metric2: Double, metric3: Double, metric4: Double, metricName: String, opportunityMetricsCol: Array[String])

    case class ExpectedResult(value: String)

Then build a sample DataFrame:

    import java.util
    import org.apache.spark.sql.Dataset
    import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

    val resultArray = Seq(ResultArray(0.55, 0.66012, 164.8204, 4.5, "MetricCalc1", Array("metric1", "metric2")),
      ResultArray(0.55, 0.66012, 164.8204, 4.5, "MetricCalc1", Array("metric3", "metric4")))

    val resultArrayDF = resultArray.toDF

which gives the following DataFrame:

+-------+-------+--------+-------+-----------+---------------------+
|metric1|metric2|metric3 |metric4|metricName |opportunityMetricsCol|
+-------+-------+--------+-------+-----------+---------------------+
|0.55   |0.66012|164.8204|4.5    |MetricCalc1|[metric1, metric2]   |
|0.55   |0.66012|164.8204|4.5    |MetricCalc1|[metric3, metric4]   |
+-------+-------+--------+-------+-----------+---------------------+

Then you can extract the expected column as follows:

    // Read the array column (index 5) of each row and keep its first element, which is a column name
    val expectedResult: Dataset[ExpectedResult] = resultArrayDF.map { value =>
      val opportunityMetricsCol: util.List[String] = value.getList(5)
      ExpectedResult(opportunityMetricsCol.get(0))
    }

    // Show the column names extracted from each row
    expectedResult.show(false)

    // Select the column named by the first row (here "metric1")
    resultArrayDF.select(expectedResult.first().value).show(false)

+-------+
|metric1|
+-------+
|0.55   |
|0.55   |
+-------+
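
Note that the select above applies the name taken from the first row to every row. If each row can name different columns, one possible approach is to resolve the name per row with when/coalesce over a known list of candidate columns and pass the resolved columns to the UDF. This is only a sketch: the sample data, the candidates list, the combine UDF and the addTestColumn helper are assumptions made for illustration, and split is used because the question shows array_column as a comma-separated string.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, split, udf, when}

object PerRowColumnLookup {

  // Hypothetical UDF standing in for the real metric calculation.
  val combine = udf((a: String, b: String) => s"$a-$b")

  // Given a column that holds a column *name*, return the value of the
  // named column on the same row. Names outside `candidates` give null.
  def resolve(nameCol: Column, candidates: Seq[String]): Column =
    coalesce(candidates.map(c => when(nameCol === c, col(c))): _*)

  // Splits array_column ("column4,column5") and feeds the two named
  // columns to the UDF, row by row.
  def addTestColumn(df: DataFrame, candidates: Seq[String]): DataFrame = {
    val names = split(col("array_column"), ",")
    df.withColumn(
      "test",
      combine(resolve(names.getItem(0), candidates), resolve(names.getItem(1), candidates))
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("per-row-column-lookup")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data modelled on the table in the question.
    val df = Seq(
      ("first", "column4,column5", "V1", "V2"),
      ("test", "column5,column4", "V1", "V2")
    ).toDF("name", "array_column", "column4", "column5")

    addTestColumn(df, Seq("column4", "column5")).show(false)
    spark.stop()
  }
}
```

With this sample data the test column should contain V1-V2 for the row named first and V2-V1 for the row named test. All candidate columns need a common type for coalesce to accept them, so mixed types would have to be cast first.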

Hope this helps


7 Comments

Hi Driss. I have tried this, but I keep getting issues when I try to do df.select. What is value in the expected result set? Here is the code I am using:
    scala> val expectedResult: Dataset[ExpectedResult] = outputFinalDf.map { value =>
         |   val opportunityMetricsCol: java.util.List[String] = value.getList(5)
         |   ExpectedResult(opportunityMetricsCol.get(0)) }
    expectedResult: org.apache.spark.sql.Dataset[ExpectedResult] = [value: string]
    scala> outputFinalDf.select(expectedResult.first().value).show(false)
The expected value is the content of the metric1 column.
It says it cannot convert a String to a Seq. Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to scala.collection.Seq
Could you update your question by adding the code you used and the stack trace, please?
So maybe we can split this into several steps: assign expectedResult.first().value to a val, and then do the select in a separate statement.