
So my generated dataframe df looks like this:

+---------------------------------------------------------------------+-----------------+
|constraint_message                                                   |constraint_status|
+---------------------------------------------------------------------+-----------------+
|                                                                     |Success          |
|Value: 8.109213053982745E-6 does not meet the constraint requirement!|Failure          |
|                                                                     |Success          |
|                                                                     |Success          |
|Value: 0.98 does not meet the constraint requirement!                |Failure          |
+---------------------------------------------------------------------+-----------------+

I want to add a new column to this dataframe, with the logic defined in this function:

def metric = (status: String, valu: Double) => {
  if (status == "Success") { 1 }
  else { valu }
}
val putMetric = spark.udf.register("Metric", metric)

Now when I call it like this (note: I'll later replace the 0 with a Double variable):

df.withColumn("Metric",putMetric(col("constraint_status"),0)).show()

I get the error:

try.scala:48: error: type mismatch;
 found   : Int(0)
 required: org.apache.spark.sql.Column
    df.withColumn("Metric",putMetric(col("constraint_status"),0))

How can I rectify this? I tried col(0), but that didn't work either.
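From the error it looks like a registered UDF only accepts Column arguments, so presumably the constant needs to be wrapped in lit. A sketch of what I mean, with 0.0 standing in for the eventual Double variable:

import org.apache.spark.sql.functions.{col, lit}

// lit() turns the constant into a Column, so both arguments match what the UDF expects
df.withColumn("Metric", putMetric(col("constraint_status"), lit(0.0))).show()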

2 Answers


Regex adapted from this answer:

val df2 = df.withColumn(
    "newcol",
    when(
        // An empty or null message means the constraint passed, so the metric is 1
        col("constraint_message").isNull || length(col("constraint_message")) === 0,
        lit(1)
    )
    .otherwise(
        // Otherwise pull the numeric value (including scientific notation) out of the message
        regexp_extract(
            col("constraint_message"),
            raw"(\d+(\.\d+)?(E[+-]\d+)?)",
            1
        )
    )
    .cast("double")
)

df2.show(false)
+---------------------------------------------------------------------+-----------------+--------------------+
|constraint_message                                                   |constraint_status|newcol              |
+---------------------------------------------------------------------+-----------------+--------------------+
|null                                                                 |Success          |1.0                 |
|Value: 8.109213053982745E-6 does not meet the constraint requirement!|Failure          |8.109213053982745E-6|
|null                                                                 |Success          |1.0                 |
|null                                                                 |Success          |1.0                 |
|Value: 0.98 does not meet the constraint requirement!                |Failure          |0.98                |
+---------------------------------------------------------------------+-----------------+--------------------+
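One thing worth noting about this approach: if a message ever fails to match the pattern, regexp_extract returns an empty string and the cast("double") turns it into null, so a quick check for such rows (assuming the df2 above) could be:

import org.apache.spark.sql.functions.col

// Any Failure row whose number could not be extracted ends up null after the cast
df2.filter(col("newcol").isNull).show(false)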

You can use regexp_extract:

val df1 = df.withColumn(
  "passed",
  when(
    col("constraint_status") === "Failure",
    // Extract the number (including scientific notation) from the failure message
    regexp_extract(col("constraint_message"), "Value: (\\d*\\.?\\d+([eE][+-]?[0-9]+)?).*", 1)
  ).otherwise(1).cast("double")
)

df1.show
//+--------------------+-----------------+--------------------+
//|  constraint_message|constraint_status|              passed|
//+--------------------+-----------------+--------------------+
//|                    |          Success|                   1|
//|Value: 8.10921305...|          Failure|8.109213053982745E-6|
//|                    |          Success|                   1|
//|                    |          Success|                   1|
//|Value: 0.98 does ...|          Failure|                0.98|
//+--------------------+-----------------+--------------------+
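If the goal really is just 1 for Success and a fixed Double variable for everything else (as in the original UDF), the same thing can be done without a UDF or regex at all. A hedged sketch, where fallback is a made-up name for the Double variable mentioned in the question:

import org.apache.spark.sql.functions.{col, lit, when}

val fallback: Double = 0.0  // hypothetical stand-in for the question's Double variable

val df3 = df.withColumn(
  "Metric",
  when(col("constraint_status") === "Success", lit(1.0)).otherwise(lit(fallback))
)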
