
I have the following dataframe:

+-------------------+------------------------+
|value              |feeling                 |
+-------------------+------------------------+
|Sam got these marks|[sad, sad, disappointed]|
|I got good marks   |[happy, excited, happy] |
+-------------------+------------------------+

I want to iterate through this dataframe, get the array in the feeling column for each row, and pass that array into a calculation method.

def calculationMethod(arrayValue: Array[String]) = {
  // get average of words
}

Expected output dataframe:

    +-------------------+------------------------+-------+
    |value              |feeling                 |average|
    +-------------------+------------------------+-------+
    |Sam got these marks|[sad, sad, disappointed]|sad    |
    |I got good marks   |[happy, excited, happy] |happy  |
    +-------------------+------------------------+-------+

I am not sure how to iterate through each row and get the array in the second column so it can be passed into the method I have written. Please also note that the dataframe shown in the question is a streaming dataframe.
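Judging by the expected output above, the "average" is the most frequent word in the array. As a rough plain-Scala sketch of that calculation (`mostFrequentWord` is just a name I made up):

```scala
// Rough sketch: pick the most frequent word in the array
// (the "average" column in the expected output). Ties are broken arbitrarily.
def mostFrequentWord(words: Seq[String]): String =
  words.groupBy(identity).maxBy(_._2.size)._1

println(mostFrequentWord(Seq("sad", "sad", "disappointed")))
```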

EDIT 1

val calculateUDF = udf(calculationMethod _)
val editedDataFrame = filteredDataFrame.withColumn("average", calculateUDF(col("feeling")))

def calculationMethod(emojiArray: Seq[String]): DataFrame = {
  val existingSparkSession = SparkSession.builder().getOrCreate()
  import existingSparkSession.implicits._
  val df = emojiArray.toDF("feeling")
  val result = df.selectExpr(
    "feeling",
    "'U+' || trim('0', string(hex(encode(feeling, 'utf-32')))) as unicode"
  )
  result
}

I'm getting the following error:

Schema for type org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] is not supported

Please note that the initial dataframe mentioned in the question is a streaming dataframe.

EDIT 2

This is the final dataframe that I am expecting:

    +-------------------+--------------+-------------------------+
    |value              |feeling       |unicode                  |
    +-------------------+--------------+-------------------------+
    |Sam got these marks|[😀😆😁]      |[U+1F600 U+1F606 U+1F601]|
    |I got good marks   |[😄🙃]        |[U+1F604 U+1F643]        |
    +-------------------+--------------+-------------------------+

1 Answer

You can transform the arrays instead of using a UDF:

val df2 = df.withColumn(
    "unicode", 
    expr("transform(feeling, x -> 'U+' || ltrim('0' , string(hex(encode(x, 'utf-32')))))")
)

df2.show(false)
+-------------------+------------+---------------------------+
|value              |feeling     |unicode                    |
+-------------------+------------+---------------------------+
|Sam got these marks|[😀, 😆, 😁]|[U+1F600, U+1F606, U+1F601]|
|I got good marks   |[😄, 🙃]    |[U+1F604, U+1F643]         |
+-------------------+------------+---------------------------+
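
For reference, the per-element conversion that `transform` applies can be sketched in plain Scala. This is only a sketch, assuming Spark's `encode(x, 'utf-32')` produces big-endian UTF-32 bytes (which matches the JVM's `UTF-32` charset); `toUnicodeNotation` is a hypothetical helper name:

```scala
// Sketch of the per-element logic inside transform:
// hex-encode the UTF-32 bytes of the emoji, strip leading zeros, prefix "U+".
def toUnicodeNotation(emoji: String): String = {
  val hex = emoji.getBytes("UTF-32").map(b => f"$b%02X").mkString
  "U+" + hex.dropWhile(_ == '0')
}

println(toUnicodeNotation("\uD83D\uDE00")) // 😀
```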

8 Comments

I'm getting the error "Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] is not supported"
@tharindu it's not possible to call methods on a Spark dataframe inside a UDF. You can consider using transform on the array. If you could provide an example dataframe I can give it a try.
I have provided the expected dataframe under edit 2.
I am not getting any output. No error is thrown, but nothing is printed to the console.
Sorry, that was caused by a network issue. It's working properly. Thanks a lot for the help!
