1

I need to create a new column called hash_id from uid column of my dataframe, Below is my code:

//1.Define a hashing function
def calculate_hashid (uid: String) : BigInteger ={

      val md = java.security.MessageDigest.getInstance("SHA-1")
      val ha = new BigInteger( DatatypeConverter.printHexBinary(md.digest(uid.getBytes)), 16).mod(BigInteger.valueOf(10000))
      return ha

    }

//2.Convert function to UDF
val  calculate_hashidUDF = udf(calculate_hashid)

//3.Apply udf on spark dataframe
val userAgg_Data_hashid = userAgg_Data.withColumn("hash_id", calculate_hashidUDF($"uid"))

I am getting error at udf(calculate_hashid) saying

missing arguments for the method calculate_hashid(string)

I have gone through many examples online but could not resolve it, what am I missing here.

1 Answer 1

2

You can register your udf as

val  calculate_hashidUDF = udf[String, BigInteger](calculate_hashidUDF)

You can also rewrite your udf as

def calculate_hashidUDF = udf(((uid: String) => {
  val md = java.security.MessageDigest.getInstance("SHA-1")
  new BigInteger( DatatypeConverter.printHexBinary(md.digest(uid.getBytes)), 16).mod(BigInteger.valueOf(10000))
}): String => BigInteger)

Or even without return type

def calculate_hashidUDF = udf((uid: String) => {
  val md = java.security.MessageDigest.getInstance("SHA-1")
  new BigInteger( DatatypeConverter.printHexBinary(md.digest(uid.getBytes)), 16).mod(BigInteger.valueOf(10000))
})
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.