I am trying to generate an MD5 hash for each row of a DataFrame using hashlib.md5 in PySpark. hashlib.md5 only accepts bytes (an encoded string) as input.
So I need to convert each row of the DataFrame to a string.
I tried the concat_ws function to concatenate all the columns into a single string, but it did not work.
My DataFrame has the columns id, name, and marks.
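The per-row step can be sketched in plain Python, outside Spark. This assumes a row is a tuple of the id, name, and marks values, uses "|" as an arbitrary separator, and maps None to an empty string. row_md5 is a hypothetical helper, not part of any library:

```python
import hashlib

def row_md5(values, sep="|"):
    # Join the column values into one string (None -> "" is an assumption)
    text = sep.join("" if v is None else str(v) for v in values)
    # hashlib.md5 works on bytes, so the string must be encoded first
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Hypothetical row with the columns id, name, marks
print(row_md5((1, "alice", 85)))
```

A separator is worth keeping. Without one, rows such as (1, "12") and (11, "2") would produce the same string and therefore the same hash.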
I tried:
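import hashlib
from pyspark.sql.functions import concat_ws
str=df.select(concat_ws("id","name","marks"))  # "id" is taken as the separator; the result is a DataFrame, not a string
print(hashlib.md5(str.encode(encoding='utf_8', errors='strict')).hexdigest())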
I got this error:
AttributeError: 'DataFrame' object has no attribute 'encode'
Is there a standard Spark md5 function that could be used instead?