I want to normalize the names of authors by removing accents.
Input: orčpžsíáýd
Output: orcpzsiayd
The code below lets me achieve this. However, I am not sure how I can do this using Spark functions when my input is a DataFrame column.
import org.apache.commons.lang.StringUtils
import org.apache.spark.sql.Column

def stringNormalizer(c: Column) = {
  StringUtils.stripAccents(c.toString)
}
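For context, the string-level behaviour I am after can be sketched with the JDK alone. This is only an approximation of `StringUtils.stripAccents`, and `stripAccentsSketch` is a hypothetical name of my own. It also shows why the attempt above does not work on a DataFrame: the function needs an actual `String`, whereas `c.toString` on a `Column` returns the textual form of the column expression, not the values in each row.

```scala
import java.text.Normalizer

object StripAccentsSketch {
  // Hypothetical helper (not a library function): decompose each accented
  // character into base letter + combining mark (NFD), then drop the marks.
  def stripAccentsSketch(s: String): String =
    Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "")

  def main(args: Array[String]): Unit =
    println(stripAccentsSketch("orčpžsíáýd")) // prints: orcpzsiayd
}
```

Characters that have no canonical decomposition (for example `ł` or `ø`) are left unchanged by this approach.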
This is how I would like to call stringNormalizer:
val normalizedAuthor = flat_author.withColumn("NormalizedAuthor",
stringNormalizer(df_article("authors")))
I have just started learning Spark, so please let me know if there is a better way to achieve this without UDFs.
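To make the question concrete, here is a minimal sketch of two directions that might work, under stated assumptions: a local `SparkSession`, a toy DataFrame `authorsDf` with a string column `authors` (both hypothetical stand-ins for my real data), and a helper `stripAccentsOrNull` that I made up for illustration. The first variant wraps the string function in a UDF. The second avoids a UDF by using the built-in `translate` function, which only works if the set of accented characters is known in advance.

```scala
import java.text.Normalizer
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, translate, udf}

object NormalizeAuthorsSketch {
  // Hypothetical helper: null-safe accent stripping on a plain String.
  def stripAccentsOrNull(s: String): String =
    if (s == null) null
    else Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("normalize-authors-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for the real authors DataFrame.
    val authorsDf = Seq("orčpžsíáýd").toDF("authors")

    // Variant 1: wrap the String => String function in a UDF so that it
    // is applied to the value in each row rather than to the Column object.
    val stripAccentsUdf = udf(stripAccentsOrNull _)
    val viaUdf = authorsDf.withColumn("NormalizedAuthor", stripAccentsUdf(col("authors")))

    // Variant 2: no UDF. translate maps each character of the second
    // argument to the character at the same position in the third.
    // The character list here covers only the sample input.
    val viaTranslate = authorsDf.withColumn(
      "NormalizedAuthor",
      translate(col("authors"), "čžíáý", "cziay")
    )

    viaUdf.show(false)
    viaTranslate.show(false)

    spark.stop()
  }
}
```

The `translate` variant stays inside Spark's built-in expressions, but the character mapping has to be maintained by hand, while the UDF variant handles any character that has a canonical decomposition.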