
I have imported a CSV file into a DataFrame in Azure Databricks using Scala.

--------------
A  B  C  D  E
--------------
a1 b1 c1 d1 e1
a2 b2 c2 d2 e2
--------------

Now I want to compute a hash of some selected columns and add each result as a new column of that DataFrame.

--------------------------------
A  B  B2       C  D  D2       E
--------------------------------
a1 b1 hash(b1) c1 d1 hash(d1) e1
a2 b2 hash(b2) c2 d2 hash(d2) e2
--------------------------------

This is the code I have so far:

val data_df = spark.read.format("csv").option("header", "true").option("sep", ",").load(input_file)
...
...
for (col <- columns) {
    if (columnMapping.keys.contains(col)) {
        val newColName = col + "_token"
        // Here I want to add a new column to data_df whose content is a hash of the current column's value
    }
}
// Then I would like to upload selected columns (B, B2, D, D2) to a SQL database

Any help will be highly appreciated. Thank you!

Comments:

  • Try df.withColumn("newColName", hash("col")). Commented Aug 1, 2019 at 21:56
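Expanding on that comment: Spark ships built-in hash functions, so a UDF may not be needed at all. Note that `org.apache.spark.sql.functions.hash` returns a 32-bit Murmur3 integer, while `sha2` returns a hex digest string, which is usually what is wanted for tokenization. A minimal sketch (assuming a running SparkSession; the sample rows mirror the table in the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sha2}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Sample data matching the question's table
val data_df = Seq(("a1", "b1", "c1", "d1", "e1"),
                  ("a2", "b2", "c2", "d2", "e2"))
  .toDF("A", "B", "C", "D", "E")

// sha2(column, 256) yields the SHA-256 hex digest of each cell, no UDF required
val withHashes = data_df
  .withColumn("B2", sha2(col("B"), 256))
  .withColumn("D2", sha2(col("D"), 256))

// Only the columns destined for the SQL database
withHashes.select("B", "B2", "D", "D2").show(false)
```

Built-in functions like `sha2` are also preferable to a UDF for performance, since they stay inside Catalyst and avoid serializing each value out to the JVM closure.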

1 Answer


Try this:

// Assumes: import org.apache.spark.sql.functions.{udf, col}
val colsToApplyHash = Array("B", "D")

val hashFunction: String => String = <ACTUAL HASH LOGIC>
val hash = udf(hashFunction)   // note: this local val shadows Spark's built-in functions.hash

// foldLeft threads the DataFrame through withColumn once per column,
// adding "B2" = hash(B) and "D2" = hash(D)
val finalDf = colsToApplyHash.foldLeft(data_df) {
  case (acc, colName) => acc.withColumn(colName + "2", hash(col(colName)))
}
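For the `<ACTUAL HASH LOGIC>` placeholder, one possible choice (an assumption on my part, not something the answer specifies) is SHA-256 via `java.security.MessageDigest`. This is a plain Scala function, so it can be tested on its own and then wrapped with `udf` exactly as in the answer:

```scala
import java.security.MessageDigest

// Hypothetical implementation of the <ACTUAL HASH LOGIC> placeholder:
// SHA-256 of the cell value, rendered as lowercase hex.
// Null-safe so the UDF does not throw on null cells.
val sha256Hex: String => String = (s: String) =>
  if (s == null) null
  else MessageDigest.getInstance("SHA-256")
    .digest(s.getBytes("UTF-8"))
    .map("%02x".format(_))   // each byte -> two lowercase hex digits
    .mkString

// Plug into the answer's code unchanged:
// val hash = udf(sha256Hex)
```

`MessageDigest.getInstance` is called inside the function body, which matters here: `MessageDigest` instances are stateful and not thread-safe, so sharing one across executor threads would corrupt digests.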

