I am trying to concatenate multiple columns in a DataFrame. The list of columns is held in a variable. I am trying to pass that variable to the concat function, but I am not able to do that.

Ex: base_tbl_columns contains the list of columns, and I am using the code below to select all the columns listed in the variable.

    scala> val base_tbl_columns  = scd_table_keys_df.first().getString(5).split(",")
    base_tbl_columns: Array[String] = Array(acct_nbr, account_sk_id, zip_code, primary_state, eff_start_date, eff_end_date, load_tm, hash_key, eff_flag)

    val hist_sk_df_ld = hist_sk_df.select(base_tbl_columns.head, base_tbl_columns.tail: _*)

Similarly, I have one more list which I want to use for concatenation. But the concat function does not accept the .head and .tail arguments.

    scala> val hash_key_cols = scd_table_keys_df.first().getString(4)
    hash_key_cols: String = primary_state,zip_code

Here I am hard-coding the columns primary_state and zip_code:

    .withColumn("hash_key_col", concat($"primary_state", $"zip_code"))

Here I am passing the variable hash_key_cols instead:

    .withColumn("hash_key_col", concat(hash_key_cols))

I was able to do this in Python using the code below.

    hist_sk_df = (
        hist_tbl_df.join(broadcast(hist_tbl_lkp_df), primary_key_col, 'inner')
        .withColumn("eff_start_date", lit(load_dt))
        .withColumn('hash_key_col', F.concat(*hash_key_cols))
        .withColumn("hash_key", hash_udf('hash_key_col'))
        .withColumn("eff_end_date", lit(eff_close_dt))
        .withColumn("load_tm", lit(load_tm))
        .withColumn("eff_flag", lit(eff_flag_curr))
    )

1 Answer

Either:

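    import org.apache.spark.sql.functions.{col, concat}

    val base_tbl_columns: Array[String] = ???

    // Map each column name to a Column, then expand the array into varargs.
    df.select(concat(base_tbl_columns.map(c => col(c)): _*))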

or:

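    import org.apache.spark.sql.functions.expr

    // Build the SQL expression text, e.g. concat(primary_state,zip_code).
    df.select(expr(s"""concat(${base_tbl_columns.mkString(",")})"""))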
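
Both variants rely on concat taking a variable number of Column arguments, which is why a single comma-separated String is rejected while a collection expanded with `: _*` is accepted. The following is only a rough sketch of that mechanism and runs without Spark: `Column`, `col`, and `concat` below are hypothetical stand-ins that imitate the shape of the Spark functions, and `concatFromCsv` is a helper name invented for illustration.

```scala
// Stand-ins so the sketch runs without Spark on the classpath.
// These are NOT the Spark implementations; they only mimic the signatures.
object VarargsSketch {
  final case class Column(name: String)

  def col(name: String): Column = Column(name)

  // Varargs parameter: accepts zero or more Column arguments,
  // like the shape of Spark's concat(exprs: Column*).
  def concat(cols: Column*): String =
    cols.map(_.name).mkString("concat(", ", ", ")")

  // Split the comma-separated key list, turn each name into a Column,
  // and expand the resulting sequence into individual arguments with `: _*`.
  def concatFromCsv(csv: String): String = {
    val names: Seq[String] = csv.split(",").map(_.trim).toSeq
    concat(names.map(col): _*)
  }

  def main(args: Array[String]): Unit = {
    println(concatFromCsv("primary_state,zip_code"))
  }
}
```

With the real Spark functions the same two steps apply: split the string from getString(4) into an array first, then map the names to columns and expand them.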

3 Comments

It worked for me. I used the first option.

    scala> val hash_key_cols = scd_table_keys_df.first().getString(4).split(",")
    hash_key_cols: Array[String] = Array(primary_state, zip_code)

    val hist_sk_df = hist_tbl_df.join(broadcast(hist_tbl_lkp_df), Seq(primary_key_col), "inner")
      .withColumn("eff_start_date", lit(load_dt))
      .withColumn("hash_key_col", concat(hash_key_cols.map(c => col(c)): _*))

I want to use similar variables in a join as well.

    scala> primary_key_col
    res59: String = acct_nbr

    scala> delta_primary_key_col
    res60: String = delta_acct_n

I want to use these two variables as my join key. How can it be done? This is what I tried:

    val cdc_new_acct_df = delta_src_rename_df.join(hist_tgt_tbl_Y_df ,(delta_src_rename_df($delta_primary_key_col) == hist_tgt_tbl_Y_df($primary_key_col),"left_outer" ))

Does anyone have this in Python?
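
For the join question raised in the comments, a minimal sketch, assuming Spark SQL is on the classpath and a local session is acceptable: a column can be looked up by a name held in a variable with `df(name)`, and column equality is written `===` rather than `==`. The object name `JoinByVariableSketch`, the helper `joinOnKeys`, and the sample rows are hypothetical; only the column names come from the comment.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinByVariableSketch {
  // Left-outer join of two DataFrames whose key column names are held in variables.
  // df(name) resolves a Column by name; === builds the equality condition.
  def joinOnKeys(left: DataFrame, right: DataFrame,
                 leftKey: String, rightKey: String): DataFrame =
    left.join(right, left(leftKey) === right(rightKey), "left_outer")

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, used only for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-by-variable-sketch")
      .getOrCreate()
    import spark.implicits._

    val delta_primary_key_col = "delta_acct_n"
    val primary_key_col = "acct_nbr"

    // Made-up sample rows standing in for the real tables.
    val delta_src_rename_df = Seq("A1", "A2").toDF(delta_primary_key_col)
    val hist_tgt_tbl_Y_df = Seq("A1").toDF(primary_key_col)

    joinOnKeys(delta_src_rename_df, hist_tgt_tbl_Y_df,
      delta_primary_key_col, primary_key_col).show()

    spark.stop()
  }
}
```

Because the two key columns have different names, the result keeps both, and rows from the left side without a match carry a null in the right-hand key.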
