
I need to add 5 columns to an existing DataFrame, one for each of 5 months of the year.

The existing DataFrame looks like this:

EId EName Esal
1   abhi  1100
2   raj   300
3   nanu  400
4   ram   500

The output should be as follows:

EId EName Esal Jan  Feb  March April May  
1   abhi  1100 1100 1100 1100  1100  1100 
2   raj   300  300  300  300   300   300  
3   nanu  400  400  400  400   400   400
4   ram   500  500  500  500   500   500

I can do this one by one with withColumn but that takes a lot of time.

Is there a way I can run some loop and keep on adding columns till my conditions are exhausted.

Many thanks in advance.

  • How is "I can do this one by one with withColumn but that takes a lot of time." different from "Is there a way I can run some loop and keep on adding columns till my conditions are exhausted."? I don't see any difference. Commented Jan 6, 2018 at 17:54

3 Answers

7

You can use foldLeft. You'll need to create a List of the columns that you want.

df.show
+---+----+----+
| id|name| sal|
+---+----+----+
|  1|   A|1100|
+---+----+----+

val list = List("Jan", "Feb" , "Mar", "Apr") // ... you get the idea

list.foldLeft(df)((df, month) => df.withColumn(month , $"sal" ) ).show
+---+----+----+----+----+----+----+
| id|name| sal| Jan| Feb| Mar| Apr|
+---+----+----+----+----+----+----+
|  1|   A|1100|1100|1100|1100|1100|
+---+----+----+----+----+----+----+

In other words, foldLeft starts with the original DataFrame and walks through the list, applying withColumn once per element and passing the resulting DataFrame on to the next step.
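
To make the accumulator idea concrete without needing a Spark session, here is a minimal sketch in plain Scala. `Frame` and its `withColumn` are hypothetical stand-ins that only track column names; they are not Spark classes. The sketch shows that folding over the month list gives the same result as chaining the calls by hand.

```scala
// Hypothetical stand-in for a DataFrame: it only tracks column names.
final case class Frame(columns: Vector[String]) {
  // Returns a new Frame with `name` appended, leaving the original untouched.
  def withColumn(name: String): Frame = Frame(columns :+ name)
}

object FoldLeftSketch {
  val start: Frame = Frame(Vector("id", "name", "sal"))
  val months: List[String] = List("Jan", "Feb", "Mar", "Apr")

  // foldLeft begins with `start` and feeds each intermediate Frame into the next step.
  val folded: Frame =
    months.foldLeft(start)((acc, month) => acc.withColumn(month))

  // The same sequence of calls written out manually.
  val chained: Frame =
    start.withColumn("Jan").withColumn("Feb").withColumn("Mar").withColumn("Apr")

  def main(args: Array[String]): Unit = {
    println(folded.columns.mkString(", "))
    println(folded == chained)
  }
}
```

Because the list drives the loop, adding 20 or 30 columns only means making the list longer; the fold itself does not change.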


3 Comments

How is that solution different from "I can do this one by one with withColumn but that takes a lot of time."?
It's not. It's just a little cleaner, I guess, than back-to-back withColumn calls.
@JacekLaskowski The difference is that I want to loop and keep adding columns. Say, for example, I have to add 20 or 30 columns based on a condition: is it possible to do that in a loop, so that I don't have to add each one manually and the additions are iterated instead?
4

Yes, you can do this using foldLeft. foldLeft traverses the elements of a collection from left to right, starting from an initial value and passing the accumulated result to each step.

So you can store the desired columns in a List. For example:

val BazarDF = Seq(
        ("Veg", "tomato", 1.99),
        ("Veg", "potato", 0.45),
        ("Fruit", "apple", 0.99),
        ("Fruit", "pineapple", 2.59)
         ).toDF("Type", "Item", "Price")

Create a List of column names and values (a null value is used here as an example):

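import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

// Each entry pairs a new column name with its value: a null cast to StringType
val ColNameWithDatatype = List(
  ("Jan", lit(null).cast(StringType)),
  ("Feb", lit(null).cast(StringType))
)
// Fold over the list, adding one column per entry
val BazarWithColumnDF1 = ColNameWithDatatype.foldLeft(BazarDF) { (tempDF, colName) =>
  tempDF.withColumn(colName._1, colName._2)
}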

You can see the example Here


2

Keep in mind that the withColumn method of DataFrame can cause performance issues when called in a loop:

this method introduces a projection internally. Therefore, calling it multiple times, for instance, via loops in order to add multiple columns can generate big plans which can cause performance issues and even StackOverflowException. To avoid this, use select with the multiple columns at once.

The safer way is to do it with select:

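import org.apache.spark.sql.functions.col
// Assumed month list, taken from the question; adjust as needed
val months = List("Jan", "Feb", "March", "April", "May")
val monthsColumns = months.map { month: String =>
  col("sal").as(month) // alias the existing "sal" column once per month
}
// Keep every existing column and append the new ones in a single projection
val updatedDf = df.select(df.columns.map(col) ++ monthsColumns: _*)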
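
For completeness, here is a self-contained sketch of the select approach applied to the data from the question. The helper name `addCopies`, the object name, and the local-mode session settings are assumptions made for illustration; only `select`, `col`, `as`, and `toDF` are Spark API calls.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AddMonthColumns {
  // Hypothetical helper: appends one copy of `source` per name in `newNames`
  // using a single select, so only one projection is added to the plan.
  def addCopies(df: DataFrame, source: String, newNames: Seq[String]): DataFrame = {
    val existing = df.columns.toSeq.map(col)
    val added = newNames.map(name => col(source).as(name))
    df.select(existing ++ added: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("add-month-columns")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val df = Seq((1, "abhi", 1100), (2, "raj", 300), (3, "nanu", 400), (4, "ram", 500))
      .toDF("EId", "EName", "Esal")

    val months = Seq("Jan", "Feb", "March", "April", "May")
    val result = addCopies(df, "Esal", months)
    result.show()

    spark.stop()
  }
}
```

If the set of columns depends on a condition, the condition can be applied while building `months` (for example with `filter`), and the select itself stays the same.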
