0

I have an empty dataframe with its schema already created. In a for loop, I'm trying to populate its existing columns with columns taken from a new dataframe.

Schema of k: |ID|DATE|REPORTID|SUBMITTEDDATE|

for(data <- 0 to range-1){
  val c = df2.select(substring(col("value"), str(data)._2, str(data)._3).alias(str(data)._1)).toDF()
  //c.show()
  k = c.withColumn(str(data)._1, c(str(data)._1))
}
k.show()

But the k dataframe has just one column, when it should have all 4 columns populated with values. I think the last line in the for loop is replacing the existing columns in the dataframe.

Can somebody help me with this?

Thanks!!

  • Why are you adding columns to an empty dataframe? Why can't you replace the entire dataframe with the current one? Commented Oct 3, 2017 at 15:11
  • The actual df2 dataframe has a single column. I have to select some substrings from df2 and then add them to the k dataframe based on the schema. So I created a val, added the column to it, and then replaced the existing column in k. Commented Oct 3, 2017 at 15:25
  • Can you add an example? We could not understand what you are trying to do. Commented Oct 3, 2017 at 17:25
  • @VarunChelakara As per the example the df k will have only 1 column because you are selecting only one column into df c [i.e., the df2.select() clause has only one substring column selected] which then you are assigning to k. Also the line "c.withColumn(str(data)._1, c(str(data)._1))" is confusing/redundant. Can you elaborate with an example? Commented Oct 3, 2017 at 17:32
  • @Bhuvan df2 is a dataframe that has the data in a single column. I'm trying to divide that into multiple columns and store the result in a dataframe. The schema for the new columns is provided in tuples, so I read the tuples and created an empty dataframe k. Now I'm iterating over df2 to read the columns based on the substring positions and store them in k, since it already has the schema. But .withColumn is supposed to add new columns. So is there any other way of reading the columns and adding them to k, other than using a join? Commented Oct 3, 2017 at 20:02

2 Answers

3

Put your logic and conditions into a select and create a new dataframe from the result:

val dataframe2 = dataframe1.select("A", "B", "C")
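
Applied to the question's case, that idea could look like the sketch below. In the question's loop, k is reassigned on every iteration to a dataframe built from a one-column select, so only the last column survives; building all substring expressions first and passing them to a single select avoids that. This is a minimal sketch, not the asker's actual code: the object name, the `splitFixedWidth` helper, the `layout` positions, and the sample row are all hypothetical stand-ins (`layout` plays the role of the question's `str` tuples).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, substring}

object FixedWidthSketch {
  // Hypothetical layout: (column name, 1-based start position, length).
  val layout: Seq[(String, Int, Int)] = Seq(
    ("ID", 1, 3),
    ("DATE", 4, 8),
    ("REPORTID", 12, 4),
    ("SUBMITTEDDATE", 16, 8)
  )

  // Build one substring expression per tuple, then select them all at once.
  // Assumes the input dataframe has a single string column named "value".
  def splitFixedWidth(df: DataFrame, layout: Seq[(String, Int, Int)]): DataFrame = {
    val columns = layout.map { case (name, start, length) =>
      substring(col("value"), start, length).alias(name)
    }
    df.select(columns: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fixed-width-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row: 3 + 8 + 4 + 8 characters.
    val df2 = Seq("00120171003R00120171002").toDF("value")

    val k = splitFixedWidth(df2, layout)
    k.show()

    spark.stop()
  }
}
```

With this shape there is no need to create an empty k up front; the column names come from the aliases in the select.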

0

Copying a few columns from one dataframe into another is not directly possible in Spark. There are a few alternatives that achieve the same result:

1. Join both dataframes on some join condition.
2. Convert both dataframes to JSON and union them:

  val rdd = df1.toJSON.union(df2.toJSON) 
  val dfFinal = spark.read.json(rdd)
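
The first alternative (a join) can be sketched roughly as follows. It assumes both dataframes share a key column to join on; the object name, the `joinOnId` helper, and the sample rows are hypothetical and only illustrate the shape of the call.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinSketch {
  // Inner join on the shared "ID" column; the key appears once in the result.
  def joinOnId(left: DataFrame, right: DataFrame): DataFrame =
    left.join(right, Seq("ID"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up data: each dataframe carries one of the wanted columns plus the key.
    val dates = Seq((1, "2017-10-03"), (2, "2017-10-04")).toDF("ID", "DATE")
    val reports = Seq((1, "R-100"), (2, "R-200")).toDF("ID", "REPORTID")

    val combined = joinOnId(dates, reports)
    combined.show()

    spark.stop()
  }
}
```

A join only helps when such a key exists; a single-column dataframe like the one in the question has no natural key to join on.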

2 Comments

I have a dataframe k with 2 columns, and when I run the code below, it replaces all the existing columns in the dataframe: var c = df2.select(substring(col("value"), str(data)._2, str(data)._3).alias(str(data)._1)).toDF(); k = c.withColumn(str(data)._1, c(str(data)._1))
You can't iterate over data and add it to a dataframe. It is not possible in Spark.
