Spark scala copying dataframe column to new dataframe

Question

I have an empty dataframe whose schema is already defined. In a for loop, I'm trying to fill its existing columns with columns taken from another dataframe.

for(data <- 0 to range-1){
  val c = df2.select(substring(col("value"), str(data)._2, str(data)._3).alias(str(data)._1)).toDF()
  //c.show()
  k = c.withColumn(str(data)._1, c(str(data)._1))
}
k.show()

But the k dataframe has just one column, when it should have all 4 columns populated with values. I think the last line in the for loop is replacing the existing columns in the dataframe.

Can somebody help me with this?

Thanks!!

Why are you adding columns to an empty dataframe? Why can't you replace the entire dataframe with the current one?
– Avishek Bhattacharya, Commented Oct 3, 2017 at 15:11
The actual df2 dataframe has a single column. I have to select some substrings from the df2 dataframe and then add them to the k dataframe based on the schema. So I created a val, added the column to it, and then replaced the existing column in k.
– Varun Chelakara, Commented Oct 3, 2017 at 15:25
Can you add an example? We could not understand what you are trying to do.
– mrsrinivas, Commented Oct 3, 2017 at 17:25
@VarunChelakara As per the example, the df k will have only 1 column because you are selecting only one column into df c (the df2.select() clause has only one substring column selected), which you then assign to k. Also, the line "c.withColumn(str(data)._1, c(str(data)._1))" is confusing/redundant. Can you elaborate with an example?
– Bhuvan, Commented Oct 3, 2017 at 17:32
@Bhuvan df2 is a dataframe that has the data in a single column. I'm trying to divide that into multiple columns and store the result in a dataframe. The schema for the new columns is provided in tuples, so I read the tuples and created an empty dataframe k. Now I'm iterating over df2 to read the columns based on the substring positions and store them in k, since it already has the schema. But .withColumn is supposed to add new columns. So is there any other way of reading the columns and adding them to k, other than using a join?
– Varun Chelakara, Commented Oct 3, 2017 at 20:02

vaquar khan · Accepted Answer · 2017-10-04 20:29:21Z

3

Apply your logic and conditions and create a new dataframe:

val dataframe2 = dataframe1.select("A", "B", "C")
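
Applied to the question's case, the same idea can be sketched as follows: build one substring column per tuple and pass them all to a single select, instead of reassigning k on every loop iteration. This is only a sketch under assumptions: the sample rows, the field names, and the (name, start, length) layout of str are made up for illustration, and it is written script-style for spark-shell (where spark already exists and the SparkSession line can be dropped).

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, substring}

// Local session for the sketch; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("fixed-width-split").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for df2: a single string column named "value".
val df2: DataFrame = Seq("001AliceNYC", "002BobbyLAX").toDF("value")

// Hypothetical layout in the question's tuple form: (name, start, length).
// substring positions are 1-based.
val str: Seq[(String, Int, Int)] = Seq(("id", 1, 3), ("name", 4, 5), ("city", 9, 3))

// One Column expression per tuple, all selected in a single pass.
val fields: Seq[Column] = str.map { case (name, pos, len) =>
  substring(col("value"), pos, len).alias(name)
}
val k: DataFrame = df2.select(fields: _*)

// Equivalent alternative: fold withColumn over df2 itself, then drop the source column.
val k2: DataFrame = str.foldLeft(df2) { case (acc, (name, pos, len)) =>
  acc.withColumn(name, substring(col("value"), pos, len))
}.drop("value")

k.show()
```

Either way, every new column is derived from the same dataframe (df2), so there is no need to pre-create an empty k and copy columns into it.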

Avishek Bhattacharya · Accepted Answer · 2017-10-03 15:28:36Z

0

Copying a few columns of one dataframe into another is not directly possible in Spark. However, there are a few alternatives that achieve the same result:

1. Join the two dataframes on some join condition.
2. Convert both dataframes to JSON and do an RDD union:

  val rdd = df1.toJSON.union(df2.toJSON) 
  val dfFinal = spark.read.json(rdd)
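
For the first alternative, a minimal sketch of a join is shown here, assuming both dataframes share a key column. The column name id and the sample rows are hypothetical, and the code is written script-style for spark-shell. Note that a union, as in the second alternative, appends rows rather than placing columns side by side, so a join is the closer fit when the goal is extra columns.

```scala
import org.apache.spark.sql.SparkSession

// Local session for the sketch; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("join-sketch").getOrCreate()
import spark.implicits._

// Hypothetical inputs that share a key column "id".
val df1 = Seq((1, "Alice"), (2, "Bobby")).toDF("id", "name")
val df2 = Seq((1, "NYC"), (2, "LAX")).toDF("id", "city")

// Joining on the shared key places df2's columns next to df1's.
val joined = df1.join(df2, Seq("id"), "inner")

joined.show()
```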

2 Comments

Varun Chelakara Over a year ago

I have a dataframe k with 2 columns, and when I run the code below, it replaces all the existing columns in the dataframe:
var c = df2.select(substring(col("value"), str(data)._2, str(data)._3).alias(str(data)._1)).toDF()
k = c.withColumn(str(data)._1, c(str(data)._1))

Avishek Bhattacharya Over a year ago

You can't iterate over data and add data to a dataframe. It is not possible in Spark.
