1

I have two dataframes, df1 and df2 as shown below:

df1.show()
+---+--------+-----+----+--------+
|c1 |   c2   |  c3 | c4 |   c5   |
+---+--------+-----+----+--------+
|  A|   abc  | 0.1 | 0.0|   0    |
|  B|   def  | 0.15| 0.5|   0    |
|  C|   ghi  | 0.2 | 0.2|   1    |
|  D|   jkl  | 1.1 | 0.1|   0    |
|  E|   mno  | 0.1 | 0.1|   0    |
+---+--------+-----+----+--------+


df2.show()
+---+--------+-----+----+--------+
|c1 |   c2   |  c3 | c4 |   c5   |
+---+--------+-----+----+--------+
|  A|   abc  | a   | b  |   ?    |
|  C|   ghi  | a   | c  |   ?    |
+---+--------+-----+----+--------+

I would like to update c5 column in df1 and set it to 1, if the row is referenced in df2. Each record is identified by c1 and c2 columns.

Below is the desired output; Note that the c5 value of the first record was updated to 1:

+---+--------+-----+----+--------+
|c1 |   c2   |  c3 | c4 |   c5   |
+---+--------+-----+----+--------+
|  A|   abc  | 0.1 | 0.0|   1    |
|  B|   def  | 0.15| 0.5|   0    |
|  C|   ghi  | 0.2 | 0.2|   1    |
|  D|   jkl  | 1.1 | 0.1|   0    |
|  E|   mno  | 0.1 | 0.1|   0    |
+---+--------+-----+----+--------+

1 Answer 1

2

Left join df2 on to df1 and use a case when .. expression for c5.

from pyspark.sql.functions import when,*
joined_dfs = df1.join(df2,(df1.c1 == df2.c1) & (df1.c2 == df2.c2),'left').select('df1.*')
joined_dfs.select(joined_dfs.c1,joined_dfs.c2,joined_dfs.c3,joined_dfs.c4) \
          .withColumn('c5',when((joined_dfs.c1.isNotNull()) & (joined_dfs.c2.isNotNull()),1).otherwise(0)) \ 
          .show()
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for your reply. I get an error: AnalysisException: "cannot resolve 'df1.*' given input columns '';"
I don't quite understand the second part of your answer. And it doesn't seem to work.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.