I have the following situation. I have a master dataframe, DF1, and I am processing it inside a for loop, where I want each iteration's changes to be reflected in it. My pseudocode is below.
for Year in [2019, 2020]:
    query_west = query_{Year}  # placeholder: the query string for this year
    df_west = spark.sql(query_west)
    df_final = DF1.join(df_west, on=['ID'], how='left')
In this case, df_final is joined with the query result and updated on every iteration, right? I want those changes to be reflected in my main dataframe DF1 on every iteration of the for loop.
Please let me know whether my logic is right. Thanks.
Comment: Add df1 = df_final after the 3rd line. As written, you are creating df_final anew on each iteration, and you will only have the latest result at the end of the loop.
Comment: So I should add df1 = df_final as the 4th line in my code inside the for loop?
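A minimal sketch of the reassignment idea from the comments, assuming the yearly dataframes have already been built with spark.sql. The function name join_all and its parameters are hypothetical and not from the original post; the only DataFrame call it relies on is .join(other, on=..., how=...). The point is that the join result is assigned back to the variable that is joined again on the next iteration, so each year's columns accumulate instead of being overwritten.

```python
def join_all(master, yearly_frames, key="ID"):
    """Left-join each yearly frame onto the master, carrying the result forward.

    `master` and each item of `yearly_frames` are expected to be Spark
    DataFrames (anything with a compatible .join method works).
    """
    result = master
    for df_year in yearly_frames:
        # Reassign on every iteration: joins return a new DataFrame,
        # they never modify `result` in place.
        result = result.join(df_year, on=[key], how="left")
    return result


# Hypothetical usage with the names from the question:
#   frames = [spark.sql(q) for q in (query_2019, query_2020)]
#   DF1 = join_all(DF1, frames)
```

Note that if the yearly queries return the same non-key column names, the joined result will contain duplicate column names, so renaming the columns per year before joining may be needed.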