I have the following Spark DataFrame.

Date_1         Value     Date_2 
20-10-2021       1        Date 
20-10-2021       2        Date 
21-10-2021       3        Date 
23-10-2021       4        Date 

I would like to fill Date_2 with Date_1 plus (Value - 1) days.

The output that I would like to see is the following.

Date_1         Value     Date_2 
20-10-2021       1        20-10-2021
20-10-2021       2        21-10-2021 
21-10-2021       3        23-10-2021
23-10-2021       4        26-10-2021 

I tried this in PySpark:

import pyspark.sql.functions as F

df.withColumn('Date_2', F.date_add(df['Date_1'], df['Value'] - 1)).show()

But I am getting TypeError: Column is not iterable.

Can anyone help with this?

2 Answers

1

You need to call the SQL function DATE_ADD through F.expr, like this:

(
    df
    .withColumn("Value", F.col("Value").cast("int"))
    .withColumn("Date_2", F.expr("DATE_ADD(Date_1, Value - 1)"))
)

DATE_ADD(Date_1, Value - 1) adds Value - 1 days to the Date_1 value in each row.
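As a sanity check on the day arithmetic, here is a minimal plain-Python sketch (no Spark involved) that applies the same rule to the sample rows from the question. The helper name add_days and the dd-mm-yyyy format string are assumptions made for illustration; they are not part of the PySpark API.

```python
from datetime import datetime, timedelta

FMT = "%d-%m-%Y"  # assumed format, matching the dates shown in the question

# Sample rows copied from the question: (Date_1, Value)
rows = [
    ("20-10-2021", 1),
    ("20-10-2021", 2),
    ("21-10-2021", 3),
    ("23-10-2021", 4),
]

def add_days(date_1, value):
    """Return date_1 shifted forward by (value - 1) days, in the same format."""
    start = datetime.strptime(date_1, FMT)
    return (start + timedelta(days=value - 1)).strftime(FMT)

# Build (Date_1, Value, Date_2) tuples, mirroring the desired output table
result = [(date_1, value, add_days(date_1, value)) for date_1, value in rows]

for row in result:
    print(*row)
```

The computed Date_2 values can be compared against the desired output in the question to confirm that Value - 1 is the right offset.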

Additionally, if you have not done so already, cast the Value column to INT. If it has another type, for example DOUBLE, an AnalysisException is raised.

4 Comments

This is perfect! Thanks.
When I apply this solution to my real data I get a strange error: AttributeError: '_io.TextIOWrapper' object has no attribute 'col'. Do you have any idea what is wrong with col?
Do you use F.col("Value"), where F comes from "from pyspark.sql import functions as F"? I only see a similar error if I replace F.col() with df.col(): AttributeError: 'DataFrame' object has no attribute 'col'
Are you accidentally using f.col() instead of F.col(), where f in your case is something like f = open(file_with_data)?
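A minimal sketch of how that kind of error can arise, assuming the lowercase name f was bound to an open file handle somewhere earlier in the script. The use of os.devnull is only a stand-in for whatever file is actually opened; no Spark is needed to see the effect.

```python
import os

# Hypothetical: a file handle bound to the short name f
f = open(os.devnull)

try:
    # Intended call was F.col("Value") from pyspark.sql.functions,
    # but the lowercase f is the file object, which has no col attribute
    f.col("Value")
    message = ""
except AttributeError as err:
    message = str(err)
finally:
    f.close()

print(message)
```

If the printed message mentions TextIOWrapper, checking the capitalisation of the alias, or choosing a different name for the file handle, is a reasonable first step.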
1

The signature of date_add is (col, int): the second argument must be a plain integer, so you cannot pass df['Value'] directly.

Try this:

df = df.withColumn('Date_2', F.expr("date_add(Date_1, Value - 1)"))
df.show()

1 Comment

Can you clarify where the quotation that starts at date_add ends? And can it iterate over Value - 1 as well?
