
I am facing a code challenge in Spark within Azure Databricks. I have a dataset as follows:

+----+-------------------+----------+--------+-----+---------------+
|OPID|               Date|BaseAmount|Interval|Cycle|RateIncrease(%)|
+----+-------------------+----------+--------+-----+---------------+
|  O1|2014-07-27 00:00:00|      4375|      12|    2|             2%|
|  O2|2020-12-23 00:00:00|      4975|       7|    3|             5%|
+----+-------------------+----------+--------+-----+---------------+

I need a loop-like operation that replicates each row based on Interval, Cycle and RateIncrease(%). The Interval and Cycle fields give me the number of rows to be replicated:

Number of rows = Interval * Cycle.

For OPID O1, the number of rows must be 24 (12 * 2).

After 12 months (12 rows), i.e. after one cycle completes, BaseAmount increases by 2% for OPID O1, and this repeats after every cycle. This should result in a table as below:

+----+-------------------+----------+--------+-----+---------------+
|OPID|               Date|BaseAmount|Interval|Cycle|RateIncrease(%)|
+----+-------------------+----------+--------+-----+---------------+
|  O1|2014-07-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2014-08-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2014-09-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2014-10-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2014-11-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2014-12-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-01-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-02-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-03-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-04-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-05-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-06-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-07-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2015-08-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2015-09-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2015-10-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2015-11-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2015-12-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-01-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-02-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-03-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-04-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-05-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-06-27 00:00:00|      4463|      12|    2|             2%|
|  O2|2020-12-23 00:00:00|      4975|       7|    3|             5%|
|  ..|                ...|       ...|     ...|  ...|            ...|
+----+-------------------+----------+--------+-----+---------------+
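To make the expected schedule concrete, it can be reproduced in plain Python (a sketch only; `expand` and `add_months` are hypothetical helper names, and `Decimal` with `ROUND_HALF_UP` is used to mimic Spark's half-up `round()`, since Python's built-in `round()` uses banker's rounding):

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

def add_months(dt, n):
    """Shift a datetime by n calendar months (day-of-month preserved,
    which is safe for the 27th/23rd used in this dataset)."""
    month_index = dt.month - 1 + n
    return dt.replace(year=dt.year + month_index // 12, month=month_index % 12 + 1)

def expand(opid, start, base, interval, cycle, rate_pct):
    """Replicate one row into interval*cycle monthly rows, bumping BaseAmount
    by rate_pct percent after each completed cycle (simple, not compounded)."""
    rows = []
    for c in range(cycle):
        # half-up rounding to match Spark's round()
        amount = int(Decimal(base * (1 + 0.01 * rate_pct * c)).quantize(Decimal(1), ROUND_HALF_UP))
        for m in range(interval):
            rows.append((opid, add_months(start, c * interval + m), amount))
    return rows

rows = expand("O1", datetime(2014, 7, 27), 4375, 12, 2, 2.0)
print(len(rows))   # 24 rows = Interval * Cycle
print(rows[12])    # second cycle starts at 2015-07-27 with 4463
```

This confirms the arithmetic in the table: 4375 * 1.02 = 4462.5, which rounds half-up to 4463.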

I got the initial bits solved thanks to user @mck in How to insert a custom function within a for loop in PySpark?

Thank you.

1 Answer

You can explode a sequence of cycles first, in order to get the changed BaseAmount, and then explode a sequence of timestamps as in my previous answer:

import pyspark.sql.functions as F

result = df.withColumn(
    'cycle2',
    # one row per cycle: 0, 1, ..., Cycle-1
    F.expr("explode(sequence(0, int(Cycle)-1))")
).withColumn(
    'BaseAmount',
    # apply the (simple) rate increase once per completed cycle
    F.expr("int(round(BaseAmount * (1 + 0.01*cycle2*float(trim('%' from `RateIncrease(%)`)))))")
).withColumn(
    'Date',
    # one row per month within the cycle's interval
    F.expr("""
        explode(
            sequence(
                timestamp(add_months(timestamp(Date), cycle2*(`Interval`))), 
                timestamp(add_months(timestamp(Date), (cycle2+1)*(`Interval`) - 1)),
                interval 1 month
            )
        )
    """)
).drop('cycle2')

result.show(999)
+----+-------------------+----------+--------+-----+---------------+
|OPID|               Date|BaseAmount|Interval|Cycle|RateIncrease(%)|
+----+-------------------+----------+--------+-----+---------------+
|  O1|2014-07-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2014-08-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2014-09-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2014-10-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2014-11-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2014-12-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-01-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-02-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-03-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-04-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-05-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-06-27 00:00:00|      4375|      12|    2|             2%|
|  O1|2015-07-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2015-08-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2015-09-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2015-10-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2015-11-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2015-12-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-01-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-02-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-03-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-04-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-05-27 00:00:00|      4463|      12|    2|             2%|
|  O1|2016-06-27 00:00:00|      4463|      12|    2|             2%|
|  O2|2020-12-23 00:00:00|      4975|       7|    3|             5%|
|  O2|2021-01-23 00:00:00|      4975|       7|    3|             5%|
|  O2|2021-02-23 00:00:00|      4975|       7|    3|             5%|
|  O2|2021-03-23 00:00:00|      4975|       7|    3|             5%|
|  O2|2021-04-23 00:00:00|      4975|       7|    3|             5%|
|  O2|2021-05-23 00:00:00|      4975|       7|    3|             5%|
|  O2|2021-06-23 00:00:00|      4975|       7|    3|             5%|
|  O2|2021-07-23 00:00:00|      5224|       7|    3|             5%|
|  O2|2021-08-23 00:00:00|      5224|       7|    3|             5%|
|  O2|2021-09-23 00:00:00|      5224|       7|    3|             5%|
|  O2|2021-10-23 00:00:00|      5224|       7|    3|             5%|
|  O2|2021-11-23 00:00:00|      5224|       7|    3|             5%|
|  O2|2021-12-23 00:00:00|      5224|       7|    3|             5%|
|  O2|2022-01-23 00:00:00|      5224|       7|    3|             5%|
|  O2|2022-02-23 00:00:00|      5473|       7|    3|             5%|
|  O2|2022-03-23 00:00:00|      5473|       7|    3|             5%|
|  O2|2022-04-23 00:00:00|      5473|       7|    3|             5%|
|  O2|2022-05-23 00:00:00|      5473|       7|    3|             5%|
|  O2|2022-06-23 00:00:00|      5473|       7|    3|             5%|
|  O2|2022-07-23 00:00:00|      5473|       7|    3|             5%|
|  O2|2022-08-23 00:00:00|      5473|       7|    3|             5%|
+----+-------------------+----------+--------+-----+---------------+

1 Comment

Thanks so much for providing a clear solution. I would only change one bit in the query to suit my needs, since I will be applying compounding growth: F.expr("int(round(BaseAmount * power(1 + 0.01*float(trim('%' from `RateIncrease(%)`)), cycle2)))")
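For reference, the difference between the answer's simple per-cycle increase and the compounded variant from this comment can be illustrated in plain Python (a sketch; `half_up` is a hypothetical helper mimicking Spark's half-up `round()`):

```python
from decimal import Decimal, ROUND_HALF_UP

def half_up(x):
    """Round half-up like Spark's round(), unlike Python's banker's rounding."""
    return int(Decimal(x).quantize(Decimal(1), ROUND_HALF_UP))

base, rate = 4975, 0.05   # O2: 5% per cycle

# Simple increase (the answer): each cycle adds rate * the original base
simple = [half_up(base * (1 + rate * c)) for c in range(3)]

# Compound increase (this comment): each cycle multiplies the previous amount
compound = [half_up(base * (1 + rate) ** c) for c in range(3)]

print(simple)    # [4975, 5224, 5473] -- matches the table above
print(compound)  # the third cycle lands higher under compounding
```

The two agree for the first two cycles and diverge from the third onward (4975 * 1.05^2 = 5484.9375 vs. 4975 * 1.10 = 5472.5).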
