I am facing a coding challenge in Spark on Azure Databricks. I have a dataset like this:
+----+-------------------+----------+--------+-----+---------------+
|OPID| Date|BaseAmount|Interval|Cycle|RateIncrease(%)|
+----+-------------------+----------+--------+-----+---------------+
| O1|2014-07-27 00:00:00| 4375| 12| 2| 2%|
| O2|2020-12-23 00:00:00| 4975| 7| 3| 5%|
+----+-------------------+----------+--------+-----+---------------+
I need a loop that replicates each row based on Interval, Cycle and RateIncrease(%). The Interval and Cycle fields determine how many rows to generate:
Number of rows = Interval * Cycle.
For OPID O1, the number of rows must be 24 (12 * 2).
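As a minimal sketch of the row-count rule (plain Python, no Spark; the function name `replication_plan` is hypothetical), each input row expands into `Interval * Cycle` rows, and the cycle a row belongs to is its position integer-divided by `Interval`:

```python
from typing import List, Tuple


def replication_plan(interval: int, cycle: int) -> List[Tuple[int, int]]:
    """Return (month_offset, cycle_index) for every replicated row.

    The total number of rows is interval * cycle, and the cycle index
    goes up by one after every `interval` rows.
    """
    return [(i, i // interval) for i in range(interval * cycle)]


plan = replication_plan(12, 2)  # OPID O1: Interval=12, Cycle=2
print(len(plan))           # 24
print(plan[11], plan[12])  # (11, 0) (12, 1)
```

The month offset can later drive the date shift, and the cycle index drives how many rate increases have been applied.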
After 12 months (12 rows), i.e. once the first cycle completes, BaseAmount increases by 2% for OPID O1, and this happens again after every cycle. The result should look like the table below:
+----+-------------------+----------+--------+-----+---------------+
|OPID| Date|BaseAmount|Interval|Cycle|RateIncrease(%)|
+----+-------------------+----------+--------+-----+---------------+
| O1|2014-07-27 00:00:00| 4375| 12| 2| 2%|
| O1|2014-08-27 00:00:00| 4375| 12| 2| 2%|
| O1|2014-09-27 00:00:00| 4375| 12| 2| 2%|
| O1|2014-10-27 00:00:00| 4375| 12| 2| 2%|
| O1|2014-11-27 00:00:00| 4375| 12| 2| 2%|
| O1|2014-12-27 00:00:00| 4375| 12| 2| 2%|
| O1|2015-01-27 00:00:00| 4375| 12| 2| 2%|
| O1|2015-02-27 00:00:00| 4375| 12| 2| 2%|
| O1|2015-03-27 00:00:00| 4375| 12| 2| 2%|
| O1|2015-04-27 00:00:00| 4375| 12| 2| 2%|
| O1|2015-05-27 00:00:00| 4375| 12| 2| 2%|
| O1|2015-06-27 00:00:00| 4375| 12| 2| 2%|
| O1|2015-07-27 00:00:00| 4463| 12| 2| 2%|
| O1|2015-08-27 00:00:00| 4463| 12| 2| 2%|
| O1|2015-09-27 00:00:00| 4463| 12| 2| 2%|
| O1|2015-10-27 00:00:00| 4463| 12| 2| 2%|
| O1|2015-11-27 00:00:00| 4463| 12| 2| 2%|
| O1|2015-12-27 00:00:00| 4463| 12| 2| 2%|
| O1|2016-01-27 00:00:00| 4463| 12| 2| 2%|
| O1|2016-02-27 00:00:00| 4463| 12| 2| 2%|
| O1|2016-03-27 00:00:00| 4463| 12| 2| 2%|
| O1|2016-04-27 00:00:00| 4463| 12| 2| 2%|
| O1|2016-05-27 00:00:00| 4463| 12| 2| 2%|
| O1|2016-06-27 00:00:00| 4463| 12| 2| 2%|
| O2|2020-12-23 00:00:00| 4975| 7| 3| 5%|
.
.
.
+----+-------------------+----------+--------+-----+---------------+
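The amounts in the expected output are consistent with BaseAmount * (1 + rate)^k rounded half up, where k is the number of completed cycles: 4375 * 1.02 = 4462.5, shown as 4463. The sketch below assumes the increase compounds from cycle to cycle (the O1 example has only one increase, so it cannot confirm this) and uses `Decimal` with `ROUND_HALF_UP`, because Python's built-in `round()` rounds halves to even and would give 4462. The helper names `parse_rate` and `amount_for_cycle` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def parse_rate(rate_text: str) -> Decimal:
    """Convert a percentage string such as '2%' into Decimal('0.02')."""
    return Decimal(rate_text.strip().rstrip("%")) / Decimal(100)


def amount_for_cycle(base_amount: int, rate_text: str, cycle_index: int) -> int:
    """BaseAmount after `cycle_index` completed cycles.

    Assumption: the increase compounds, i.e. base * (1 + rate) ** cycle_index,
    and the result is rounded half up (4462.5 -> 4463).
    """
    factor = (Decimal(1) + parse_rate(rate_text)) ** cycle_index
    amount = Decimal(base_amount) * factor
    return int(amount.quantize(Decimal(1), rounding=ROUND_HALF_UP))


print(amount_for_cycle(4375, "2%", 0))  # 4375
print(amount_for_cycle(4375, "2%", 1))  # 4463
```

If the same calculation is done with Spark columns instead, `F.round` rounds half up (matching 4463), whereas `F.bround` rounds half to even.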
I solved the initial part thanks to user @mck. How can I insert a custom function within a for loop in PySpark?
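One way to frame the "custom function" is a plain Python function that takes one input row and returns its `Interval * Cycle` output rows, shifting the date by one month per row and raising BaseAmount once per completed cycle. This is a sketch under assumptions: `expand_opid` and `add_months` are hypothetical names, rows are modelled as dicts, and the rate is assumed to compound and round half up.

```python
import calendar
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List


def add_months(start: datetime, months: int) -> datetime:
    """Shift a timestamp by whole months, clamping to the month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return start.replace(year=year, month=month, day=day)


def expand_opid(row: Dict) -> List[Dict]:
    """Hypothetical custom function: one input row -> Interval * Cycle rows."""
    interval, cycle = row["Interval"], row["Cycle"]
    rate = Decimal(row["RateIncrease(%)"].rstrip("%")) / Decimal(100)
    out = []
    for i in range(interval * cycle):
        cycle_index = i // interval  # 0 for the first Interval rows, then 1, ...
        amount = Decimal(row["BaseAmount"]) * (1 + rate) ** cycle_index
        out.append({
            **row,
            "Date": add_months(row["Date"], i),
            "BaseAmount": int(amount.quantize(Decimal(1), rounding=ROUND_HALF_UP)),
        })
    return out


o1 = {"OPID": "O1", "Date": datetime(2014, 7, 27), "BaseAmount": 4375,
      "Interval": 12, "Cycle": 2, "RateIncrease(%)": "2%"}
rows = expand_opid(o1)
print(len(rows))                                 # 24
print(rows[12]["Date"], rows[12]["BaseAmount"])  # 2015-07-27 00:00:00 4463
```

In PySpark, a function like this could be applied per row with `df.rdd.flatMap(lambda r: expand_opid(r.asDict()))` and the result converted back with `spark.createDataFrame`. A column-based alternative that avoids Python loops would build the offsets with `F.sequence` and `F.explode` and shift dates with Spark SQL's `add_months`, which usually scales better than row-wise Python.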
Thank you.