I have a basic 'for' loop that prints the number of active customers for each year. Printing works, but I want the output as a single table/dataframe with 2 columns (year and # customers), where each iteration of the loop adds 1 row:
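from pyspark.sql.functions import col, year
for yr in range(2018, 2023):
    # customers whose first sale happened in or before this year
    print(yr, df.filter(year(col('first_sale')) <= yr).count())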
Since you filter on <= yr, each succeeding year will include the count of the previous year, right? E.g. the count for 2019 will include 2018's count. So why not group by the year, count, get a smaller pandas dataframe, and then cumsum on the count?
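A rough sketch of that idea, in case it helps. The helper name cumulative_customers and the sample numbers are made up for illustration, and the Spark step is only shown as a comment (it assumes df and the first_sale column from the question). The small per-year frame is stood in for by a hand-built pandas frame so the cumsum part can be run on its own.

```python
import pandas as pd


def cumulative_customers(per_year: pd.DataFrame, years) -> pd.DataFrame:
    """Turn per-year new-customer counts into running totals (hypothetical helper)."""
    totals = (
        per_year.dropna(subset=["year"])   # ignore rows with no first_sale year
        .sort_values("year")
        .set_index("year")["count"]
        .cumsum()                          # running total, including earlier years
    )
    # Years with no new customers keep the previous total; years before
    # the first recorded sale get 0.
    totals = totals.reindex(list(years), method="ffill").fillna(0).astype(int)
    return totals.rename("customers").rename_axis("year").reset_index()


# With Spark, the small per-year frame would come from something like:
#   per_year = (df.groupBy(year(col("first_sale")).alias("year"))
#                 .count()
#                 .toPandas())
# Made-up numbers standing in for that result:
per_year = pd.DataFrame({"year": [2019, 2016, 2021, 2018], "count": [5, 3, 2, 10]})

result = cumulative_customers(per_year, range(2018, 2023))
print(result)
```

The cumsum runs over all years before the 2018 to 2022 window is selected, so customers whose first sale was before 2018 still count toward every later year, which is what the <= yr filter in the loop does. This needs one Spark job instead of one per year.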