Is it possible to create a DataFrame dynamically?
For example, I want to create a list of dates and the corresponding day of the week, in two columns, for a given date range.
Input: 03-01-2018 to 03-31-2018
Expected output:
03-01-2018 THU
03-02-2018 FRI
.......
03-31-2018 SAT
You can build the list of dates in plain Python and then load it into Spark:
import datetime
start = datetime.date(2018,3,1)
end = datetime.date(2018,3,31)
date_list = []
# Add one date per day, including the end date
for i in range((end - start).days + 1):
    date_list.append(start + datetime.timedelta(days=i))
sc.parallelize(date_list).take(2)
# [datetime.date(2018, 3, 1), datetime.date(2018, 3, 2)]
sc.parallelize(date_list).count()
# 31
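The snippet above stops at an RDD of dates. To get the two columns from the question, a minimal sketch could look like the following. It assumes `spark` is an existing SparkSession, and the names `build_rows`, `to_dataframe`, `DAY_NAMES` and the column names "date" and "day" are made up for illustration:

```python
import datetime

# Fixed abbreviations indexed by date.weekday() (Monday == 0), so the
# output does not depend on the machine's locale.
DAY_NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def build_rows(start, end):
    """Return one (date_string, day_abbreviation) tuple per day, inclusive."""
    rows = []
    for i in range((end - start).days + 1):
        day = start + datetime.timedelta(days=i)
        rows.append((day.strftime("%m-%d-%Y"), DAY_NAMES[day.weekday()]))
    return rows


def to_dataframe(spark, rows):
    """Turn the rows into a two-column DataFrame.

    `spark` is assumed to be an existing SparkSession; the column names
    "date" and "day" are illustrative.
    """
    return spark.createDataFrame(rows, ["date", "day"])


rows = build_rows(datetime.date(2018, 3, 1), datetime.date(2018, 3, 31))
print(rows[0])    # ('03-01-2018', 'THU')
print(rows[-1])   # ('03-31-2018', 'SAT')
```

Inside a Spark session, `to_dataframe(spark, rows).show()` should then display the dates next to their day abbreviations.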
If your date range is stored in a DataFrame instead, one option is to create a UDF that takes the two dates as arguments and returns an array of dates, then explode that array so each date gets its own row.
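A rough sketch of that UDF approach is below. It assumes the DataFrame has two DateType columns, here hypothetically called "start" and "end"; `date_range` and `explode_date_ranges` are illustrative names, not part of Spark:

```python
import datetime


def date_range(start, end):
    """Return every date from start to end, inclusive, as a list."""
    # Spark passes None for null cells; returning None gives a null array,
    # which explode() turns into zero rows.
    if start is None or end is None:
        return None
    return [start + datetime.timedelta(days=i)
            for i in range((end - start).days + 1)]


def explode_date_ranges(df, start_col="start", end_col="end"):
    """Add a "date" column with one row per day of each row's range.

    Assumes `df` is a Spark DataFrame whose start_col and end_col are
    DateType columns; the column names are hypothetical.
    """
    # Imported here so date_range stays usable without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, DateType

    date_range_udf = F.udf(date_range, ArrayType(DateType()))
    return df.withColumn(
        "date", F.explode(date_range_udf(F.col(start_col), F.col(end_col)))
    )
```

Keeping `date_range` as a plain Python function means it can be checked on its own before wrapping it in a UDF.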