Is it possible to create a DataFrame dynamically?

For example, I want to create a list of dates and the day of the week for each one, in two columns, for a given date range.

Input: 03-01-2018 - 03-31-2018

Output:

03-01-2018 THU
03-02-2018 FRI
...
03-31-2018 SAT

1 Answer

You can build the list of dates in plain Python and then hand it to Spark:

import datetime

start = datetime.date(2018, 3, 1)
end = datetime.date(2018, 3, 31)

# Build every date from start to end, inclusive.
date_list = [start + datetime.timedelta(days=i)
             for i in range((end - start).days + 1)]

# sc is an existing SparkContext.
rdd = sc.parallelize(date_list)
rdd.take(2)   # [datetime.date(2018, 3, 1), datetime.date(2018, 3, 2)]
rdd.count()   # 31
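
The snippet above yields only the dates, while the question also asks for a day column. A minimal sketch of one way to build both columns in plain Python follows. The function name `date_day_rows` and the `DAY_NAMES` list are hypothetical, introduced here for illustration; the day abbreviation is looked up from `date.weekday()` rather than `strftime("%a")` so that it does not depend on the locale.

```python
import datetime

# Hypothetical lookup table; date.weekday() returns 0 for Monday.
DAY_NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def date_day_rows(start, end):
    """Return (MM-DD-YYYY, day abbreviation) tuples for start..end, inclusive."""
    rows = []
    for i in range((end - start).days + 1):
        current = start + datetime.timedelta(days=i)
        rows.append((current.strftime("%m-%d-%Y"), DAY_NAMES[current.weekday()]))
    return rows


rows = date_day_rows(datetime.date(2018, 3, 1), datetime.date(2018, 3, 31))
print(rows[0])   # ('03-01-2018', 'THU')
print(rows[-1])  # ('03-31-2018', 'SAT')
```

Assuming a SparkSession is available as `spark`, the resulting list can then be turned into a two-column DataFrame with `spark.createDataFrame(rows, ["date", "day"])`.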

If your date range is instead stored in a DataFrame, you have to create a UDF that takes the two dates as arguments and returns an array of dates, and then explode that array.
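
A rough sketch of that UDF-plus-explode approach, under some assumptions: the column names `start` and `end`, the output column `date`, and the helper names `dates_between` and `explode_date_ranges` are all hypothetical, and both input columns are assumed to be of Spark's `DateType`. The pyspark imports sit inside the second function only so that the pure-Python helper can be tried without a Spark installation.

```python
import datetime


def dates_between(start, end):
    """Return every date from start to end, inclusive (empty if either is None)."""
    if start is None or end is None:
        return []
    return [start + datetime.timedelta(days=i)
            for i in range((end - start).days + 1)]


def explode_date_ranges(df, start_col="start", end_col="end"):
    """Return df with one output row per date in each row's start..end range."""
    # Imported here so dates_between stays usable without pyspark installed.
    from pyspark.sql.functions import col, explode, udf
    from pyspark.sql.types import ArrayType, DateType

    # Wrap the Python function as a UDF that returns an array of dates.
    dates_udf = udf(dates_between, ArrayType(DateType()))
    # explode turns each array element into its own row.
    return df.withColumn("date", explode(dates_udf(col(start_col), col(end_col))))
```

Rows whose range is empty (for example, an end date earlier than the start date) produce an empty array, and `explode` drops them from the result.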

5 Comments

Yes, I prefer the explode option. But is it possible to apply explode to an empty DataFrame? Do I have to define a DataFrame first and then apply explode?
@syv Your DataFrame is not empty. Your starting point is the DataFrame with the start/end columns.
Can I take start/end as parameters that I supply when I submit the script to Spark?
@syv If they are parameters, use the method I wrote. The explode only works with data that is already in a DataFrame.
OK, thank you!