
I'm trying to create a dataframe in Pandas that has two variables, "date" and "time_of_day". "date" is 120 observations long and covers 30 days, with four observations per day (1,1,1,1; 2,2,2,2; etc.). The second variable, "time_of_day", repeats the values 1,2,3,4 thirty times.

The closest I found to this question was How to create a series of numbers using Pandas in Python, which got me the code below, but I'm receiving an error that it must be a 1-dimensional array.

df = pd.DataFrame({'date': np.tile([pd.Series(range(1,31))],4), 'time_of_day': pd.Series(np.tile([1, 2, 3, 4],30 ))})
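For reference, a small sketch of why that snippet fails. Wrapping the Series in a list makes np.tile see a 2-D input of shape (1, 30), so the result has shape (1, 120) instead of being a flat 120-element array. Even if it were flattened, np.tile repeats the whole 1..30 sequence rather than each value, which is the wrong order for "date":

```python
import numpy as np
import pandas as pd

# Wrapping the Series in a list makes the input 2-D with shape (1, 30),
# so np.tile returns a 2-D array as well.
wrapped = np.tile([pd.Series(range(1, 31))], 4)
print(wrapped.shape)  # (1, 120): 2-D, hence the "1-dimensional" complaint

# Flattening fixes the shape but not the order: tile repeats the whole sequence.
flat = wrapped.ravel()
print(flat[:5], flat[30:33])  # [1 2 3 4 5] [1 2 3]

# np.repeat repeats each element instead, which is the pattern "date" needs.
by_day = np.repeat(range(1, 31), 4)
print(by_day[:8])  # [1 1 1 1 2 2 2 2]
```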

So the final dataframe would look something like this:

date  time_of_day
1     1
1     2
1     3
1     4
2     1
2     2
2     3
2     4

Thanks much!

2 Answers


You need np.repeat for one column and np.tile for the other:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': np.repeat(range(1, 31), 4),
                   'time_of_day': np.tile([1, 2, 3, 4], 30)})
print(df.head(10))
   date  time_of_day
0     1            1
1     1            2
2     1            3
3     1            4
4     2            1
5     2            2
6     2            3
7     2            4
8     3            1
9     3            2

Or you could use pd.MultiIndex.from_product, which gives the same result:

df = (
    pd.MultiIndex.from_product([range(1,31), range(1,5)], 
                               names=['date','time_of_day'])
      .to_frame(index=False)
)

Or use product from itertools:

from itertools import product
df = pd.DataFrame(product(range(1,31), range(1,5)), columns=['date','time_of_day'])
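A quick sketch that builds the frame all three ways and checks that they really match, using pandas.testing.assert_frame_equal. The check_dtype=False argument is a precaution on my part, since the default integer width can differ by platform.

```python
from itertools import product

import numpy as np
import pandas as pd

# Approach 1: np.repeat for date, np.tile for time_of_day
df_np = pd.DataFrame({'date': np.repeat(range(1, 31), 4),
                      'time_of_day': np.tile([1, 2, 3, 4], 30)})

# Approach 2: MultiIndex.from_product, flattened back into columns
df_mi = (
    pd.MultiIndex.from_product([range(1, 31), range(1, 5)],
                               names=['date', 'time_of_day'])
      .to_frame(index=False)
)

# Approach 3: itertools.product passed straight to the constructor
df_it = pd.DataFrame(product(range(1, 31), range(1, 5)),
                     columns=['date', 'time_of_day'])

# Raises AssertionError if values, order, or labels differ
pd.testing.assert_frame_equal(df_np, df_mi, check_dtype=False)
pd.testing.assert_frame_equal(df_np, df_it, check_dtype=False)
print(df_np.shape)  # (120, 2)
```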

Comments

Just what I needed...thanks much! Thanks also for including alternative answers.
As a follow-up question (which maybe should just be a new question), let's say I had a vector of participant values, and I wanted the date and time_of_day columns to repeat so that each participant received the 120 observations. How would I do that? Essentially, it would be repeating the dataframe you created for each participant (either built as one dataframe directly, or built per participant and then concatenated into one dataframe).
@James the easiest way is then to use ...from_product([list_of_participants, range(1,31), range(1,5)], names=['participant','day','time_of_day'])....
@James if you want to reuse the df already created, then pd.concat([df.assign(participant=_id) for _id in list_of_participants], ignore_index=True) should work too (check for typos)
Thanks; that's exactly what I want, but now I'm having issues joining it with my original dataset to create a full dataframe that shows the missing values. I thought I could just join on participant, but I end up with far more rows than I should have. Any idea how I would solve that? I created a separate question here to make it easier: stackoverflow.com/questions/70189766/…
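The two participant approaches from the comments, as a runnable sketch. list_of_participants is a hypothetical placeholder, and the column is named date (rather than day) here so it matches the frames built earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical participant IDs, for illustration only
list_of_participants = ['p1', 'p2', 'p3']

# Option 1: make participant the outermost level of the product
df_all = (
    pd.MultiIndex.from_product(
        [list_of_participants, range(1, 31), range(1, 5)],
        names=['participant', 'date', 'time_of_day'])
      .to_frame(index=False)
)

# Option 2: reuse the 120-row frame and stack one copy per participant
df = pd.DataFrame({'date': np.repeat(range(1, 31), 4),
                   'time_of_day': np.tile([1, 2, 3, 4], 30)})
df_concat = pd.concat([df.assign(participant=_id) for _id in list_of_participants],
                      ignore_index=True)

# assign() appends participant as the last column, so align the order first
df_concat = df_concat[['participant', 'date', 'time_of_day']]
pd.testing.assert_frame_equal(df_all, df_concat, check_dtype=False)
print(len(df_all))  # 360 rows: 120 per participant
```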

A newer option is merge with how='cross' (available since pandas 1.2):

import pandas as pd
out = pd.DataFrame({'date': range(1, 31)}).merge(pd.DataFrame({'time_of_day': [1, 2, 3, 4]}), how='cross')
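If both single-column frames are built without names, each column is labelled 0 and merge disambiguates them with suffixes, so naming the columns first is the easiest way to get date and time_of_day directly. A sketch that also checks the cross merge keeps the row order from the other answer (each date paired with 1 to 4 in turn):

```python
import pandas as pd

# Name the columns up front so the merged frame is directly usable
dates = pd.DataFrame({'date': range(1, 31)})
times = pd.DataFrame({'time_of_day': [1, 2, 3, 4]})

# how='cross' pairs every row of the left frame with every row of the right,
# keeping the left frame's order on the outside
out = dates.merge(times, how='cross')
print(out.shape)  # (120, 2)
print(out.head(8))
```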

Comment

Thanks for including another solution. Helpful in learning Python!
