Creating Data Frame with repeating values that repeat

Question

I'm trying to create a DataFrame in pandas that has two variables, "date" and "time_of_day". "date" is 120 observations long and covers 30 days, with four observations per day (1,1,1,1; 2,2,2,2; etc.). The second variable, "time_of_day", cycles through the values 1,2,3,4 and repeats 30 times.

The closest I found to this question was here: How to create a series of numbers using Pandas in Python, which got me the below code, but I'm receiving an error that it must be a 1-dimensional array.

df = pd.DataFrame({'date': np.tile([pd.Series(range(1,31))],4), 'time_of_day': pd.Series(np.tile([1, 2, 3, 4],30 ))})

So the final dataframe would look something like

date	time_of_day
1	1
1	2
1	3
1	4
2	1
2	2
2	3
2	4

Thanks much!

Ben.T · Accepted Answer · 2021-12-01 01:41:24Z

7

You need np.repeat for one column and np.tile for the other:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': np.repeat(range(1,31),4),          # each day four times in a row
                   'time_of_day': np.tile([1, 2, 3, 4],30)})  # the 1-4 cycle, 30 times
print(df.head(10))
   date  time_of_day
0     1            1
1     1            2
2     1            3
3     1            4
4     2            1
5     2            2
6     2            3
7     2            4
8     3            1
9     3            2
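
To make the difference between the two functions concrete, and to show why the attempt in the question fails, here is a minimal sketch. The variable names are illustrative and not part of the original answer, and the day range is shortened to 3 for readability.

```python
import numpy as np
import pandas as pd

days = range(1, 4)  # shortened to 3 days for readability

# np.repeat repeats each element in place: 1,1,2,2,3,3
repeated = np.repeat(days, 2)

# np.tile repeats the whole sequence end to end: 1,2,3,1,2,3
tiled = np.tile(days, 2)

# Wrapping the Series in a list, as in the question, adds a dimension,
# so the tiled result is 2-D and cannot serve as a DataFrame column.
wrapped = np.tile([pd.Series(range(1, 31))], 4)

print(repeated, tiled, wrapped.ndim)
```

Even without the extra dimension, tiling the days would give 1..30 four times over rather than 1,1,1,1,2,2,2,2, which is why "date" needs np.repeat.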

Or you could use pd.MultiIndex.from_product, which gives the same result:

df = (
    pd.MultiIndex.from_product([range(1,31), range(1,5)], 
                               names=['date','time_of_day'])
      .to_frame(index=False)
)

Or use product from itertools:

from itertools import product
df = pd.DataFrame(product(range(1,31), range(1,5)), columns=['date','time_of_day'])
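
As a quick sanity check that the three approaches agree, the sketch below builds each frame and compares them. The names df_np, df_mi, and df_it are made up for this comparison; check_dtype=False is used on the assumption that the default integer width may differ between platforms.

```python
from itertools import product

import numpy as np
import pandas as pd

cols = ['date', 'time_of_day']

# np.repeat / np.tile
df_np = pd.DataFrame({'date': np.repeat(range(1, 31), 4),
                      'time_of_day': np.tile([1, 2, 3, 4], 30)})

# MultiIndex.from_product
df_mi = (pd.MultiIndex.from_product([range(1, 31), range(1, 5)], names=cols)
           .to_frame(index=False))

# itertools.product
df_it = pd.DataFrame(product(range(1, 31), range(1, 5)), columns=cols)

# Raises AssertionError if the frames differ in values, columns, or index
pd.testing.assert_frame_equal(df_np, df_mi, check_dtype=False)
pd.testing.assert_frame_equal(df_np, df_it, check_dtype=False)
```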


9 Comments

James Over a year ago

Just what I needed...thanks much! Thanks also for including alternative answers.

James Over a year ago

As a follow-up question (which maybe should just be a new question), let's say I had a vector of participant values, and I wanted the date and time_of_day columns to repeat (such that each participant received the 120 observations); how would I do that? Essentially, it would be repeating the data frame you created for each participant (either created as one data frame, or then concatenated into one dataframe).

Ben.T Over a year ago

@James the easiest is then to use the method ...from_product([list_of_participants, range(1,31), range(1,5)], names=['participant','day','time_of_day'])....
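
Spelled out in full, the from_product suggestion might look like the sketch below. The participant IDs are hypothetical placeholders, and df_all is a name chosen here for illustration.

```python
import pandas as pd

# Hypothetical participant IDs, for illustration only
list_of_participants = ['p01', 'p02', 'p03']

df_all = (
    pd.MultiIndex.from_product(
        [list_of_participants, range(1, 31), range(1, 5)],
        names=['participant', 'day', 'time_of_day'])
    .to_frame(index=False)
)

# Each participant gets the full 30 x 4 = 120 rows
print(df_all.groupby('participant').size())
```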

Ben.T Over a year ago

@James if you want to reuse the df created, then pd.concat([df.assign(participant=_id) for _id in list_of_participant], ignore_index=True) should work too (check for typos)
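
A runnable version of the concat suggestion, as a sketch: the participant IDs are hypothetical, and stacked is a name introduced here. Because assign() appends the new column at the end, the last line reorders the columns, which is optional.

```python
import numpy as np
import pandas as pd

# The 120-row frame from the answer
df = pd.DataFrame({'date': np.repeat(range(1, 31), 4),
                   'time_of_day': np.tile([1, 2, 3, 4], 30)})

list_of_participant = ['p01', 'p02', 'p03']  # hypothetical IDs

# One copy of df per participant, stacked with a fresh 0..n-1 index
stacked = pd.concat(
    [df.assign(participant=_id) for _id in list_of_participant],
    ignore_index=True)

# Optional: move participant to the front
stacked = stacked[['participant', 'date', 'time_of_day']]
```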

James Over a year ago

Thanks; that's exactly what I want, but then I'm having issues joining it with my original dataset to create a full data frame representing missing values. I thought I could just join on participant but I end up with far more values than I should have. Any idea how I would solve that? I just created the question here to make it easier: stackoverflow.com/questions/70189766/…


BENY · Accepted Answer · 2021-12-01 01:56:59Z

3

A newer option is a cross merge (how='cross', added in pandas 1.2):

out = pd.DataFrame(range(1,31)).merge(pd.DataFrame([1, 2, 3, 4]),how='cross')
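
The one-liner above does not name its columns, so the result will not carry the "date" and "time_of_day" labels from the question. A variant with named columns could look like this sketch (dates and times are names chosen here; it assumes pandas 1.2 or later):

```python
import pandas as pd

dates = pd.DataFrame({'date': range(1, 31)})
times = pd.DataFrame({'time_of_day': [1, 2, 3, 4]})

# how='cross' pairs every row of dates with every row of times
out = dates.merge(times, how='cross')
print(out.head(8))
```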


1 Comment

James Over a year ago

Thanks for including another solution. Helpful in learning Python!
