I have a pandas DataFrame with a datetime column named time. I'd like to count the number of rows per hour. The problem is that I'd like the resulting table to include hours for which no rows exist. Example:

    time    id  lat lon type
0   2017-06-09 19:34:59.945128-07:00    75  36.999866   -122.058180 UPPER CAMPUS
1   2017-06-09 19:53:56.387058-07:00    75  36.979664   -122.058900 OUT OF SERVICE/SORRY
2   2017-06-09 19:28:53.525189-07:00    75  36.988640   -122.066820 UPPER CAMPUS
3   2017-06-09 19:30:31.633478-07:00    75  36.991657   -122.066605 UPPER CAMPUS

I can get the per-hour counts using df.groupby(df.time.dt.hour).count(), which produces:

    time    id  lat lon type
time                    
0   2121    2121    2121    2121    2121
1   2334    2334    2334    2334    2334
2   1523    1523    1523    1523    1523
6   8148    8148    8148    8148    8148

This is correct: 0, 1, 2, and 6 are hours of the day. However, I'd like the result to show that there are no rows for hours 3, 4, and 5. Repeating the count under every column name is also unnecessary, since the value is the same in each.

1 Answer

You can use reindex:

# if you want all 24 hours (0-23); size() counts rows per hour as a single Series
df1 = df.groupby(df.time.dt.hour).size().reindex(range(24), fill_value=0)

# if you want hour 0 up to the largest hour present
df1 = (df.groupby(df.time.dt.hour).count()
         .reindex(range(df.time.dt.hour.max() + 1), fill_value=0))
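
For a quick check of the reindex approach, here is a minimal self-contained sketch. The sample timestamps are made up for illustration (hours 0, 1, 2 and 6 only), and using size() to get one count per hour is an assumption about what the asker wants, not necessarily the exact call from the answer.

```python
import pandas as pd

# Hypothetical sample data: rows fall in hours 0, 1, 2 and 6 only.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-06-09 00:10:00",
        "2017-06-09 00:45:00",
        "2017-06-09 01:05:00",
        "2017-06-09 02:30:00",
        "2017-06-09 06:15:00",
        "2017-06-09 06:50:00",
        "2017-06-09 06:59:00",
    ]),
    "id": [75, 75, 75, 75, 75, 75, 75],
})

# size() counts rows per group and returns a single Series,
# so no particular column has to be selected.
counts = df.groupby(df["time"].dt.hour).size()

# Reindex onto all 24 hours; hours with no rows get 0 instead of NaN.
hourly = counts.reindex(range(24), fill_value=0)

print(hourly)
```

The result has one entry per hour from 0 to 23, with zeros for hours that have no rows in the sample.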

1 Comment

Thanks. I didn't mention it in my question, but I actually wanted to use the fill_value=0 parameter of reindex rather than get NaN. This answer is correct.
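
The difference the comment refers to can be sketched roughly as follows. The counts are the ones from the question's output; the variable names are only illustrative.

```python
import pandas as pd

# Per-hour counts from the question's output; hours 3, 4 and 5 are missing.
counts = pd.Series([2121, 2334, 1523, 8148], index=[0, 1, 2, 6])

# Without fill_value, missing hours become NaN and the dtype becomes float.
with_nan = counts.reindex(range(7))

# With fill_value=0, missing hours become 0 and the integer dtype is kept.
with_zero = counts.reindex(range(7), fill_value=0)

print(with_nan)
print(with_zero)
```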
