
I have a DataFrame that I group and count by hour, which turns it into a Series. When I plot that, the x-axis is completely garbled and unreadable.

Summarized in code:

bicycles = both_directions.query('type == "BICYCLE"')
display(bicycles.info())

timegroups = bicycles.groupby(pd.Grouper(key='date_time', axis=0, freq="1H", sort=True)).count()['date']
display(timegroups)
display(type(timegroups.index))
timegroups.plot(kind="bar", stacked=True)

Which outputs:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2025 entries, 0 to 3588
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   date_time    2025 non-null   datetime64[ns]
 1   speed        2025 non-null   int64         
 2   time         2025 non-null   object        
 3   date         2025 non-null   object        
 4   direction    2025 non-null   int64         
 5   length       2025 non-null   float64       
 6   length_norm  2025 non-null   int64         
 7   speed_norm   2025 non-null   int64         
 8   type         2025 non-null   string        
dtypes: datetime64[ns](1), float64(1), int64(4), object(2), string(1)
memory usage: 158.2+ KB

None

date_time
2022-06-01 14:00:00     1
2022-06-01 15:00:00    11
2022-06-01 16:00:00     3
2022-06-01 17:00:00     8
2022-06-01 18:00:00     2
                       ..
2022-06-13 09:00:00     0
2022-06-13 10:00:00     5
2022-06-13 11:00:00    13
2022-06-13 12:00:00    12
2022-06-13 13:00:00    13
Freq: H, Name: date, Length: 288, dtype: int64

pandas.core.indexes.datetimes.DatetimeIndex

<matplotlib.axes._subplots.AxesSubplot at 0x7fcd133c3a90>

Garbled X-Axis output from matplotlib

What is the way to (smartly) skip tick labels so that the x-axis remains readable?

According to the pandas documentation, it should already do this automatically by default:

Pandas includes automatic tick resolution adjustment for regular frequency time-series data.

But it clearly doesn't in this case. What am I doing wrong? Is there a setting or conversion I'm missing? Is it a type issue (Series vs. DataFrame)?
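
For comparison, plotting the same Series as a line (a quick sketch reusing the timegroups Series from above) does give readable, auto-spaced date ticks; the problem only seems to show up with kind="bar", which puts a tick label under every single bar:

import matplotlib.pyplot as plt

timegroups.plot()            # line plot: readable, auto-adjusted date ticks
plt.figure()
timegroups.plot(kind="bar")  # bar plot: one label per hour, hence the mess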

1 Answer

Given the following toy dataframe mimicking yours (one value per hour over one year), but with duplicated values (each hour in the date_time column appears twice):

import random

import pandas as pd

# Toy data: one row per hour for a full year, then each row duplicated
# so that values in the date_time column are not unique.
df = pd.DataFrame(
    {"date_time": pd.date_range(start="1/1/2021", end="12/31/2021", freq="H")}
)
df["count"] = [int(random.random() * 100) for _ in range(df.shape[0])]
df = pd.concat([df, df]).reset_index(drop=True)  # Add duplicates
df.info()
# Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17474 entries, 0 to 17473
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   date_time  17474 non-null  datetime64[ns]
 1   count      17474 non-null  int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 273.2 KB

The x labels are unreadable when plotting the dataframe as is (note that this would also be the case without duplicated values in the date_time column):

df.plot(x="date_time", kind="bar", stacked=True)

Output:

Bar plot with overlapping, unreadable x-axis labels

One way to fix that is to set the xticks manually, for instance to the index positions of the month-end rows (obtained with pandas asfreq), and then set the labels accordingly by chaining a set_xticklabels call with the corresponding datetime values. At each step, duplicated values are taken into account, so that even though all rows are plotted, the ticks and labels stay unique:

# Positional indices of the month-end dates in df
df = df.sort_values(by="date_time").reset_index(drop=True)
end_of_months = (
    df[
        df["date_time"].isin(
            df.drop_duplicates(subset=["date_time"])
            .set_index(["date_time"])
            .asfreq("M")  # keep only the month-end timestamps
            .reset_index()["date_time"]
        )
    ]
    .drop_duplicates(subset=["date_time"])  # one row index per month end
    .index.tolist()
)

df.plot(x="date_time", kind="bar", stacked=True, xticks=end_of_months).set_xticklabels(
    df.loc[end_of_months, "date_time"].dt.strftime("%Y-%m-%d").unique(),
    rotation=45,
    ha="right",
)

Output:

Bar plot with one readable x-axis label per month end, rotated 45 degrees
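
If, instead of month ends, you simply want every nth bar labelled (as the comments below ask for), the same idea works with a plain positional range. This is only a sketch on top of the toy df above, with an arbitrary step: because of the duplicates, each hour occupies two adjacent bars, so a step of 2 * 24 * 7 gives roughly one label per week:

step = 2 * 24 * 7  # arbitrary: one label per week of duplicated hourly rows
ticks = list(range(0, df.shape[0], step))
df.plot(x="date_time", kind="bar", stacked=True, xticks=ticks).set_xticklabels(
    df.loc[ticks, "date_time"].dt.strftime("%Y-%m-%d %H:%M"),
    rotation=45,
    ha="right",
)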

4 Comments

Finally had some time to look at this. Your solution doesn't work for me. I did not make it clear, but the "date_time" column is not unique, and it is not the index either. This complicates matters a lot, since we cannot leverage the index for lookups based on another interval, such as end-of-month (or, in my case, two-hourly).
Indeed, with duplicates in the "date_time" column, it does not work. Do you care which ticks and labels are shown on the plot? Or is it OK if the column is binned arbitrarily?
I would prefer to have every nth hour shown, for some consistency on the x-axis.
Hi, I've updated my answer to deal with an example where there are duplicated values in the "date_time" column. Cheers.
