
I have a DataFrame that I group and count by hour, which turns it into a Series. When I plot that, the x-axis is completely garbled and unreadable.

Summarized in code:

bicycles = both_directions.query('type == "BICYCLE"')
display(bicycles.info())

timegroups = bicycles.groupby(pd.Grouper(key='date_time', axis=0, freq="1H", sort=True)).count()['date']
display(timegroups)
display(type(timegroups.index))
timegroups.plot(kind="bar", stacked=True)

Which outputs:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2025 entries, 0 to 3588
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   date_time    2025 non-null   datetime64[ns]
 1   speed        2025 non-null   int64         
 2   time         2025 non-null   object        
 3   date         2025 non-null   object        
 4   direction    2025 non-null   int64         
 5   length       2025 non-null   float64       
 6   length_norm  2025 non-null   int64         
 7   speed_norm   2025 non-null   int64         
 8   type         2025 non-null   string        
dtypes: datetime64[ns](1), float64(1), int64(4), object(2), string(1)
memory usage: 158.2+ KB

None

date_time
2022-06-01 14:00:00     1
2022-06-01 15:00:00    11
2022-06-01 16:00:00     3
2022-06-01 17:00:00     8
2022-06-01 18:00:00     2
                       ..
2022-06-13 09:00:00     0
2022-06-13 10:00:00     5
2022-06-13 11:00:00    13
2022-06-13 12:00:00    12
2022-06-13 13:00:00    13
Freq: H, Name: date, Length: 288, dtype: int64

pandas.core.indexes.datetimes.DatetimeIndex

<matplotlib.axes._subplots.AxesSubplot at 0x7fcd133c3a90>

Garbled X-Axis output from matplotlib

What is the way to (smartly) skip tick labels so that the x-axis remains readable?

According to the pandas documentation, it should already do this automatically by default:

Pandas includes automatic tick resolution adjustment for regular frequency time-series data.

But it clearly doesn't in this case. What am I doing wrong? Is there a setting or conversion I'm missing? Is it a type issue (Series vs. DataFrame)?
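
For comparison, plotting the same Series as a line (a quick sketch reusing the timegroups Series from above) does give readable, auto-spaced date ticks; the problem only seems to show up with kind="bar", which puts a tick label under every single bar:

import matplotlib.pyplot as plt

timegroups.plot()            # line plot: readable, auto-adjusted date ticks
plt.figure()
timegroups.plot(kind="bar")  # bar plot: one label per hour, hence the mess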

1 Answer

Given the following toy dataframe mimicking yours (one value per hour over one year), but with duplicated values (each hour in the date_time column appears twice):

import random

import pandas as pd

# Toy data: one row per hour for a full year, then each row duplicated
# so that values in the date_time column are not unique.
df = pd.DataFrame(
    {"date_time": pd.date_range(start="1/1/2021", end="12/31/2021", freq="H")}
)
df["count"] = [int(random.random() * 100) for _ in range(df.shape[0])]
df = pd.concat([df, df]).reset_index(drop=True)  # Add duplicates
df.info()
# Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17474 entries, 0 to 17473
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   date_time  17474 non-null  datetime64[ns]
 1   count      17474 non-null  int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 273.2 KB

The x labels are unreadable when plotting the dataframe as is (note that this would also be the case without duplicated values in the date_time column):

df.plot(x="date_time", kind="bar", stacked=True)

Output:

Bar plot with overlapping, unreadable x-axis labels

One way to fix that is to set the xticks manually, for instance to the index positions of the month-end rows (obtained with pandas asfreq), and then set the labels accordingly by chaining a set_xticklabels call with the corresponding datetime values. At each step, duplicated values are taken into account, so that even though all rows are plotted, the ticks and labels stay unique:

# Positional indices of the month-end dates in df
df = df.sort_values(by="date_time").reset_index(drop=True)
end_of_months = (
    df[
        df["date_time"].isin(
            df.drop_duplicates(subset=["date_time"])
            .set_index(["date_time"])
            .asfreq("M")  # keep only the month-end timestamps
            .reset_index()["date_time"]
        )
    ]
    .drop_duplicates(subset=["date_time"])  # one row index per month end
    .index.tolist()
)

df.plot(x="date_time", kind="bar", stacked=True, xticks=end_of_months).set_xticklabels(
    df.loc[end_of_months, "date_time"].dt.strftime("%Y-%m-%d").unique(),
    rotation=45,
    ha="right",
)

Output:

Bar plot with one readable x-axis label per month end, rotated 45 degrees
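
If, instead of month ends, you simply want every nth bar labelled (as the comments below ask for), the same idea works with a plain positional range. This is only a sketch on top of the toy df above, with an arbitrary step: because of the duplicates, each hour occupies two adjacent bars, so a step of 2 * 24 * 7 gives roughly one label per week:

step = 2 * 24 * 7  # arbitrary: one label per week of duplicated hourly rows
ticks = list(range(0, df.shape[0], step))
df.plot(x="date_time", kind="bar", stacked=True, xticks=ticks).set_xticklabels(
    df.loc[ticks, "date_time"].dt.strftime("%Y-%m-%d %H:%M"),
    rotation=45,
    ha="right",
)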

4 Comments

Finally had some time to look at this. Your solution doesn't work for me. I did not make it clear, but the "date_time" column is not unique, and it is not the index either. This complicates matters a lot, since we cannot leverage the index for lookups based on another interval, such as end-of-month (or, in my case, two-hourly).
Indeed, with duplicates in the "date_time" column, it does not work. Do you care which ticks and labels are shown on the plot? Or is it OK if the column is binned arbitrarily?
I would prefer to have every nth hour shown, for some consistency on the x-axis.
Hi, I've updated my answer to deal with an example where there are duplicated values in the "date_time" column. Cheers.
