
I am trying to plot some data using matplotlib and pandas. However, when I use the DateFormatter, dates are rendered incorrectly depending on what I filter out of the DataFrame.

The dates in the two examples below render with matplotlib as 'August 20 00 2013', as expected:

from matplotlib.pyplot import gca, draw
from matplotlib.dates import DateFormatter

df['metric2'].plot()
ax = gca()
ax.xaxis.set_major_formatter(DateFormatter('%B %d %H %Y'))
draw()

df[df['metric1']>1000]['metric2'].plot()
ax = gca()
ax.xaxis.set_major_formatter(DateFormatter('%B %d %H %Y'))
draw()

However, with the code below, the dates are rendered as 'February 01 00 1048':

df[df['browser']=='Chrome/29']['metric2'].plot()
ax = gca()
ax.xaxis.set_major_formatter(DateFormatter('%B %d %H %Y'))
draw()
  • Without seeing some of these data it's going to be hard to diagnose the problem. Commented Sep 17, 2013 at 23:22
  • Maybe related: stackoverflow.com/questions/13988111/… on the chance that pandas is still fouling up the date handling code. Commented Sep 18, 2013 at 1:04
  • The dates look like this in the original file: '2013-08-18 00', followed by a browser (in the format above) and 3 metrics. Here is how I am pulling the data from the file into pandas: def dateParserHour(time_string): return datetime.datetime.strptime(time_string, '%Y-%m-%d %H') and pd.read_table('file.txt', index_col=0, parse_dates=True, date_parser=dateParserHour) Commented Sep 18, 2013 at 2:25
  • Can you just show df.head() or some other subset of your data instead of trying to describe it? Thanks. Commented Sep 18, 2013 at 3:07
  • I have found a workaround. For some reason, when I plot the third example above, matplotlib won't play nice with my TimeSeries. If I rebuild the index with the code below and then plot (with the same DateFormatter() call), it works fine: df2 = df[df['browser']=='Chrome/29']['metric2']; df2.index = df2.index.astype(datetime.datetime) Commented Sep 18, 2013 at 22:24
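
The workaround in the last comment amounts to handing matplotlib plain Python datetime objects instead of letting pandas pick the x-axis units. A minimal sketch of that idea follows, assuming a Series with a DatetimeIndex. The helper name plot_with_plain_datetimes and the sample data are hypothetical, and it uses DatetimeIndex.to_pydatetime() with ax.plot rather than the astype call from the comment:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as md
import matplotlib.pyplot as plt
import pandas as pd


def plot_with_plain_datetimes(series, fmt='%B %d %H %Y'):
    """Plot a Series against plain datetime objects (hypothetical helper)."""
    fig, ax = plt.subplots()
    # Convert the pandas index to an array of datetime.datetime objects so
    # that matplotlib, not pandas, decides how dates map onto the x axis.
    x = series.index.to_pydatetime()
    ax.plot(x, series.values)
    ax.xaxis.set_major_formatter(md.DateFormatter(fmt))
    fig.autofmt_xdate()  # rotate the long labels so they do not overlap
    return ax


# Made-up hourly sample data, standing in for the filtered 'metric2' column.
start = dt.datetime(2013, 8, 18)
index = pd.DatetimeIndex([start + dt.timedelta(hours=i) for i in range(48)])
sample = pd.Series(range(48), index=index)

ax = plot_with_plain_datetimes(sample)
```

With the real data, the call would be something like plot_with_plain_datetimes(df[df['browser']=='Chrome/29']['metric2']).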

1 Answer

We need a concrete set of data and a program to refer to. With the data and program below, I see no problems:

data.txt:

2013-08-18 00   IE  1000    500 3000
2013-08-19 00   FF  2000    250 6000
2013-08-20 00   Opera   3000    450 9000
2001-03-21 00   Chrome/29   3000    450 9000
2013-08-21 00   Chrome/29   3000    450 9000
2014-01-22 00   Chrome/29   3000    750 9000


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as md
import datetime as dt


df = pd.read_table(
    'data.txt', 
    index_col=0, 
    parse_dates=True,
    date_parser=lambda s: dt.datetime.strptime(s, '%Y-%m-%d %H'),
    header=None,
    names=['browser', 'metric1', 'metric2', 'metric3']
)

print(df)

df[df['browser']=='Chrome/29']['metric2'].plot()
ax = plt.gca()
ax.xaxis.set_major_formatter(md.DateFormatter('%B %d %H %Y'))
plt.draw()
plt.show()


--output:--
              browser  metric1  metric2  metric3
2013-08-18         IE     1000      500     3000
2013-08-19         FF     2000      250     6000
2013-08-20      Opera     3000      450     9000
2001-03-21  Chrome/29     3000      450     9000
2013-08-21  Chrome/29     3000      450     9000
2014-01-22  Chrome/29     3000      750     9000


And with the axes adjusted so you can see the points better (setting the date range of the x axis and the range of the y axis):

...
df[df['browser']=='Chrome/29']['metric2'].plot(style='r--')
ax = plt.gca()
ax.xaxis.set_major_formatter(md.DateFormatter('%B %d %H %Y'))

ax.set_xlim(dt.datetime(2000, 1, 1), dt.datetime(2017, 1, 1))
ax.set_ylim(400, 1000)
...
...


As long as you refuse to post a minimal example along with the data that produces the output you don't want...
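
A note on where 'February 01 00 1048' might come from. This is an assumption, not something confirmed in this thread: if the x values reaching the formatter were hours since 1970-01-01 (as they could be if pandas plotted an hourly series in its own period units), and the matplotlib of that era read them as days since 0001-01-01, the asker's first date, '2013-08-18 00', would land exactly on that label. The sketch below only checks the arithmetic with the standard library. The function name is hypothetical and no pandas or matplotlib behavior is exercised:

```python
import datetime as dt


def hours_misread_as_days(ts):
    """Treat 'hours since 1970-01-01' as 'days since 0001-01-01'.

    Hypothetical helper: it mimics a unit mix-up, not any library call.
    """
    hours = int((ts - dt.datetime(1970, 1, 1)).total_seconds() // 3600)
    # fromordinal() counts days, with day 1 being 0001-01-01.
    return dt.datetime.fromordinal(hours)


misread = hours_misread_as_days(dt.datetime(2013, 8, 18, 0))
print(misread.strftime('%B %d %H %Y'))  # → February 01 00 1048
```

If this is what happened, it would also fit the workaround from the comments, since rebuilding the index as plain datetime objects would keep the values in matplotlib's own date units.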


4 Comments

I fail to see why this answer got a downvote.
I originally downvoted because all this answer does is show the expected behavior (not really that helpful since the OP wasn't seeing this behavior). However, a downvote was probably a bit overkill. My apologies.
Sorry about the delayed response. The sample I was preparing is just like the one above. The only difference is that my index has the name 'hour' (which is the label of the column in the original file). I created a new file containing just the first 5 rows of the original to rerun the analysis. When I do this, my dates in matplotlib appear as expected. Could a certain value in the TimeSeries from the original file be causing the issue? Just by scanning the unique values in the original file, I don't see any problems.
"The only difference is that my index has the name 'hour'." Can you explain what that means?
