I run a query that returns a list of tuples, each consisting of a date string and a count:

date_cnts = [(u'2014-06-27', 1),
 (u'2014-06-29', 3),
 (u'2014-06-30', 1),
 (u'2014-07-01', 1),
 (u'2014-07-02', 1),
 (u'2014-07-09', 1),
 (u'2014-07-10', 3),
 (u'2014-07-11', 1),
 (u'2014-07-12', 2),
 (u'2014-07-14', 1),
 (u'2014-07-15', 2),
 (u'2014-07-17', 3),
 (u'2014-07-18', 1),
 (u'2014-07-20', 1),
 (u'2014-07-21', 1),
 (u'2014-07-23', 2),
 (u'2014-07-26', 2),
 (u'2014-07-27', 2),
 (u'2014-07-28', 7),
 (u'2014-07-29', 3),
 (u'2014-07-31', 2),
 (u'2014-08-01', 1),
 (u'2014-08-05', 4),
 (u'2014-08-07', 2),
 (u'2014-08-08', 1),
 (u'2014-08-13', 1),
 (u'2014-08-14', 3),
 (u'2014-08-15', 1),
 (u'2014-08-16', 6),
 (u'2014-08-17', 1),
 (u'2014-08-18', 1),
 (u'2014-08-20', 1),
 (u'2014-08-24', 1),
 (u'2014-08-25', 3),
 (u'2014-08-29', 1),
 (u'2014-08-30', 1),
 (u'2014-09-03', 3),
 (u'2014-09-13', 1),
 (u'2014-09-14', 1),
 (u'2014-09-24', 3),
 (u'2014-10-20', 1),
 (u'2014-10-24', 1),
 (u'2014-11-05', 3),
 (u'2014-11-09', 1),
 (u'2014-11-12', 1),
 (u'2014-11-13', 1),
 (u'2014-11-14', 1),
 (u'2014-11-18', 1),
 (u'2014-11-19', 4),
 (u'2014-11-22', 1),
 (u'2014-11-26', 3),
 (u'2014-11-28', 3),
 (u'2014-12-01', 2),
 (u'2014-12-02', 2),
 (u'2014-12-04', 2),
 (u'2014-12-05', 1),
 (u'2014-12-06', 5),
 (u'2014-12-11', 1),
 (u'2014-12-15', 10)]

Notice that there are date gaps in this data set, indicating that the missing dates have a value of 0.

My working (non-Pandas) version of code looks like this:

from datetime import datetime

from matplotlib import pyplot as plt

# parse the date strings so the bars are placed on a real time axis
x_val = [datetime.strptime(x[0],'%Y-%m-%d') for x in date_cnts]
y_val = [x[1] for x in date_cnts]
plt.bar(x_val, y_val)
plt.grid(True)
plt.show()

This outputs this image:

Matplotlib output

Now, if I convert my query results to a pandas DataFrame:

          Date  Count
0   2014-06-27      1
1   2014-06-29      3
2   2014-06-30      1
3   2014-07-01      1
4   2014-07-02      1
5   2014-07-09      1
6   2014-07-10      3
7   2014-07-11      1
8   2014-07-12      2
9   2014-07-14      1
10  2014-07-15      2
11  2014-07-17      3
12  2014-07-18      1
13  2014-07-20      1
14  2014-07-21      1
15  2014-07-23      2
16  2014-07-26      2
17  2014-07-27      2
18  2014-07-28      7
19  2014-07-29      3
20  2014-07-31      2
21  2014-08-01      1
22  2014-08-05      4
23  2014-08-07      2
24  2014-08-08      1
25  2014-08-13      1
26  2014-08-14      3
27  2014-08-15      1
28  2014-08-16      6
29  2014-08-17      1
30  2014-08-18      1
31  2014-08-20      1
32  2014-08-24      1
33  2014-08-25      3
34  2014-08-29      1
35  2014-08-30      1
36  2014-09-03      3
37  2014-09-13      1
38  2014-09-14      1
39  2014-09-24      3
40  2014-10-20      1
41  2014-10-24      1
42  2014-11-05      3
43  2014-11-09      1
44  2014-11-12      1
45  2014-11-13      1
46  2014-11-14      1
47  2014-11-18      1
48  2014-11-19      4
49  2014-11-22      1
50  2014-11-26      3
51  2014-11-28      3
52  2014-12-01      2
53  2014-12-02      2
54  2014-12-04      2
55  2014-12-05      1
56  2014-12-06      5
57  2014-12-11      1
58  2014-12-15     10

And use the simple pandas wrapper to plot this:

plt.figure()
df.plot(kind='bar', grid=True, legend=False, x='Date', y=u'Count')
plt.show()

I get this result. Notice that my missing days do not appear in this graph.

Pandas output

How do I add the gaps (with 0 values) back for the dates that do not exist in the DataFrame?

The reason I want to use pandas is to take advantage of some of its other features (most importantly, a rolling average).

1 Answer

I wrote a working version; it is probably not the best, but it will do the job. It reindexes your original data into a DataFrame with one row for every day.

import pandas as pd
import matplotlib.pyplot as plt

#%% make data
df = pd.DataFrame(date_cnts, columns=['Date', 'Count'])

#%% make dataframe with one row per day
df.index = pd.to_datetime(df['Date'])
startdate = df.index.min()
enddate = df.index.max()
# keep only 'Count' and fill the days missing from the query with 0
df_new = df[['Count']].reindex(pd.date_range(startdate, enddate, freq='D'), fill_value=0)

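#%% plot the results
ax = df_new['Count'].plot(kind='bar')

# label only every 10th day; the bars sit at integer positions 0..n-1,
# so the tick labels have to be set together with the tick positions
step = 10
ax.set_xticks(range(0, len(df_new), step))
ax.set_xticklabels(df_new.index[::step].strftime('%Y-%m-%d'))
plt.show()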
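
If you would rather keep a true time axis (as in the plain matplotlib version from the question), one possible alternative is to hand the parsed dates from the DataFrame straight to matplotlib's bar function instead of using the pandas bar wrapper. This is only a sketch: the three-row `sample` list is made up for illustration and stands in for `date_cnts`.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample in the same shape as date_cnts
sample = [('2014-06-27', 1), ('2014-06-29', 3), ('2014-07-02', 1)]

df = pd.DataFrame(sample, columns=['Date', 'Count'])
df['Date'] = pd.to_datetime(df['Date'])

fig, ax = plt.subplots()
# Bars are positioned by date, so days without data simply stay empty
bars = ax.bar(df['Date'].to_numpy(), df['Count'].to_numpy())
ax.grid(True)
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Call `plt.show()` afterwards when working interactively. Here the gaps come from the axis itself, so the DataFrame does not need a row for every day just for plotting.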

For further formatting of the xticks I recommend this question: Pandas timeseries plot setting x-axis major and minor ticks and labels
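
Since the question mentions a rolling average as the reason for moving to pandas, here is a minimal sketch of how the daily reindexing feeds into one. It assumes missing days should count as 0; the short `sample` list and the 2-day window are made up for illustration.

```python
import pandas as pd

# Hypothetical sample in the same shape as date_cnts
sample = [('2014-06-27', 1), ('2014-06-29', 3), ('2014-06-30', 1)]

df = pd.DataFrame(sample, columns=['Date', 'Count'])
counts = df.set_index(pd.to_datetime(df['Date']))['Count']

# One entry per calendar day; days absent from the query become 0
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq='D')
daily = counts.reindex(full_range, fill_value=0)

# Trailing mean over the current and previous day (window size is illustrative)
rolling = daily.rolling(window=2).mean()
```

Filling with 0 before calling `rolling` matters: without the reindexing step the window would span consecutive rows rather than consecutive days, and if the gaps were left as NaN, every window touching a gap would come out as NaN.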
