How do I create a Matplotlib stackplot with Sparse Data?

Question

I have a time-sampled data set with essentially a two-column index (timestamp, ID). However, some timestamps do not have a sample point for a given index.

How can I make a stackplot with Matplotlib for this kind of data?

import pandas as pd
import numpy as np
import io
import matplotlib.pyplot as plt

df = pd.read_csv(io.StringIO('''
A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2

2,4,.15
3,1,.7

3,3,.1
3,4,.2
'''.strip()))

b = np.unique(df.B)
plt.stackplot(np.unique(df.A),
              [df[df.B==_b].C for _b in b],
              labels=['B:{0}'.format(_b) for _b in b],
)
plt.xlabel('A')
plt.ylabel('C')
plt.legend(loc='upper left')
plt.show()

When I try this program, Python replies:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

When I manually fill in the missing data points (see blank lines in string literal), the plot works fine.

Is there a straightforward way to "insert" zero records for missing sample data (like this question, but I have two columns functioning as indices, and I don't know how to adapt the solution to my problem) or have Matplotlib plot with holes?

unutbu · Accepted Answer · 2015-12-04 20:02:28Z

You could use df.pivot to massage the DataFrame into a form amenable to calling DataFrame.plot(kind='area'). For example, if

In [46]: df
Out[46]: 
   A  B     C
0  1  1  0.00
1  1  2  0.00
2  1  3  0.00
3  1  4  0.00
4  2  1  0.50
5  2  2  0.20
6  2  4  0.15
7  3  1  0.70
8  3  3  0.10
9  3  4  0.20

then

In [47]: df.pivot(columns='B', index='A')
Out[47]: 
     C                
B    1    2    3     4
A                     
1  0.0  0.0  0.0  0.00
2  0.5  0.2  NaN  0.15
3  0.7  NaN  0.1  0.20

Notice that df.pivot fills the missing (A, B) combinations with NaN for you. Now, with the pivoted DataFrame assigned to result,

result.plot(kind='area')

produces the desired plot.
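If explicit zero records are preferred over NaN, which is what the question asks about, one possible approach is to pass fill_value=0 when unstacking. The following is a minimal sketch, not part of the original answer; the names csv_text, wide and filled are illustrative, and it assumes that a missing sample should be treated as 0.

```python
import io

import pandas as pd

# Same sample data as in the question, without the blank placeholder lines.
csv_text = """A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2
2,4,.15
3,1,.7
3,3,.1
3,4,.2
"""
df = pd.read_csv(io.StringIO(csv_text))

# One row per A, one column per B. (A, B) pairs that are absent from the
# data are filled with 0 instead of NaN (assumption: missing means zero).
wide = df.set_index(['A', 'B'])['C'].unstack('B', fill_value=0)

# Optional: convert back to long form, so that every (A, B) pair
# has an explicit record.
filled = wide.stack().rename('C').reset_index()
```

Here wide has the same layout as the pivoted table above, and filled is the original three-column layout with the two missing rows inserted.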

import pandas as pd
import matplotlib.pyplot as plt

try:
    # for Python2
    from cStringIO import StringIO 
except ImportError:
    # for Python3
    from io import StringIO


df = pd.read_csv(StringIO('''
A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2

2,4,.15
3,1,.7

3,3,.1
3,4,.2
'''.strip()))


result = df.pivot(columns='B', index='A')
result.columns = result.columns.droplevel(0)
# Alternatively, the above two lines are equivalent to
# result = df.set_index(['A','B'])['C'].unstack('B')

ax = result.plot(kind='area')
lines, labels = ax.get_legend_handles_labels()
ax.set_ylabel('C')
ax.legend(lines, ['B:{0}'.format(b) for b in result.columns], loc='best')

plt.show()

yields a stacked area chart with one band per value of B.
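The question asked specifically about plt.stackplot. As a hedged alternative to DataFrame.plot, the pivoted table can probably be handed to stackplot directly once the NaN entries are replaced. This sketch is not from the original answer: it assumes that missing samples should count as 0, and it selects the Agg backend only so that it runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2
2,4,.15
3,1,.7
3,3,.1
3,4,.2
"""
df = pd.read_csv(io.StringIO(csv_text))

# Pivot to a rectangular table and treat missing samples as 0 (assumption).
wide = df.pivot(index='A', columns='B', values='C').fillna(0)

fig, ax = plt.subplots()
# stackplot expects one row per stacked series, hence the transpose.
polys = ax.stackplot(
    wide.index.to_numpy(),
    wide.T.to_numpy(),
    labels=['B:{0}'.format(b) for b in wide.columns],
)
ax.set_xlabel('A')
ax.set_ylabel('C')
ax.legend(loc='upper left')
# Call plt.show() or fig.savefig(...) here to view the result.
```

Because every series now has one value per timestamp, the arrays passed to stackplot all have the same length, which is the condition the original loop over df[df.B==_b].C did not satisfy.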
