After updating pandas (0.23.4) and matplotlib (3.0.1), I get a strange error when I try to do something like the following:
import pandas as pd
import matplotlib.pyplot as plt
clrdict = {1: "#a6cee3", 2: "#1f78b4", 3: "#b2df8a", 4: "#33a02c"}
df_full = pd.DataFrame({'x': [20, 30, 30, 40],
                        'y': [25, 20, 30, 25],
                        's': [100, 200, 300, 400],
                        'l': [1, 2, 3, 4]})
df_full['c'] = df_full['l'].replace(clrdict)
df_part = df_full[(df_full.x == 30)]
fig = plt.figure()
plt.scatter(x=df_full['x'],
            y=df_full['y'],
            s=df_full['s'],
            c=df_full['c'])
plt.show()
fig = plt.figure()
plt.scatter(x=df_part['x'],
            y=df_part['y'],
            s=df_part['s'],
            c=df_part['c'])
plt.show()
The scatter plot of the original DataFrame (df_full) is shown without problems, but the plot of the partial DataFrame (df_part) raises the following error:
Traceback (most recent call last):
  File "G:\data\project\test.py", line 27, in <module>
    c=df_part['c'])
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\pyplot.py", line 2864, in scatter
    is not None else {}), **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\__init__.py", line 1805, in inner
    return func(ax, *args, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\axes\_axes.py", line 4195, in scatter
    isinstance(c[0], str))):
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\series.py", line 767, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3118, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 114, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
This is caused by the color argument c=df_part['c']. If you leave it out, the problem doesn't occur. This didn't happen before the updates, so you may not be able to reproduce it with older versions of matplotlib or pandas (I have no idea which one causes it).
In my project, the line df_part = df_full[(df_full.x == i)] is used within the update function of a matplotlib.animation.FuncAnimation. The result is an animation over the values of x (which are timestamps in my project), so I need a way to subset the DataFrame.
Use c=df_part['c'].values.

Why .values explicitly? A pandas.Series has an index, so when plt.scatter tries to grab Series[0] (the isinstance(c[0], str) check in the traceback), it looks for the row whose index label is 0, not the first row of the Series. In your second case that row doesn't exist, because your subset doesn't contain the first row. Using .values converts your Series to an ndarray, in which case ndarray[0] gives the first value in the array, regardless of the index the Series had.
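
To see the difference without plotting anything, here is a minimal sketch that rebuilds the question's DataFrame and compares label-based and positional access. It assumes an integer default index as in the question; the names label_lookup_failed, colors and df_reset are only illustrative, .map stands in for the original .replace call, and the reset_index variant is an alternative not mentioned above.

```python
import pandas as pd

clrdict = {1: "#a6cee3", 2: "#1f78b4", 3: "#b2df8a", 4: "#33a02c"}
df_full = pd.DataFrame({'x': [20, 30, 30, 40],
                        'y': [25, 20, 30, 25],
                        's': [100, 200, 300, 400],
                        'l': [1, 2, 3, 4]})
# .map is used here instead of .replace (assumption: same result for this dict)
df_full['c'] = df_full['l'].map(clrdict)

# The subset keeps the original index labels 1 and 2; label 0 is gone
df_part = df_full[df_full.x == 30]
index_labels = list(df_part.index)

# Series[0] on an integer index is a label lookup, so it fails on the subset
try:
    df_part['c'][0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .values returns a plain ndarray, where [0] is simply the first element
colors = df_part['c'].values
first_color = colors[0]

# Alternative (assumption, not from the answer): renumber the subset from 0
df_reset = df_part.reset_index(drop=True)
first_after_reset = df_reset['c'][0]
```

With this in place, plt.scatter(x=df_part['x'], y=df_part['y'], s=df_part['s'], c=df_part['c'].values) should avoid the label lookup, and the same change can be made inside the FuncAnimation update function.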