Scatterplot of pandas DataFrame ends in KeyError: 0

Question

After I updated pandas (0.23.4) and matplotlib (3.0.1), I get a strange error when trying to do something like the following:

import pandas as pd
import matplotlib.pyplot as plt


clrdict = {1: "#a6cee3", 2: "#1f78b4", 3: "#b2df8a", 4: "#33a02c"}

df_full = pd.DataFrame({'x':[20,30,30,40],
                        'y':[25,20,30,25],
                        's':[100,200,300,400],
                        'l':[1,2,3,4]})

df_full['c'] = df_full['l'].replace(clrdict)

df_part = df_full[(df_full.x == 30)]

fig = plt.figure()
plt.scatter(x=df_full['x'],
            y=df_full['y'],
            s=df_full['s'],
            c=df_full['c'])
plt.show()

fig = plt.figure()
plt.scatter(x=df_part['x'],
            y=df_part['y'],
            s=df_part['s'],
            c=df_part['c'])
plt.show()

The scatterplot of the original DataFrame (df_full) is shown without problems, but plotting the partial DataFrame (df_part) raises the following error:

Traceback (most recent call last):
  File "G:\data\project\test.py", line 27, in <module>
    c=df_part['c'])
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\pyplot.py", line 2864, in scatter
    is not None else {}), **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\__init__.py", line 1805, in inner
    return func(ax, *args, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\axes\_axes.py", line 4195, in scatter
    isinstance(c[0], str))):
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\series.py", line 767, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3118, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 114, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

This is due to the color option c=df_part['c']. When I leave it out, the problem doesn't occur. This didn't happen before the updates, so you may not be able to reproduce it with older versions of matplotlib or pandas (I have no idea which one causes it).

In my project, the df_part = df_full[(df_full.x == i)] line is used within the update function of a matplotlib.animation.FuncAnimation. The result is an animation over the values of x (which are timestamps in my project), so I need a way to subset the DataFrame.

Thanks, ALollz. This solves the problem. Can you explain why I have to call the values explicitly?
– rhombuzz, Commented Nov 6, 2018 at 17:52
The issue is that a pandas.Series has an index, so when plt.scatter tries to grab Series[0] it looks for the row whose index label is 0, not the first row of the Series. In your second case that row doesn't exist, because your subset doesn't contain the first row. Using .values converts your Series to an ndarray, in which case ndarray[0] gives the first value in the array, regardless of the index the Series had.
– ALollz, Commented Nov 6, 2018 at 18:00
I understand. Thanks for taking the time to write this useful explanation.
– rhombuzz, Commented Nov 6, 2018 at 18:05
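
The label-versus-position behaviour described in the comment above can be checked without matplotlib. The following is a minimal sketch, assuming a Series with the default integer index; the variable names (colors, subset, and so on) are made up for illustration and are not part of the question's code.

```python
import pandas as pd

# Hypothetical data mirroring the question: four colors, four x values.
colors = pd.Series(["#a6cee3", "#1f78b4", "#b2df8a", "#33a02c"])
x = pd.Series([20, 30, 30, 40])

# Boolean filtering keeps the original index labels (1 and 2 here).
subset = colors[x == 30]
print(list(subset.index))

# subset[0] is a label lookup on an integer index; label 0 was filtered out.
try:
    subset[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional access works either on the Series or on the underlying array.
first_by_position = subset.iloc[0]
first_from_array = subset.values[0]
print(label_lookup_failed, first_by_position, first_from_array)
```

Both positional forms return the first remaining row, which is why handing matplotlib the plain array avoids the failing lookup.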

ImportanceOfBeingErnest · Accepted Answer · 2018-11-06 23:29:34Z

This is a bug which got fixed by https://github.com/matplotlib/matplotlib/pull/12673.

It should hopefully be available in the next bugfix release, 3.0.2, which should be out within the next few days.

In the meantime, you may use the numpy array from the pandas series, series.values.
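
Applied to the example from the question, the workaround might look like the sketch below. It assumes the same df_full and df_part as in the question; the Agg backend line is only there so the snippet runs without a display, and .map is used in place of .replace to build the color column, which should give the same result for this dictionary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

clrdict = {1: "#a6cee3", 2: "#1f78b4", 3: "#b2df8a", 4: "#33a02c"}

df_full = pd.DataFrame({'x': [20, 30, 30, 40],
                        'y': [25, 20, 30, 25],
                        's': [100, 200, 300, 400],
                        'l': [1, 2, 3, 4]})
df_full['c'] = df_full['l'].map(clrdict)

# The subset keeps index labels 1 and 2, so there is no label 0.
df_part = df_full[df_full.x == 30]

fig, ax = plt.subplots()
# Passing .values hands matplotlib plain NumPy arrays without a pandas index.
points = ax.scatter(x=df_part['x'].values,
                    y=df_part['y'].values,
                    s=df_part['s'].values,
                    c=df_part['c'].values)
fig.savefig("scatter_part.png")  # hypothetical output file name
```

Strictly speaking only the c argument triggered the error in the question, but converting all four columns keeps the call consistent.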

edited Nov 6, 2018 at 23:29

answered Nov 6, 2018 at 18:01

ImportanceOfBeingErnest