I have a df like this:

Allotment    C1     C2    
Annex        1.0    2.0   
Arnstson     1.6    1.4   
Berg         2.1    4.5   
Bjugstad     6.7    6.9   

and I am making a scatter plot of C1 against C2, labeling each point with its associated Allotment. My code is:

import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

a = df.C1
b = df.C2
n = df.Allotment

with PdfPages(r'C:\plot.pdf') as pdf:
    plt.title('PC1 vs. PC2 Scatterplot')
    plt.xlabel('PC1')
    plt.ylabel('PC2')
    plt.scatter(a, b, facecolors='none', s=20, edgecolors='b')
    # use this portion to annotate each point
    for i, txt in enumerate(n):
        plt.annotate(txt, (a[i], b[i]), fontsize=2.5)
    fig = plt.gcf()
    pdf.savefig(fig)
    plt.show()

But when I add this line to remove some Allotments:

df = df[~df['Allotment'].isin(['Berg', 'Annex'])]

and run the same code I get the following error:

Traceback (most recent call last):

  File "<ipython-input-58-c5ce20451164>", line 1, in <module>
    runfile('H:/python codes/PC_scatterplots.py', wdir='H:/python codes')

  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
    execfile(filename, namespace)

  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "H:/python codes/PC_scatterplots.py", line 64, in <module>
    plt.annotate(txt, (a[i],b[i]), fontsize=2.5)

  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.py", line 521, in __getitem__
    result = self.index.get_value(self, key)

  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\index.py", line 1595, in get_value
    return self._engine.get_value(s, k)

  File "pandas\index.pyx", line 100, in pandas.index.IndexEngine.get_value (pandas\index.c:3113)

  File "pandas\index.pyx", line 108, in pandas.index.IndexEngine.get_value (pandas\index.c:2844)

  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)

  File "pandas\hashtable.pyx", line 375, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:7224)

  File "pandas\hashtable.pyx", line 381, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:7162)

KeyError: 9L

1 Answer

When you enumerate, i is a fresh counter that starts at 0, while a and b keep the index labels from df. On the unfiltered df the two happen to coincide:

In [83]: df

    Allotment   C1  C2
0   Annex       1.0     2.0
1   Arnston     1.6     1.4
2   Berg        2.1     4.5
3   Bjugstad    6.7     6.9

In [84]: a=df.C1
         b=df.C2
         n=df.Allotment

In [85]: for i, txt in enumerate(n):
            print i,txt
0 Annex
1 Arnston
2 Berg
3 Bjugstad

But when you reassign df to a filtered subset, the remaining rows keep their original index labels, so the labels no longer run 0, 1, 2, ...

df = df[~df['Allotment'].isin(['Berg', 'Annex'])]

a=df.C1
b=df.C2
n=df.Allotment

In [86]: a
Out[86]:
    1    1.6
    3    6.7

In [87]: for i, txt in enumerate(n):
            print i,txt
            print a[i] #doesn't exist

This reproduces an error similar to yours:

0 Arnston

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-23-b79737b511ee> in <module>()
      1 for i, txt in enumerate(n):
      2     print i,txt
----> 3     print a[i]

/home/kevin/anaconda2/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    549     def __getitem__(self, key):
    550         try:
--> 551             result = self.index.get_value(self, key)
    552 
    553             if not np.isscalar(result):

/home/kevin/anaconda2/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
   1721 
   1722         try:
-> 1723             return self._engine.get_value(s, k)
   1724         except KeyError as e1:
   1725             if len(self) > 0 and self.inferred_type in ['integer','boolean']:

pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3204)()

pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:2903)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)()

pandas/hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6525)()

pandas/hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6463)()

KeyError: 0
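
The difference between looking a value up by label and by position can be seen in a minimal sketch. It rebuilds the filtered Series a by hand, using the values and index labels shown in Out[86] above; the variable raised is only there for illustration.

```python
import pandas as pd

# Hand-built copy of the filtered Series: labels 1 and 3 survive the filter
a = pd.Series([1.6, 6.7], index=[1, 3], name="C1")

by_label = a.loc[1]      # label-based lookup: the row whose index label is 1
by_position = a.iloc[0]  # position-based lookup: the first row, whatever its label
print(by_label, by_position)

# Plain a[0] on an integer index is a label lookup, and no row is labelled 0
raised = False
try:
    a[0]
except KeyError:
    raised = True
print("a[0] raised KeyError:", raised)
```

Both lookups return the same row here, but only the positional one lines up with the counter that enumerate produces.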

So you could use .iloc, which looks values up by position rather than by index label (thanks, Jezzamon). With it, the plot comes out correctly.

for i, txt in enumerate(n):
    print a.iloc[i]
1.6
6.7
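
Applied to the plotting loop, one way to sidestep the index entirely is to iterate over the columns together with zip. This is a sketch under assumptions, not the asker's exact script: the sample frame is typed in by hand from the question, the Agg backend is chosen so it runs without a display, the font size is raised from the question's 2.5 so labels are legible, and the PDF export is left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Sample data typed in from the question
df = pd.DataFrame({
    "Allotment": ["Annex", "Arnston", "Berg", "Bjugstad"],
    "C1": [1.0, 1.6, 2.1, 6.7],
    "C2": [2.0, 1.4, 4.5, 6.9],
})
df = df[~df["Allotment"].isin(["Berg", "Annex"])]  # index labels are now 1 and 3

fig, ax = plt.subplots()
ax.set_title("PC1 vs. PC2 Scatterplot")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.scatter(df["C1"], df["C2"], facecolors="none", s=20, edgecolors="b")

# zip walks the three columns in row order, so no index lookup is needed
for x, y, label in zip(df["C1"], df["C2"], df["Allotment"]):
    ax.annotate(label, (x, y), fontsize=8)

labels = [t.get_text() for t in ax.texts]
print(labels)
```

Another option is to call df.reset_index(drop=True) right after filtering, which renumbers the rows from 0 so that a[i] and b[i] work again inside the original enumerate loop.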

2 Comments

So the solution is to use iloc instead?
See the addition, Jezzamon. What do you think?
