KeyError in for loop of dataframe in pandas

Question

I am putting my data into a Bokeh heat map layout, but I am getting KeyError: '1'. It occurs at the line num_calls = pivot_table[m][y]. Does anybody know why this would be?

The pivot table I am using is below:

pivot_table.head()
Out[101]: 
Month          1    2    3    4    5    6    7    8    9
CompanyName
Company 1    182  270  278  314  180  152  110  127  129
Company 2    163  147  192  142  186  231  214  130  112
Company 3    126   88   99  139   97   97   96   37   79
Company 4     84   89   71   95   80   89   83   88  104
Company 5     91   96   94   66   81   77   87   83   68

Month         10   11   12
CompanyName
Company 1    117  127   81
Company 2    117   93  101
Company 3    116  111   95
Company 4     93   78   64
Company 5     83   95   65

Below is the section of code leading up to the error:

pivot_table = pivot_table.reset_index()
pivot_table['CompanyName'] = [str(x) for x in pivot_table['CompanyName']]
Companies = list(pivot_table['CompanyName'])
months = ["1","2","3","4","5","6","7","8","9","10","11","12"]
pivot_table = pivot_table.set_index('CompanyName')

# this is the colormap from the original plot
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
    "#ddb7b1", "#cc7878", "#933b41", "#550b1d" ]

# Set up the data for plotting. We will need to have values for every
# pair of year/month names. Map the rate to a color.
month = []
company = []
color = []
rate = []
for y in Companies:
    for m in months:
        month.append(m)
        company.append(y)
        num_calls = pivot_table[m][y]
        rate.append(num_calls)
        color.append(colors[min(int(num_calls)-2, 8)])

and upon request:

pivot_table.info()
<class 'pandas.core.frame.DataFrame'>
Index: 46 entries, Company1 to LastCompany
Data columns (total 12 columns):
1.0     46 non-null float64
2.0     46 non-null float64
3.0     46 non-null float64
4.0     46 non-null float64
5.0     46 non-null float64
6.0     46 non-null float64
7.0     46 non-null float64
8.0     46 non-null float64
9.0     46 non-null float64
10.0    46 non-null float64
11.0    46 non-null float64
12.0    46 non-null float64
dtypes: float64(12)
memory usage: 4.5+ KB

and

pivot_table.columns
Out[103]: Index([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], dtype='object')

Also the bokeh code is here: http://docs.bokeh.org/en/latest/docs/gallery/unemployment.html

Can you also show pivot_table.info() and pivot_table.columns?
– joris, Commented Jul 1, 2015 at 20:10
Any chance you're mixing floats and strings, i.e. the column names are ints but m is a string?
– JoeCondron, Commented Jul 1, 2015 at 20:12

Jianxun Li · Accepted Answer · 2015-07-01 20:34:35Z

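I tried the following code and it works on my machine. I use .loc and iterate over the table's own index and columns, so every key used in a lookup is one that actually exists in the table, which avoids the KeyError.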

import pandas as pd
import numpy as np

# just following your previous post to simulate your data
np.random.seed(0)
dates = np.random.choice(pd.date_range('2015-01-01 00:00:00', '2015-06-30 00:00:00', freq='1h'), 10000)
company = np.random.choice(['company' + x for x in '1 2 3 4 5'.split()], 10000)
df = pd.DataFrame(dict(recvd_dttm=dates, CompanyName=company)).set_index('recvd_dttm').sort_index()
df['C'] = 1
df.columns = ['CompanyName', '']
result = df.groupby([lambda idx: idx.month, 'CompanyName']).agg({df.columns[1]: sum}).reset_index()
result.columns = ['Month', 'CompanyName', 'counts']
pivot_table = result.pivot(index='CompanyName', columns='Month', values='counts')




colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
    "#ddb7b1", "#cc7878", "#933b41", "#550b1d" ]

month = []
company = []
color = []
rate = []
for y in pivot_table.index:
    for m in pivot_table.columns:
        month.append(m)
        company.append(y)
        num_calls = pivot_table.loc[y, m]
        rate.append(num_calls)
        color.append(colors[min(int(num_calls)-2, 8)])
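
To make the cause concrete, here is a minimal sketch of the mismatch, assuming a tiny hypothetical table whose column labels are floats, as in the pivot_table.info() output in the question. The data values and the relabeled name are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the asker's table: the column labels are the
# floats 1.0, 2.0, 3.0, not the strings "1", "2", "3".
pivot_table = pd.DataFrame(
    [[182.0, 270.0, 278.0], [163.0, 147.0, 192.0]],
    index=pd.Index(["Company 1", "Company 2"], name="CompanyName"),
    columns=[1.0, 2.0, 3.0],
)

# Looking up the string "1" fails because no column carries that label.
try:
    pivot_table["1"]["Company 1"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Option A: look up with a label that actually exists in the table.
value_by_label = pivot_table.loc["Company 1", 1.0]

# Option B: relabel the columns as "1", "2", ... so string keys work.
relabeled = pivot_table.copy()
relabeled.columns = [str(int(c)) for c in relabeled.columns]
value_by_string = relabeled.loc["Company 1", "1"]
```

Option B may be the more convenient one if the plotting code expects month names as strings, as the months list in the question does.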

JoeCondron · Accepted Answer · 2015-07-01 20:36:21Z

Try changing the loop to

for m in pivot_table.columns:

It seems you can achieve the same thing without any loops, though. You're looping through the row index and column index to access each entry individually and appending it to a list, so rate is just a list of all the elements in the data frame. You can build it directly with:

rate = pivot_table.stack().astype(int).tolist()
color = [colors[min(x - 2, 8)] for x in rate]

Am I missing something here?
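
As a rough check of that claim, the sketch below builds a small hypothetical pivot table (no missing values) and compares the stacked result with the nested loop. It also shows one possible way to recover the matching company and month lists from the stacked index, which the plotting code in the question also needs. The sample numbers are made up.

```python
import pandas as pd

colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
          "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]

# Hypothetical 2x2 table standing in for the real pivot table.
pivot_table = pd.DataFrame(
    [[3.0, 12.0], [7.0, 5.0]],
    index=pd.Index(["Company 1", "Company 2"], name="CompanyName"),
    columns=pd.Index([1, 2], name="Month"),
)

# stack() flattens the table row by row into a Series indexed by
# (CompanyName, Month) pairs.
stacked = pivot_table.stack()
rate = stacked.astype(int).tolist()
company = stacked.index.get_level_values("CompanyName").tolist()
month = [str(m) for m in stacked.index.get_level_values("Month")]
color = [colors[min(x - 2, 8)] for x in rate]

# The nested loop from the question, written as a comprehension.
loop_rate = [int(pivot_table.loc[y, m])
             for y in pivot_table.index
             for m in pivot_table.columns]
```

With this sample data, rate and loop_rate hold the same values in the same order. Note that older pandas versions drop missing values when stacking, so the two could differ in length if the table contained NaN.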

edited Jul 1, 2015 at 20:36

answered Jul 1, 2015 at 20:18


3 Comments

jenryb Over a year ago

Interesting. Now I am getting this error: RuntimeError: Column name 'Companies' does not appear in data source <bokeh.models.sources.ColumnDataSource object at 0x0F6E9710>

jenryb Over a year ago

Avoiding the loop is an interesting approach. I'll be honest, I'm pulling straight from the bokeh code and trying to put my own data in. I tried your two lines, and the line color = [colors[min(x - 2, 8)] for x in num_calls] raised TypeError: 'numpy.float64' object is not iterable

JoeCondron Over a year ago

Sorry, I had num_calls instead of rate. I've edited the answer. Try it now. You should always try to avoid loops in pandas.
