3

I am putting my data into a Bokeh heat map layout, but I am getting KeyError: '1'. It occurs at the line num_calls = pivot_table[m][y]. Does anybody know why this would be?

The pivot table I am using is below:

pivot_table.head()
Out[101]: 
Month          1    2    3    4    5    6    7    8    9
CompanyName
Company 1    182  270  278  314  180  152  110  127  129
Company 2    163  147  192  142  186  231  214  130  112
Company 3    126   88   99  139   97   97   96   37   79
Company 4     84   89   71   95   80   89   83   88  104
Company 5     91   96   94   66   81   77   87   83   68

Month         10   11   12
CompanyName
Company 1    117  127   81
Company 2    117   93  101
Company 3    116  111   95
Company 4     93   78   64
Company 5     83   95   65

Below is the section of code leading up to the error:

pivot_table = pivot_table.reset_index()
pivot_table['CompanyName'] = [str(x) for x in pivot_table['CompanyName']]
Companies = list(pivot_table['CompanyName'])
months = ["1","2","3","4","5","6","7","8","9","10","11","12"]
pivot_table = pivot_table.set_index('CompanyName')

# this is the colormap from the original plot
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
    "#ddb7b1", "#cc7878", "#933b41", "#550b1d" ]

# Set up the data for plotting. We will need to have values for every
# pair of company/month names. Map the call count to a color.
month = []
company = []
color = []
rate = []
for y in Companies:
    for m in months:
        month.append(m)
        company.append(y)
        num_calls = pivot_table[m][y]
        rate.append(num_calls)
        color.append(colors[min(int(num_calls)-2, 8)])

and upon request:

pivot_table.info()
<class 'pandas.core.frame.DataFrame'>
Index: 46 entries, Company1 to LastCompany
Data columns (total 12 columns):
1.0     46 non-null float64
2.0     46 non-null float64
3.0     46 non-null float64
4.0     46 non-null float64
5.0     46 non-null float64
6.0     46 non-null float64
7.0     46 non-null float64
8.0     46 non-null float64
9.0     46 non-null float64
10.0    46 non-null float64
11.0    46 non-null float64
12.0    46 non-null float64
dtypes: float64(12)
memory usage: 4.5+ KB

and

pivot_table.columns
Out[103]: Index([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], dtype='object')

Also the bokeh code is here: http://docs.bokeh.org/en/latest/docs/gallery/unemployment.html

4
  • Can you show how your dataframe looks like? Commented Jul 1, 2015 at 20:02
  • @AnandSKumar Yes, added in an edit. Commented Jul 1, 2015 at 20:08
  • Can you also show pivot_table.info() and pivot_table.columns? Commented Jul 1, 2015 at 20:10
  • Any chance you're mixing floats and strings, i.e. the column names are ints but m is a string? Commented Jul 1, 2015 at 20:12

2 Answers

1

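I've tried the following code and it works on my PC. I loop over pivot_table.index and pivot_table.columns and look values up with .loc, so every key is a label that actually exists in the frame, which avoids the KeyError.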

import pandas as pd
import numpy as np

# just following your previous post to simulate your data
np.random.seed(0)
dates = np.random.choice(pd.date_range('2015-01-01 00:00:00', '2015-06-30 00:00:00', freq='1h'), 10000)
company = np.random.choice(['company' + x for x in '1 2 3 4 5'.split()], 10000)
df = pd.DataFrame(dict(recvd_dttm=dates, CompanyName=company)).set_index('recvd_dttm').sort_index()
df['C'] = 1
df.columns = ['CompanyName', '']
result = df.groupby([lambda idx: idx.month, 'CompanyName']).agg({df.columns[1]: sum}).reset_index()
result.columns = ['Month', 'CompanyName', 'counts']
pivot_table = result.pivot(index='CompanyName', columns='Month', values='counts')




colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
    "#ddb7b1", "#cc7878", "#933b41", "#550b1d" ]

month = []
company = []
color = []
rate = []
for y in pivot_table.index:
    for m in pivot_table.columns:
        month.append(m)
        company.append(y)
        num_calls = pivot_table.loc[y, m]
        rate.append(num_calls)
        color.append(colors[min(int(num_calls)-2, 8)])

1

Try changing the loop to

for m in pivot_table.columns:
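As a rough illustration of why this helps: the question's pivot_table.columns output shows float labels (1.0, 2.0, ...), while months holds strings ("1", "2", ...), so the string key never matches. The sketch below uses a made-up two-company, two-month frame (the values and its size are assumptions for illustration) and shows both iterating over the real labels and, as an alternative, relabeling the columns as strings.

```python
import pandas as pd

# Hypothetical miniature of the pivot table, with float column labels
# held in an object-dtype Index, as pivot_table.columns reports in the question.
pivot_table = pd.DataFrame(
    [[182.0, 270.0], [163.0, 147.0]],
    index=pd.Index(["Company 1", "Company 2"], name="CompanyName"),
    columns=pd.Index([1.0, 2.0], dtype="object", name="Month"),
)

# A string key does not match a float label, so this lookup raises KeyError.
try:
    pivot_table["1"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Option A: iterate over the labels the frame actually has.
values_a = [
    pivot_table.loc[y, m]
    for y in pivot_table.index
    for m in pivot_table.columns
]

# Option B: relabel the columns as strings so that "1", "2", ... work as keys.
relabeled = pivot_table.copy()
relabeled.columns = [str(int(c)) for c in relabeled.columns]
value_b = relabeled["1"]["Company 1"]
```

Option B keeps the rest of the original loop unchanged, at the cost of one extra relabeling step before the loop.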

It seems you can achieve the same thing without any loops though. You're looping through the row index and column index to access each entry individually and appending them to a list, so rate is just a list of all elements in the data frame. You can achieve this by

rate= pivot_table.stack().astype(int).tolist()
color = [colors[min(x - 2, 8)] for x in rate]

Am I missing something here?
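For completeness, here is a minimal sketch of how the month and company lists could be built the same way, assuming the pivot table has CompanyName as its index and float month labels as its columns. The small frame and its values are made up for illustration. The max(0, ...) clamp is an addition that is not in the original code: without it, a count below 2 gives a negative index, which silently wraps around to the end of the color list.

```python
import pandas as pd

colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
          "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]

# Hypothetical miniature pivot table (values chosen to exercise the color mapping).
pivot_table = pd.DataFrame(
    [[3.0, 50.0], [7.0, 1.0]],
    index=pd.Index(["Company 1", "Company 2"], name="CompanyName"),
    columns=pd.Index([1.0, 2.0], name="Month"),
)

# stack() yields one value per (company, month) pair in row-major order,
# the same order as the nested "for y ... for m ..." loops.
stacked = pivot_table.stack().astype(int)

company = stacked.index.get_level_values(0).tolist()
month = [str(int(m)) for m in stacked.index.get_level_values(1)]
rate = stacked.tolist()

# Clamp the index to the range 0..8 (the lower clamp is an assumption, see above).
color = [colors[max(0, min(x - 2, 8))] for x in rate]
```

All four lists end up with the same length and ordering, which is what a column-based data source needs.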

3 Comments

Interesting. Now I am getting this error: RuntimeError: Column name 'Companies' does not appear in data source <bokeh.models.sources.ColumnDataSource object at 0x0F6E9710>
Without looping is an interesting approach. I'll be honest, I'm pulling straight from the bokeh code and trying to put my own data in. I tried your two lines and got this: color = [colors[min(x - 2, 8)] for x in num_calls] TypeError: 'numpy.float64' object is not iterable
Sorry I had num_calls instead of rate. I've edited the response. Try it now. You should always try to avoid loops in pandas
