Python KeyError: 'column name'

Question

I have a data file that looks like this -

[Table 1]
Terms         Author        Frequency
Hepatitis     Christopher   2
Acid          Subrata       1
Acid          Kal           3
Kinase        Pramod        31
Kinase        Steve         5
Kinase        Sharon        10
Acid          Rob           5
Acid          Christopher   2
Hepatitis     Sharon        3

which I want to convert into a frequency matrix like this -

[Table 2]
Terms       Christopher     Subrata   Kal    Pramod     Steve    Sharon    Rob
Hepatitis       2              0       0       0          0         3        0
Acid            2              1       3       0          0         0        5
Kinase          0              0       0       31         5         10       0

I have figured out how to do that, and I am using this code -

 import pandas as pd
 a = pd.read_csv("C:\\Users\\robert\\Desktop\\Python Project\\Publications Data\\New Merged Title Terms Corrected\\Python generated file\\Terms_Frequency_File.csv")
 # For each term, spread the authors into columns holding their frequencies
 b = a.groupby(['Terms']).apply(lambda x: x.set_index(['Terms', 'Author']).unstack()['Frequency'])

This worked absolutely fine until yesterday. Today I regenerated the [Table 1] data because I had to add one more author, and when I try to build the frequency matrix again, as in [Table 2], I get this error -

KeyError: 'Terms'

I am pretty sure this has something to do with the index column in the dataframe, or with whitespace issues in the index column (in this case the 'Terms' column). I read several answers on this, such as "KeyError: 'column_name'" and "Key error when selecting columns in pandas dataframe after read_csv", and tried those methods, but they aren't helping.
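One way to test the whitespace hypothesis is to print the column names as a list, so that stray spaces become visible, and then strip them. The sketch below is only illustrative: the `raw` string is a made-up stand-in for the real file, and the padded header cells are an assumption, not something confirmed for this data.

```python
import io
import pandas as pd

# Hypothetical CSV text whose header cells carry stray spaces;
# it stands in for the real Terms_Frequency_File.csv.
raw = "Terms , Author,Frequency\nHepatitis,Christopher,2\nAcid,Kal,3\n"
df = pd.read_csv(io.StringIO(raw))

# Printing the names as a list shows quotes around each one,
# which makes leading or trailing spaces easy to spot.
print(df.columns.tolist())

# With the padded header, the exact name 'Terms' is not present yet.
has_terms_before = "Terms" in df.columns

# Strip surrounding whitespace from every column name.
df.columns = df.columns.str.strip()
has_terms_after = "Terms" in df.columns
```

If the printed names already look clean, whitespace is probably not the cause and the header row itself is worth checking.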

Any help on this will be much appreciated! Thanks much!

You should be using a pivot table here. Try pd.pivot_table(df, index='Terms', columns='Author', values='Frequency', fill_value=0). In your code, Terms doesn't exist in the context you have selected when you try to set_index.
– user3483203, Commented Jun 19, 2018 at 20:57
You can also use crosstab here: pd.crosstab(df.Terms, df.Author, values=df.Frequency, aggfunc='sum').fillna(0)
– user3483203, Commented Jun 19, 2018 at 21:03
@HarvIpan - It's giving me this - Index(["'FINGER-LOOP'", 'Kukolj G', '1'], dtype='object')
– spideypack, Commented Jun 19, 2018 at 21:04
Those are clearly the columns of a different dataframe than the one you have posted.
– user3483203, Commented Jun 19, 2018 at 21:06
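The column listing in the comments looks like a data row rather than a header, which would be consistent with the regenerated file having no header row, so that pandas promoted the first record to column names. That is an inference from the comments, not something the asker confirmed. A minimal sketch under that assumption, using the [Table 1] rows as inline text in place of the real file, supplies the names explicitly and then builds the matrix with the two calls suggested above (`aggfunc="sum"` in `pivot_table` is a choice made here; the comment used the default):

```python
import io
import pandas as pd

# Hypothetical headerless CSV text built from the [Table 1] rows.
raw = (
    "Hepatitis,Christopher,2\n"
    "Acid,Subrata,1\n"
    "Acid,Kal,3\n"
    "Kinase,Pramod,31\n"
    "Kinase,Steve,5\n"
    "Kinase,Sharon,10\n"
    "Acid,Rob,5\n"
    "Acid,Christopher,2\n"
    "Hepatitis,Sharon,3\n"
)

# Default read: the first data row is consumed as the header,
# so there is no 'Terms' column and lookups raise KeyError.
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.columns.tolist())

# Tell pandas there is no header row and name the columns explicitly.
df = pd.read_csv(
    io.StringIO(raw), header=None, names=["Terms", "Author", "Frequency"]
)

# Frequency matrix via pivot_table; missing pairs become 0.
matrix = pd.pivot_table(
    df, index="Terms", columns="Author", values="Frequency",
    aggfunc="sum", fill_value=0,
)

# The same matrix via crosstab.
ct = pd.crosstab(
    df["Terms"], df["Author"], values=df["Frequency"], aggfunc="sum"
).fillna(0)

print(matrix)
```

Both results are indexed by term with one column per author, sorted alphabetically rather than in order of appearance.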

Anna Olender · Accepted Answer · 2018-09-08 12:37:41Z


I had the same problem as you. I observed that the error occurs if I edit the .csv data in OpenOffice. When I instead downloaded the data from the Internet and edited it in a plain editor, Notepad++, it worked normally. I know this solution may not help in your case, but you could try changing the text editor or program you use to handle .csv files.
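The answer does not say what the spreadsheet program changed in the file. One possible cause, assumed here purely for illustration, is that the file was saved with a different field delimiter such as a semicolon. In that case pandas reads each line as a single column, and no column called 'Terms' exists. A small sketch with made-up inline data:

```python
import io
import pandas as pd

# Hypothetical file content saved with ';' as the delimiter.
raw = "Terms;Author;Frequency\nHepatitis;Christopher;2\nAcid;Kal;3\n"

# Default comma delimiter: the whole header becomes one column name.
bad = pd.read_csv(io.StringIO(raw))
print(bad.columns.tolist())

# Passing the actual delimiter restores the three expected columns.
good = pd.read_csv(io.StringIO(raw), sep=";")
print(good.columns.tolist())
```

Opening the file in a plain text editor and looking at the first line is usually enough to see which delimiter was written.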


What if I am opening it in a dataframe?
– DeshDeep Singh, Commented over a year ago
