
I have a data file that looks like this:

[Table 1]
Terms         Author        Frequency
Hepatitis     Christopher   2
Acid          Subrata       1
Acid          Kal           3
Kinase        Pramod        31
Kinase        Steve         5
Kinase        Sharon        10
Acid          Rob           5
Acid          Christopher   2
Hepatitis     Sharon        3

which I want to convert into a frequency matrix like this:

[Table 2]

Terms       Christopher  Subrata  Kal  Pramod  Steve  Sharon  Rob
Hepatitis   2            0        0    0       0      3       0
Acid        2            1        3    0       0      0       5
Kinase      0            0        0    31      5      10      0
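For reference, here is a minimal sketch of the Table 1 to Table 2 reshape using pandas' pivot_table (the approach suggested in the comments). Table 1 is inlined with io.StringIO instead of the real CSV file, and the names `data` and `matrix` are just illustrative.

```python
import io

import pandas as pd

# Table 1 from the question, inlined so the example is self-contained
# (in practice this would come from pd.read_csv on the real file).
data = io.StringIO(
    "Terms,Author,Frequency\n"
    "Hepatitis,Christopher,2\n"
    "Acid,Subrata,1\n"
    "Acid,Kal,3\n"
    "Kinase,Pramod,31\n"
    "Kinase,Steve,5\n"
    "Kinase,Sharon,10\n"
    "Acid,Rob,5\n"
    "Acid,Christopher,2\n"
    "Hepatitis,Sharon,3\n"
)
a = pd.read_csv(data)

# One row per term, one column per author; missing (term, author) pairs become 0.
# aggfunc="sum" adds up duplicate (Terms, Author) rows, if there are any.
matrix = pd.pivot_table(
    a, index="Terms", columns="Author", values="Frequency",
    aggfunc="sum", fill_value=0,
)
print(matrix)
```

pivot_table sorts rows and columns alphabetically, so Acid is listed before Hepatitis; something like `matrix.reindex(index=[...], columns=[...])` would restore the order shown in Table 2.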

I figured out how to do that, and I am using this code:

 import pandas as pd
 a = pd.read_csv("C:\\Users\\robert\\Desktop\\Python Project\\Publications Data\\New Merged Title Terms Corrected\\Python generated file\\Terms_Frequency_File.csv")
 # For each term, move Author into the columns and keep the Frequency values
 b = a.groupby(['Terms']).apply(lambda x: x.set_index(['Terms', 'Author']).unstack()['Frequency'])

This worked fine until yesterday. Today I regenerated the [Table 1] data because I had to add one more author, and now when I try to build the frequency matrix in [Table 2] again, I get this error:

KeyError: 'Terms'

I am fairly sure this has something to do with the index column in the DataFrame, or with whitespace in the index column (in this case, the 'Terms' column). I read several answers on this, such as "KeyError: 'column_name'" and "Key error when selecting columns in pandas dataframe after read_csv", and tried their methods, but they aren't helping.
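If whitespace really is the problem, it helps to print the exact column names pandas sees and normalise them before selecting. A minimal sketch, assuming a hypothetical header with stray spaces around each name:

```python
import io

import pandas as pd

# Hypothetical file contents: the header has stray spaces around the names,
# so the first column is read as " Terms " rather than "Terms".
raw = io.StringIO(
    " Terms , Author , Frequency \n"
    "Hepatitis,Christopher,2\n"
    "Acid,Kal,3\n"
)
a = pd.read_csv(raw)
print(list(a.columns))  # shows the exact names pandas sees

# a["Terms"] would raise KeyError: 'Terms' here, so strip the names first.
a.columns = a.columns.str.strip()
print(a["Terms"].tolist())
```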

Any help on this will be much appreciated! Thanks much!

  • What does print(a.columns) give you? Commented Jun 19, 2018 at 20:50
  • You should be using a pivot table here. Try pd.pivot_table(df, index='Terms', columns='Author', values='Frequency', fill_value=0). In your code, Terms doesn't exist in the context you have selected when you try to set_index. Commented Jun 19, 2018 at 20:57
  • You can also use crosstab here: pd.crosstab(df.Terms, df.Author, values=df.Frequency, aggfunc='sum').fillna(0) Commented Jun 19, 2018 at 21:03
  • @HarvIpan - It's giving me this - Index([''FINGER-LOOP'', 'Kukolj G', '1'], dtype='object') Commented Jun 19, 2018 at 21:04
  • Those are clearly the columns of a different dataframe than the one you posted. Commented Jun 19, 2018 at 21:06
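The printed Index above looks like a data row rather than a header, which suggests (an assumption, not confirmed in the thread) that the regenerated CSV has no header row, so read_csv promoted the first record to column names and 'Terms' no longer exists. A minimal sketch with hypothetical file contents, reading the file with header=None plus explicit names, then building the matrix with the crosstab call from the comments:

```python
import io

import pandas as pd

# Hypothetical regenerated file WITHOUT a header row: by default pandas would
# treat the first data row as the column names (as in the printed Index).
raw = io.StringIO(
    "FINGER-LOOP,Kukolj G,1\n"
    "Hepatitis,Christopher,2\n"
    "Acid,Kal,3\n"
    "Acid,Christopher,2\n"
)

# header=None stops the first row from being used as column names;
# names= supplies the column labels the rest of the code expects.
a = pd.read_csv(raw, header=None, names=["Terms", "Author", "Frequency"])

# crosstab alternative from the comments: sum Frequency per (Terms, Author),
# then replace the NaN of missing combinations with 0.
matrix = pd.crosstab(a.Terms, a.Author, values=a.Frequency, aggfunc="sum").fillna(0)
print(matrix)
```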

1 Answer


I had the same problem. I noticed that the error occurs if I edit the .csv data in OpenOffice. Instead, I downloaded the data from the Internet again and edited it in the plain Notepad++ editor, and then it worked normally. I know this may not help in your case, but you could try a different text editor or program for working with .csv files.
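One possible explanation for the OpenOffice behaviour (an assumption; the answer doesn't say what changed in the file): spreadsheet programs let you choose the field separator when saving a CSV, and if the file comes out semicolon-separated, read_csv's default comma separator reads the whole header as a single column name, which also produces KeyError: 'Terms'. A sketch with hypothetical file contents:

```python
import io

import pandas as pd

# Hypothetical file saved with ";" as the field separator instead of ",".
raw_text = (
    "Terms;Author;Frequency\n"
    "Hepatitis;Christopher;2\n"
    "Acid;Kal;3\n"
)

# With the default sep="," the whole header becomes ONE column name,
# so wrong["Terms"] would raise KeyError: 'Terms'.
wrong = pd.read_csv(io.StringIO(raw_text))
print(list(wrong.columns))

# Telling read_csv the real separator restores the expected columns.
a = pd.read_csv(io.StringIO(raw_text), sep=";")
print(list(a.columns))
```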


1 Comment

What if I am opening it in a DataFrame?
