This is what my dataframe, called "emails", looks like (only one row, with columns 'text' and 'POS_Tag'):

print(emails)

I'm trying to use apply() on my dataframe by first defining the function as:

def extractGrammar(email):
    tag_count_data = pd.DataFrame(email['POS_Tag'].map(lambda x: Counter(tag[1] for tag in x)).to_list())

    # Print count Part of speech tag needed for Adjective, Adverbs, Nouns and Verbs 
    email = pd.concat([email, tag_count_data], axis=1).fillna(0)

    pos_columns = ['PRP','MD','JJ','JJR','JJS','RB','RBR','RBS', 'NN', 'NNS','VB', 'VBS', 'VBG','VBN','VBP','VBZ']
    for pos in pos_columns:
        if pos not in email.columns:
            email[pos] = 0

    email = email[['text'] + pos_columns]

    email['Adjectives'] = email['JJ'] + email['JJR'] + email['JJS']
    email['Adverbs'] = email['RB'] + email['RBR'] + email['RBS']
    email['Nouns'] = email['NN'] + email['NNS']
    email['Verbs'] = email['VB'] + email['VBS'] + email['VBG'] + email['VBN'] + email['VBP'] + email['VBZ']

    return email

I then tried to run it on my emails dataframe with apply(), like this:

emails = emails.apply(extractGrammar, axis=1)

I have just been getting this error:

AttributeError: 'list' object has no attribute 'map'

I have previously used the exact same block of code from inside 'extractGrammar' on CSV files with multiple rows of emails. There it ran step by step at the top level, outside of any function and without apply(). I cannot figure out what has gone wrong.

2 Answers

You get that error because apply() with axis=1 passes each row of the DataFrame to extractGrammar() as a Series. Accessing email['POS_Tag'] inside the function therefore returns not the whole column but the contents of that single cell, which is a list, and lists do not have a map method. If you are trying to count the occurrences of the second element of each tuple in the POS_Tag column, you could try the following:

tag_count_data = Counter([x[1] for x in email['POS_Tag']])

This will give you a Counter of the second elements of the tags for that individual row.
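
To see the difference concretely, here is a minimal sketch. The one-row frame and the helper name count_tags are made up for illustration (the real data was only posted as an image), so treat the values as assumptions:

```python
from collections import Counter

import pandas as pd

# Hypothetical one-row frame shaped like the one in the question:
# 'POS_Tag' holds a list of (word, tag) tuples.
emails = pd.DataFrame({
    "text": ["The quick fox runs"],
    "POS_Tag": [[("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("runs", "VBZ")]],
})

def count_tags(email):
    # `email` is one row (a Series), so email["POS_Tag"] is a plain list.
    return Counter(tag for _, tag in email["POS_Tag"])

# Selecting the column on the whole frame gives a Series, which has .map
column_type = type(emails["POS_Tag"]).__name__

# With axis=1 the function receives one row at a time; the cell is a list
row_cell_types = emails.apply(lambda row: type(row["POS_Tag"]).__name__, axis=1)

# emails.iloc[0] stands in for the row that apply(axis=1) would pass
first_row = emails.iloc[0]
counts = count_tags(first_row)

print(column_type)             # Series
print(row_cell_types.iloc[0])  # list
print(counts)
```

So the same expression email['POS_Tag'] means "a column" at the top level and "one cell" inside a row-wise apply, which is why the code worked outside the function.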

To get the df with the tags that I'd posted in the question, and following the kind guidance of LiamFiddler, I later proceeded as follows:

  1. I turned the Counter object into a dict using dict(),
  2. I turned the dict into a Series,
  3. I set the Tag column values to be the column names, based on this answer,
  4. and then I selected the tags that I need for my DataFrame.

from collections import Counter

import pandas as pd

def extractGrammar(email):
    # Count the POS tags (second element of each tuple) for this row
    tag_count_data = Counter([x[1] for x in email['POS_Tag']])

    # Convert the Counter object to a dict
    tag_count_dict = dict(tag_count_data)

    # Turn the dict into a Series, then into a DataFrame with a 'Tag' column
    email_tag = pd.DataFrame(pd.Series(tag_count_dict).fillna(0).rename_axis('Tag'))
    email_tag = email_tag.reset_index()

    # Use set_index and transpose so the Tag values become the column names
    email_tag = email_tag.set_index("Tag").T.reset_index(drop=True).rename_axis(None, axis=1)

    # Select the tags that I need, adding any missing ones as 0
    pos_columns = ['PRP','MD','JJ','JJR','JJS','RB','RBR','RBS', 'NN', 'NNS','VB', 'VBS', 'VBG','VBN','VBP','VBZ']
    for pos in pos_columns:
        if pos not in email_tag.columns:
            email_tag[pos] = 0

    email_tag = email_tag[pos_columns]

    return email_tag
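
For comparison, a shorter variant is sketched below. It is not what I actually ran: it assumes that returning a Series from the function is acceptable, in which case apply(axis=1) assembles one column per tag directly. The sample row and the name tag_counts_row are hypothetical, and the tag list is copied unchanged from above.

```python
from collections import Counter

import pandas as pd

# Tag list copied as-is from the post
POS_COLUMNS = ['PRP', 'MD', 'JJ', 'JJR', 'JJS', 'RB', 'RBR', 'RBS',
               'NN', 'NNS', 'VB', 'VBS', 'VBG', 'VBN', 'VBP', 'VBZ']

def tag_counts_row(email):
    # Count tags for one row; a Counter returns 0 for tags it has not seen
    counts = Counter(tag for _, tag in email['POS_Tag'])
    return pd.Series({pos: counts[pos] for pos in POS_COLUMNS})

# Hypothetical sample data, for illustration only
emails = pd.DataFrame({
    'text': ['The quick fox runs very fast'],
    'POS_Tag': [[('The', 'DT'), ('quick', 'JJ'), ('fox', 'NN'),
                 ('runs', 'VBZ'), ('very', 'RB'), ('fast', 'RB')]],
})

# Each returned Series becomes one row of the resulting DataFrame
tag_table = emails.apply(tag_counts_row, axis=1)

# Put the text back next to the counts and add the grouped totals
result = pd.concat([emails[['text']], tag_table], axis=1)
result['Adjectives'] = result[['JJ', 'JJR', 'JJS']].sum(axis=1)
result['Adverbs'] = result[['RB', 'RBR', 'RBS']].sum(axis=1)
result['Nouns'] = result[['NN', 'NNS']].sum(axis=1)

print(result[['text', 'Adjectives', 'Adverbs', 'Nouns']])
```

Because every row returns a Series with the same index, the missing-column loop is no longer needed.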
