df = pd.DataFrame({'a': ['Anakin Ana', 'Anakin Ana, Chris Cannon', 'Chris Cannon', 'Bella Bold'],
                   'b': ['Bella Bold, Chris Cannon', 'Donald Deakon', 'Bella Bold', 'Bella Bold'],
                   'c': ['Chris Cannon', 'Chris Cannon, Donald Deakon', 'Chris Cannon', 'Anakin Ana, Bella Bold']},
                   index=[0, 1, 2, 3])

Hi everyone,

I'm trying to count how many names two columns have in common, row by row. Above is an example of what my data looks like. At first I got the error 'float' object has no attribute 'split'. After some searching, it seems the error comes from my missing data, which is read as float. But even when I try to convert the column to strings, I still get an error. Below is my code.


import pandas as pd
import csv
filepath = "C:/Users/data/Untitled Folder/creditdata2.csv"
df = pd.read_csv(filepath,encoding='utf-8')
    
df['word_overlap'] = [set(x[8].astype(str).split(",")) & set(x[10].astype(str).split(",")) for x in df.values]
df['overlap_count'] = df['word_overlap'].str.len()

df.to_csv('creditdata3.csv',mode='a',index=False) 

And here is the error

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-21-b85ac8637aae> in <module>
      4 df = pd.read_csv(filepath,encoding='utf-8')
      5 
----> 6 df['word_overlap'] = [set(x[8].astype(str).split(",")) & set(x[10].astype(str).split(",")) for x in df.values]
      7 df['overlap_count'] = df['word_overlap'].str.len()
      8 

<ipython-input-21-b85ac8637aae> in <listcomp>(.0)
      4 df = pd.read_csv(filepath,encoding='utf-8')
      5 
----> 6 df['word_overlap'] = [set(x[8].astype(str).split(",")) & set(x[10].astype(str).split(",")) for x in df.values]
      7 df['overlap_count'] = df['word_overlap'].str.len()
      8 

AttributeError: 'float' object has no attribute 'astype'
  • Can you more clearly define "how many names are in common in each column", or give an example of what the output should be? Commented Oct 29, 2021 at 23:17
  • Hi, so for example between the first cell of columns 1 and 2 there's no name that is in common so it would be 0. However the 4th cell of columns 1 and 2 has a common name 'Bella Bold' so it would be 1. Commented Oct 30, 2021 at 0:41

2 Answers

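astype is a method of pandas and NumPy objects (DataFrame, Series, ndarray). Here you have already indexed x down to a single cell, and a missing cell is just a plain Python float (NaN), which has no astype method.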
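
To see the difference, here is a minimal sketch on a made-up two-row frame. The name demo and its values are only for illustration and are not taken from the question's CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing cell, standing in for the real CSV.
demo = pd.DataFrame({"a": ["Anakin Ana, Chris Cannon", np.nan]})

cell = demo.values[1][0]                          # a single cell: the missing value
cell_is_float = isinstance(cell, float)           # NaN is a plain float
cell_has_astype = hasattr(cell, "astype")         # plain floats have no astype
column_has_astype = hasattr(demo["a"], "astype")  # a whole column (Series) does

print(cell_is_float, cell_has_astype, column_has_astype)
```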

Try this:

df['word_overlap'] = [set(str(x[8]).split(",")) & set(str(x[10]).split(",")) for x in df.values]
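
Two caveats with str(), as far as I can tell: a missing cell becomes the string 'nan', so two missing cells look like they share one name, and splitting on "," alone leaves a leading space on every name after the first (' Chris Cannon' is not equal to 'Chris Cannon'). Below is a sketch that handles both, assuming the names are separated by commas. The helper split_names and the sample frame are made up for illustration:

```python
import numpy as np
import pandas as pd

def split_names(cell):
    """Return the set of names in a comma-separated cell (empty set if missing)."""
    if pd.isna(cell):
        return set()
    # Strip the whitespace around each name and drop empty pieces.
    return {name.strip() for name in str(cell).split(",") if name.strip()}

# Hypothetical sample with a missing row, standing in for the real CSV.
df = pd.DataFrame({
    "a": ["Anakin Ana", "Anakin Ana, Chris Cannon", np.nan, "Bella Bold"],
    "b": ["Bella Bold, Chris Cannon", "Donald Deakon, Chris Cannon", np.nan, "Bella Bold"],
})

# Compare the two columns row by row.
df["word_overlap"] = [split_names(a) & split_names(b) for a, b in zip(df["a"], df["b"])]
df["overlap_count"] = df["word_overlap"].map(len)
```

With real data you would pass the two columns you want to compare (by name, or by position with df.iloc[:, 8] and df.iloc[:, 10]) instead of "a" and "b".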

1 Comment

Thank you so much, tromgy! The float error is gone now, but the output is wacky: 'overlap_count' consists only of 1s and 0s.
import pandas as pd
import csv

filepath = "C:/data/Untitled Folder/creditdata2.csv"
df = pd.read_csv(filepath,encoding='utf-8')


def f(columns):
    f_desc, f_def = str(columns[6]), str(columns[7])
    common = set(f_desc.split(",")).intersection(set(f_def.split(",")))
    return common, len(common)

df[['word_overlap', 'word_count']] = df.apply(f, axis=1, raw=True).apply(pd.Series)
df.to_csv('creditdata3.csv',mode='a',index=False)

I found another way to do it. Thank you, everyone!
