I'm getting an error while using the FuzzyWuzzy library in Python 3. I'm reading my data from a CSV file with the pandas library.
I have the following data in my CSV file:
    BBL  CorporationName          CorporationName2
    1    123 Elm St LLC           123 Elm St LLC
    2    ABC Realty, INC          ABC Realty, INC
    3    123 Elm Street, LLC      123 Elm Street, LLC
    4    ABC Realty Incorporated  ABC Realty Incorporated
The CorporationName and CorporationName2 columns are actually the same. They each contain the names of real estate-related businesses. The names of these businesses appear multiple times in each column, but as you can see, they sometimes appear in slightly different forms.
My goal is to take each string in CorporationName and compare it with all of the strings in CorporationName2. I would then like FuzzyWuzzy to return the 5 most relevant strings from CorporationName2 (i.e. the possible variations of that name). This is just the first step in a massive string matching task I have subjected myself to.
    import pandas as pd
    from fuzzywuzzy import process
    from fuzzywuzzy import fuzz
    import csv

    df = pd.read_csv('yescorp_fuzz.csv')
    test_list = df.CorporationName
    test_list1 = df.CorporationName1

    def ownermatch():
        for i in test_list:
            result = process.extract(i,test_list1, limit=5)
            print(result)

    ownermatch()
This is the traceback:
    Traceback (most recent call last):
      File "C:/Python34/YesCorpFuzzy4_15.py", line 17, in <module>
        ownermatch()
      File "C:/Python34/YesCorpFuzzy4_15.py", line 13, in ownermatch
        result = process.extract(i,test_list1, limit=5)
      File "C:\Python34\lib\site-packages\fuzzywuzzy\process.py", line 103, in extract
        processed = processor(choice)
      File "C:\Python34\lib\site-packages\fuzzywuzzy\utils.py", line 84, in full_process
        string_out = StringProcessor.replace_non_letters_non_numbers_with_whitespace(s)
      File "C:\Python34\lib\site-packages\fuzzywuzzy\string_processing.py", line 25, in replace_non_letters_non_numbers_with_whitespace
        return cls.regex.sub(u" ", a_string)
    TypeError: expected string or buffer
    >>>
To be perfectly honest, I'm not sure what's going on here. I couldn't find much on the internet, either.
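My only guess so far, which I have not confirmed against my actual file, is that one of the values reaching the last frame is not a string. pandas represents an empty CSV cell as NaN, which is a float, and a regex `sub` call rejects non-string input. Below is a minimal sketch of that idea using only the standard library. The pattern `NON_ALNUM`, the helper `clean`, and the sample `choices` list are my own stand-ins, not FuzzyWuzzy's actual internals or my real data:

```python
import re

# Stand-in for the regex used in fuzzywuzzy/string_processing.py.
# Assumption: the real pattern differs; only the shape of the call matters here.
NON_ALNUM = re.compile(r"\W")

# Hypothetical column values. float("nan") mimics an empty CSV cell,
# which pandas represents as NaN (a float, not a string).
choices = ["123 Elm St LLC", "ABC Realty, INC", float("nan")]


def clean(value):
    """Replace non-alphanumeric characters with spaces, like the failing frame."""
    return NON_ALNUM.sub(" ", value)


# Try every value and record which ones the regex refuses.
errors = []
for choice in choices:
    try:
        clean(choice)
    except TypeError as exc:
        errors.append((choice, str(exc)))

# Keeping only real strings avoids the error in this sketch.
string_choices = [c for c in choices if isinstance(c, str)]
cleaned = [clean(c) for c in string_choices]

print(errors)
print(cleaned)
```

If this is what is happening, then checking the two columns for missing or non-string values before passing them to `process.extract` seems like the first thing to try, but I would appreciate confirmation.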
Any help that you could provide would be greatly appreciated.
Thanks!