
Why does difflib.get_close_matches raise a "list index out of range" error when no matches are found in the following example?

from pandas import DataFrame
import difflib

df1 = DataFrame([[1,'034567','Foo'],
                 [2,'1cd2346','Bar']], 
                columns=['ID','Unit','Name'])
df2 = DataFrame([['SellTEST','0ab1234567'],
                 ['superVAR','1ab2345']], 
                columns=['Seller', 'Unit'])

df2['Unit'] = df2['Unit'].apply(lambda x: difflib.get_close_matches(x, df1['Unit'])[0])

df1.merge(df2)

I get that the value in df1 is way off, but I wouldn't expect this to raise an error. I would expect it simply not to match.

  • I think you are answering your own question: difflib returns no close matches, which is an empty list, and then you blindly index into it, assuming there is a match when there isn't. Instead of just dereferencing [0], your lambda needs to check the length first. What do you want there when there are no matches? Commented Apr 11, 2016 at 19:41
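The length check the comment suggests could look like this minimal sketch, assuming you want a fallback value (None by default) when no candidate clears get_close_matches' default cutoff of 0.6. The helper name closest_or_default is hypothetical, not part of difflib.

```python
import difflib
from typing import Iterable, Optional


def closest_or_default(word: str, possibilities: Iterable[str],
                       default: Optional[str] = None) -> Optional[str]:
    """Return the best close match for `word`, or `default` if there is none.

    Hypothetical helper: get_close_matches returns an empty list when no
    candidate scores at least `cutoff` (0.6 by default), so check the list
    before indexing into it.
    """
    matches = difflib.get_close_matches(word, possibilities, n=1)
    if matches:  # non-empty list: its first element is the best match
        return matches[0]
    return default  # empty list: matches[0] here would raise IndexError


units = ['034567', '1cd2346']  # df1['Unit'] from the question
print(closest_or_default('0ab1234567', units))  # 034567
print(closest_or_default('1ab2345', units))     # None
```

With the question's frames this would be used as df2['Unit'].apply(lambda x: closest_or_default(x, df1['Unit'])); get_close_matches accepts any iterable of strings, so passing the Series directly works.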

1 Answer


get_close_matches does simply not match: the list it returns is empty, and you then try to access its first element, which raises the IndexError.
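To see why the second row comes back empty, you can score each pair with difflib.SequenceMatcher, which get_close_matches uses internally (it compares each candidate against the word and keeps only those whose ratio() is at least the cutoff, 0.6 by default). A small sketch using the values from the question:

```python
import difflib

units = ['034567', '1cd2346']  # df1['Unit'] from the question

for word in ['0ab1234567', '1ab2345']:  # df2['Unit'] from the question
    for candidate in units:
        # Same orientation get_close_matches uses: candidate as seq1, word as seq2
        ratio = difflib.SequenceMatcher(None, candidate, word).ratio()
        print(f'{word!r} vs {candidate!r}: {ratio:.3f}')
    # Only candidates scoring >= 0.6 survive; none do for '1ab2345'
    print(word, '->', difflib.get_close_matches(word, units))
```

This reports 0.750 and 0.588 for '0ab1234567', so only '034567' is returned, and 0.462 and 0.571 for '1ab2345', so the list is empty and [0] fails.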

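If you want to replace an element that has no matches with None, you could use this code instead. It relies on the fact that an empty list is falsy: the slice [:1] yields either an empty list or a one-element list, and "or [None]" replaces an empty result with [None], so indexing [0] is always safe: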

df2['Unit'] = df2['Unit'].apply(lambda x: (difflib.get_close_matches(x, df1['Unit'])[:1] or [None])[0])
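The idiom can be checked in isolation. This sketch compares it with an equivalent next(iter(...), None) form, which is an alternative not taken from the answer; the function names are hypothetical.

```python
import difflib

units = ['034567', '1cd2346']  # df1['Unit'] from the question


def first_or_none_slice(word):
    # The answer's idiom: [:1] gives [] or [best]; slicing never raises.
    # An empty list is falsy, so `or [None]` substitutes [None] before indexing.
    return (difflib.get_close_matches(word, units)[:1] or [None])[0]


def first_or_none_next(word):
    # Alternative: next() returns its default when the iterator is exhausted.
    return next(iter(difflib.get_close_matches(word, units)), None)


for word in ['0ab1234567', '1ab2345']:
    print(word, first_or_none_slice(word), first_or_none_next(word))
```

Both return '034567' for the first unit and None for the second. Passing n=1 to get_close_matches would make the [:1] slice unnecessary, since the returned list then has at most one element.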