
I have a DataFrame that looks like this:

Cities        Cities_Dict
"San Francisco" ["San Francisco", "New York", "Boston"]
"Los Angeles"   ["Los Angeles"]
"berlin"        ["Munich", "Berlin"]
"Dubai"         ["Dubai"]

I want to create a new column that compares the city in the first column to the list of cities in the second column and finds the closest match. I use difflib for that:

df["new_col"]=difflib.get_close_matches(df["Cities"],df["Cities_Dict"])

However, I get an error:

TypeError: object of type 'float' has no len()

1 Answer


Use DataFrame.apply with a lambda function and axis=1 to process the DataFrame row by row:

import difflib, ast

# If the lists are stored as strings, convert them to real lists first
# df['Cities_Dict'] = df['Cities_Dict'].apply(ast.literal_eval)

f = lambda x: difflib.get_close_matches(x["Cities"],x["Cities_Dict"])
df["new_col"] = df.apply(f, axis=1)
print (df)
          Cities                        Cities_Dict          new_col
0  San Francisco  [San Francisco, New York, Boston]  [San Francisco]
1    Los Angeles                      [Los Angeles]    [Los Angeles]
2         berlin                   [Munich, Berlin]         [Berlin]
3          Dubai                            [Dubai]          [Dubai]

EDIT:

To get only the first match as a string, or an empty string when there is no match, use:

f = lambda x: next(iter(difflib.get_close_matches(x["Cities"],x["Cities_Dict"])), '')
df["new_col"] = df.apply(f, axis=1)
print (df)
          Cities                        Cities_Dict        new_col
0  San Francisco  [San Francisco, New York, Boston]  San Francisco
1    Los Angeles                      [Los Angeles]    Los Angeles
2         berlin                   [Munich, Berlin]         Berlin
3          Dubai                            [Dubai]          Dubai

EDIT1: If the data may contain problematic values, use try-except:

def f(x):
    try:
        return difflib.get_close_matches(x["Cities"],x["Cities_Dict"])[0]
    # catches non-string values (TypeError) and empty results (IndexError)
    except Exception:
        return ''

df["new_col"] = df.apply(f, axis=1)
print (df)
        Cities                        Cities_Dict new_col
0          NaN  [San Francisco, New York, Boston]        
1  Los Angeles                               [10]        
2       berlin                   [Munich, Berlin]  Berlin
3        Dubai                            [Dubai]   Dubai
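
As an alternative to try-except, the problematic values can be checked explicitly before matching. This is only a sketch: the helper name safe_first_match and its default parameter are hypothetical, and it assumes the bad values are non-string cities (such as NaN) or non-string entries inside the lists (such as 10). The rows are plain tuples so the example runs without pandas.

```python
import difflib

def safe_first_match(word, candidates, default=""):
    # A non-string word (e.g. NaN, which is a float) cannot be matched.
    if not isinstance(word, str):
        return default
    # A cell that is not a list or tuple (e.g. NaN) has no candidates.
    if not isinstance(candidates, (list, tuple)):
        return default
    # Keep only string candidates; ints and floats are skipped.
    strings = [c for c in candidates if isinstance(c, str)]
    # First close match if there is one, otherwise the default.
    return next(iter(difflib.get_close_matches(word, strings)), default)

# Same data as the problematic example above, as (Cities, Cities_Dict) pairs.
rows = [
    (float("nan"), ["San Francisco", "New York", "Boston"]),
    ("Los Angeles", [10]),
    ("berlin", ["Munich", "Berlin"]),
    ("Dubai", ["Dubai"]),
]
results = [safe_first_match(word, candidates) for word, candidates in rows]
print(results)
```

With a DataFrame, the same helper could be applied per row with df.apply(lambda x: safe_first_match(x["Cities"], x["Cities_Dict"]), axis=1). Unlike a broad try-except, this version does not hide unrelated errors.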

5 Comments

Is it possible to get result not as a list but as a string?
@AlexT - the answer was edited; it now always returns the first value of the list, or an empty string.
I found out that some values in Cities_Dict ended up as floats or ints. Is it possible to use try/except in the lambda function so that those rows are skipped and produce an empty string?
And a second question: why do you use next and iter?
@AlexT - for the first question, the answer was edited. For the second, it is a trick: selecting the first value of a list with [0] fails if the list is empty. For example, with L = ['Dubai'], L[0] works, but with L = [], L[0] raises an IndexError. To prevent that failure, next is used with iter: it returns the first value of the list if the list is not empty, and otherwise the default value, here an empty string.
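
The difference between [0] and next with iter described above can be shown in a short sketch without pandas. The helper name first_match and its default parameter are made up for illustration; difflib.get_close_matches, next, and iter are the calls used in the answer.

```python
import difflib

def first_match(word, candidates, default=""):
    # get_close_matches returns a list, which is empty when nothing matches.
    matches = difflib.get_close_matches(word, candidates)
    # next(iter(...), default) returns the first item, or default if the list is empty.
    return next(iter(matches), default)

print(first_match("berlin", ["Munich", "Berlin"]))
print(repr(first_match("Dubai", [])))

# Indexing an empty list raises IndexError instead of returning a default.
try:
    difflib.get_close_matches("Dubai", [])[0]
except IndexError:
    print("IndexError for [0] on an empty list")
```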
