I have a pandas dataframe like this:

import pandas as pd

data={
    'col1':['New Zealand', 'Gym', 'United States'],
    'col2':['Republic of South Africa', 'Park', 'United States of America'],
}
df=pd.DataFrame(data)
print(df)

            col1                      col2
0    New Zealand  Republic of South Africa
1            Gym                      Park
2  United States  United States of America

I also have a sentence that may contain values from any column of the dataframe. I want to find which column values occur in the sentence, and which column each one comes from. The similar solutions I have seen check whether the sentence occurs in the column values, not the other way around. Currently, I am doing it like this:

def find_match(df,sentence):
    "returns (found, matched values, column names) for the values that occur in the sentence"
    arr=[]
    cols=[]
    flag=False
    for i,row in df.iterrows():
        if row['col1'].lower() in sentence.lower():
            arr.append(row['col1'])
            cols.append('col1')
            flag=True
        elif row['col2'].lower() in sentence.lower():
            arr.append(row['col2'])
            cols.append('col2')
            flag=True
    return flag,arr,cols

sentence="I live in the United States"
find_match(df,sentence)  # returns (True, ['United States'], ['col1'])

I am looking for a faster, more Pythonic way to do this: iterating row by row takes a long time on a fairly large dataframe.

I cannot use .isin() because it expects a list of strings and compares each column value against the whole sentence. I have also tried the following, but it throws an error:

df.loc[df['col1'].str.lower() in sentence]  # throws error that df['col1'] should be a string

Any help will be highly appreciated. Thanks!

2 Answers

1

I would do something like this:

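def find_match(df, sentence):
    needle = sentence.lower()  # lower-case the sentence once, not per value
    # (row position, column name) for every value that occurs in the sentence
    ids = [(i, j) for j in df.columns for i, v in enumerate(df[j]) if v.lower() in needle]
    return len(ids) > 0, [df[j].iloc[i] for i, j in ids], [j for _, j in ids]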

Which gives:

find_match(df, sentence = 'I regularly go to the gym in the United States of America')

(True,
 ['Gym', 'United States', 'United States of America'],
 ['col1', 'col1', 'col2'])

In my view this is fairly Pythonic, although there may be more elegant ways that make more use of pandas functions.
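
One way the "more pandas" idea could look, as a sketch rather than a benchmarked solution: reshape the frame to long form with melt so that every value sits next to its column name, then filter with a single boolean mask. The function name find_match_melt and the 'column' / 'value' labels are made up for illustration, and the code assumes every cell is a string (no NaN).

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['New Zealand', 'Gym', 'United States'],
    'col2': ['Republic of South Africa', 'Park', 'United States of America'],
})

def find_match_melt(df, sentence):
    """Return (found, matched values, column names); hypothetical helper."""
    needle = sentence.lower()
    # Long form: one row per cell, with the source column name alongside the value.
    long = df.melt(var_name='column', value_name='value')
    # Assumption: every value is a string, so .str.lower() never yields NaN.
    mask = long['value'].str.lower().map(lambda v: v in needle)
    hits = long[mask]
    return bool(mask.any()), hits['value'].tolist(), hits['column'].tolist()

print(find_match_melt(df, 'I regularly go to the gym in the United States of America'))
```

melt lists the cells column by column, so the matches come back in the same order as in the list comprehension. The substring test is still a Python-level call per cell, so a speed-up is not guaranteed.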


3 Comments

Great, but it is not giving the exact string that is matched, right?
So you want your solution to contain only one value per row? That is, a sentence containing both gym and park should return only the position for gym?
I just edited the answer so that strings are printed instead of row ids.
0

Evidently you would like to check whether each value in col1 is a substring of the sentence. Is this correct? If so, here is one way:

import pandas as pd

df = pd.DataFrame(
    {'col1': ['New Zealand', 'Gym', 'United States'],
     'col2': ['Republic of South Africa', 'Park', 'United States of America']})

sentence = 'I live in the United States'

mask = df['col1'].apply(lambda x: x in sentence) # `mask` is a boolean Series

if mask.any():
    matches = df.loc[mask, 'col1']
    print(mask.any(), end=', ')
    print(matches.values, end=', ')
    print('col1')
    print()

# the print statements produce the following line
# True, ['United States'], col1

If this is the right logic for one column, you could put the mask statement and the if clause inside a loop over the columns (for col in df.columns:), replacing 'col1' with col.

Update: we can modify the lambda expression to perform a case-insensitive comparison. (The original dataframe is not changed.)

mask = df['col1'].apply(lambda x: x.lower() in sentence.lower())
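
Putting the two suggestions together (a loop over the columns plus the case-insensitive mask) might look like the sketch below. The results dictionary and the needle variable are illustrative names that are not part of the original answer, and the code assumes every cell is a string.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': ['New Zealand', 'Gym', 'United States'],
     'col2': ['Republic of South Africa', 'Park', 'United States of America']})

sentence = 'I live in the united states'
needle = sentence.lower()  # lower-case the sentence once, outside the loop

results = {}  # column name -> list of values found in the sentence
for col in df.columns:
    # Same mask as above, applied to each column in turn.
    mask = df[col].apply(lambda x: x.lower() in needle)
    if mask.any():
        results[col] = df.loc[mask, col].tolist()

print(results)
```

Here only col1 gets an entry: the lower-cased 'united states' occurs in the sentence, while 'united states of america' does not.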

2 Comments

It is not returning matches when the value is in lower case. Could you tell me how to fix that? Apparently x.lower() is not working in the mask.
I added .lower() to the lambda expression, which worked for my example.
