1

I've a simple column of strings, and a list of strings.

strings_col
"the cat is on the table"
"the dog is eating"

list1 = ["cat", "table", "dog"]

I need to create another column in which every row contains the string contained in the list if they are in the string_col, if it contains two or more strings from the list, then I'd like to have more rows. The result should be something like this:

 strings_col                   string
"the cat is on the table"      cat
"the cat is on the table"      table
"the dog is eating"            dog

How can I do that? thanks

0

2 Answers 2

3

You can use str.findall:

>>> df.assign(string=df.strings_col.str.findall(r'|'.join(list1))).explode('string')

                 strings_col string
0  "the cat is on the table"    cat
0  "the cat is on the table"  table
1        "the dog is eating"    dog

If you want you can reset_index after that:

>>> df.assign(
        string=df.strings_col.str.findall(r'|'.join(list1))
    ).explode('string').reset_index(drop=True)
                 strings_col string
0  "the cat is on the table"    cat
1  "the cat is on the table"  table
2        "the dog is eating"    dog
Sign up to request clarification or add additional context in comments.

Comments

3

try str.extractall, .groupby.agg(list) & .explode()

pat = '|'.join(list1)
# 'cat|table|dog'


df['matches'] = df['strings_col']\
                 .str.extractall(f"({pat})")\
                 .groupby(level=0).agg(list)

df_new = df.explode('matches')
print(df_new)
    
               strings_col matches
0  the cat is on the table     cat
0  the cat is on the table   table
1       the dog is eating      dog

1 Comment

OP requested 1 matched word per row.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.