Extract multiple string groups in same column pandas

Question

I have the following DataFrame:

import pandas as pd

test = {'title': ['Undeclared milk in Burnbrae', 'Undeclared milk in certain Bumble', 'Certain cheese products may contain listeria', 'Ocean brand recalled due to Salmonella', 'IQF Raspberries due to Listeria']}
example = pd.DataFrame(test)
example

    title
0   Undeclared milk in Burnbrae
1   Undeclared milk in certain Bumble
2   Certain cheese products may contain listeria
3   Ocean brand recalled due to Salmonella
4   IQF Raspberries due to Listeria

I want to extract the hazard from each title into a single new column, so that the result looks like this:

test = {'hazard': ['Undeclared milk', 'Undeclared milk', 'listeria', 'Salmonella', 'Listeria'], 'title': ['Undeclared milk in Burnbrae', 'Undeclared milk in certain Bumble', 'Certain cheese products may contain listeria', 'Ocean brand recalled due to Salmonella', 'IQF Raspberries due to Listeria']}
example2 = pd.DataFrame(test)
example2

    hazard           title
0   Undeclared milk  Undeclared milk in Burnbrae
1   Undeclared milk  Undeclared milk in certain Bumble
2   listeria         Certain cheese products may contain listeria
3   Salmonella       Ocean brand recalled due to Salmonella
4   Listeria         IQF Raspberries due to Listeria

Essentially, my separators are "in", "may contain" and "due to".


example['hazard'] = example['title'].str.extract(r'^(.*?) in\b')
example['hazard'] = example['title'].str.extract(r'\b may contain (.*)$')
example['hazard'] = example['title'].str.extract(r'\b due to (.*)$')

I wrote the code above to test each separator individually, but I would like to extract all of them into the same column.

How can I do this?

I appreciate all the help

Cameron Riddell · Accepted Answer · 2021-02-24 07:00:49Z

3

You can put your separators into a list and join them with "|".join to build one larger pattern. From there, Series.str.extract returns one column per capture group, and we reshape the result (stack, then drop the inner index level) to match the original size.

seperators = [r"^(.*?) in\b", r"\b may contain (.*)$", r"\b due to (.*)$"]
sep_pattern = r"|".join(seperators)

example["hazard"] = (example["title"].str.extract(sep_pattern)
                       .stack()
                       .droplevel(1))

print(example)
                                          title           hazard
0                   Undeclared milk in Burnbrae  Undeclared milk
1             Undeclared milk in certain Bumble  Undeclared milk
2  Certain cheese products may contain listeria         listeria
3        Ocean brand recalled due to Salmonella       Salmonella
4               IQF Raspberries due to Listeria         Listeria
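The reshape step relies on stack() dropping the missing values, and that behaviour may differ between pandas releases. As a minimal sketch of an alternative that does not depend on it, you can take the first non-null capture group in each row. The names separators and groups below are introduced only for illustration.

```python
import pandas as pd

example = pd.DataFrame({"title": [
    "Undeclared milk in Burnbrae",
    "Undeclared milk in certain Bumble",
    "Certain cheese products may contain listeria",
    "Ocean brand recalled due to Salmonella",
    "IQF Raspberries due to Listeria",
]})

separators = [r"^(.*?) in\b", r"\b may contain (.*)$", r"\b due to (.*)$"]
sep_pattern = "|".join(separators)

# One column per capture group; for these titles only one group matches per row.
groups = example["title"].str.extract(sep_pattern)

# Back-fill across the columns so the first column holds the first non-null group.
example["hazard"] = groups.bfill(axis=1).iloc[:, 0]

print(example)
```

Rows where no separator matches would end up with a missing value in hazard.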


3 Comments

Alejandro L Over a year ago

It's returning a null value in all rows because it is not joining the "|" to the list of strings, so it's not recognizing the "or" condition.

Cameron Riddell Over a year ago

Just re-tested this snippet and it still works for me on pandas 1.1.5. Not sure why it's giving you nulls.

Alejandro L Over a year ago

I updated my pandas version and it works now, weird. Do you know why it doesn't work if I write the pattern directly, as in example['title'].str.extract(r'^(.*?) in\\b|\\b may contain (.*)$|\\b due to (.*)$')? Thank you for the help!
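
A possible explanation for the question in the last comment, assuming the doubled backslashes were really typed that way and are not a rendering artifact: inside a raw string, \\b is not a word boundary. The regex engine reads it as an escaped backslash followed by the letter b, so the pattern only matches titles containing the literal text "\b". A small sketch with the standard re module (the variable names are illustrative):

```python
import re

title = "Ocean brand recalled due to Salmonella"

# Single backslash in a raw string: \b is a regex word boundary.
single = re.search(r"\b due to (.*)$", title)

# Doubled backslash in a raw string: the regex sees an escaped backslash
# followed by the letter b, so it looks for a literal backslash and "b".
doubled = re.search(r"\\b due to (.*)$", title)

print(single.group(1))  # Salmonella
print(doubled)          # None
```

With single backslashes, the one-string pattern is the same string that "|".join builds from the list.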

supercooler8 · Accepted Answer · 2021-02-24 08:23:26Z

2

A more first-principles approach that gets the same result:

import re

import numpy as np

def func(s: str):
    # Try each separator; every pattern captures the hazard in group 1.
    check1 = re.search(r'^(.*?) in\b', s)
    check2 = re.search(r'\b may contain (.*)$', s)
    check3 = re.search(r'\b due to (.*)$', s)
    if check1:
        return check1.group(1)
    elif check2:
        return check2.group(1)
    elif check3:
        return check3.group(1)
    else:
        # No separator found in the title.
        return np.nan

example["hazard"] = example["title"].apply(func)
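
If the list of separators is likely to grow, the same idea can be sketched as a loop over precompiled patterns that returns the first match. PATTERNS and first_match are hypothetical names used only for this example; the patterns and their priority order are the ones from the question.

```python
import re

import numpy as np
import pandas as pd

# Patterns in priority order; each captures the hazard in group 1.
PATTERNS = [
    re.compile(r"^(.*?) in\b"),
    re.compile(r"\b may contain (.*)$"),
    re.compile(r"\b due to (.*)$"),
]

def first_match(s: str):
    """Return group 1 of the first pattern that matches, else NaN."""
    for pattern in PATTERNS:
        match = pattern.search(s)
        if match:
            return match.group(1)
    return np.nan

example = pd.DataFrame({"title": [
    "Undeclared milk in Burnbrae",
    "Undeclared milk in certain Bumble",
    "Certain cheese products may contain listeria",
    "Ocean brand recalled due to Salmonella",
    "IQF Raspberries due to Listeria",
]})
example["hazard"] = example["title"].apply(first_match)
```

Unlike the if/elif version, this stops searching as soon as one pattern matches, and adding a separator only means appending to the list.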
