Create new column if DataFrame contains specific string

Question

I have a DataFrame with a column called Name. The names contain patterns that I want to detect, so that I can assign a category in another column of the same DataFrame. For example:

Name 

name first RB LA a 
name LB second
RB name third
name LB fourth

I want names that share a pattern to get the same category, shown in the other column.

What I want :

       Name                  Example          

name first RB LA a          Round Blade category
name LB second              Long Biased category
RB name third               Round Blade category
name LB fourth              Long Biased category

I have a DataFrame, not a list, and it contains several other columns. There are also more than two categories.

What I have tried:

df.loc[df['Name']=="RB", 'Example'] = "RB category"

But it does not work, since == requires an exact match.

Another attempt :

if df[['Name'].str.contains("RB")] : 
    (...)

But it gives me an error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried adding .bool() or .any(), but either the error persists or nothing happens when I run the line.

Thank you.

"But it does not work since it must be an exact match" Why not use .str.contains(), then?
– AMC, Commented Jan 8, 2020 at 22:28
str.contains() does find the matches I am looking for, but I don't know how to use it to create another column with the category I want, which is why I asked this question. I tried an if conditional, but it gave me the error shown above.
– khouzam, Commented Jan 9, 2020 at 12:44
You need to use .loc[], like you did in the first snippet.
– AMC, Commented Jan 9, 2020 at 14:38
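
AMC's suggestion can be sketched as follows. An if statement needs a single True or False, while str.contains() returns one boolean per row, which is why the conditional raises the "truth value is ambiguous" error. Passing that boolean Series to .loc[] as a row mask avoids the conditional altogether. The patterns dictionary below is an assumption built from the two categories shown in the question; extend it with your own codes and labels.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["name first RB LA a", "name LB second", "RB name third", "name LB fourth"]
})

# Assumed mapping from substring to category label (taken from the question's example).
patterns = {
    "RB": "Round Blade category",
    "LB": "Long Biased category",
}

for code, label in patterns.items():
    # Boolean Series: True for the rows whose Name contains the code.
    mask = df["Name"].str.contains(code, regex=False)
    # Assign the label only on the matching rows; other rows are left untouched.
    df.loc[mask, "Example"] = label

print(df)
```

If a name contains more than one code, the last matching entry in patterns wins, so order the dictionary accordingly.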

Hryhorii Pavlenko · Accepted Answer · 2020-01-08 20:00:26Z

5

You could use pandas.Series.str.extract to achieve the desired output:

import pandas as pd


df = pd.DataFrame({
    "Name": ["name first RB LA a", "name LB second", "RB name third", "name LB fourth"]
})
# Capture the first "LB" or "RB" found in each name; column 0 holds the captured group
df["Example"] = df["Name"].str.extract("(LB|RB)")[0] + " category"

    Name                Example
0   name first RB LA a  RB category
1   name LB second      LB category
2   RB name third       RB category
3   name LB fourth      LB category

Edit

To change the category names within the Example column, use .str.replace:

df["Example"] = (df["Example"]
 .str.replace("RB", "Round Blade")
 .str.replace("LB", "Long Biased")
)
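
With more than a couple of categories, chaining one .str.replace per code gets long. One possible variation, shown here as a sketch and not as part of the original answer, is to extract the code once and translate it with a dictionary through Series.map. The labels dictionary is an assumption based on the two categories in the question.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Name": ["name first RB LA a", "name LB second", "RB name third", "name LB fourth"]
})

# Assumed code-to-label mapping; add one entry per category.
labels = {"RB": "Round Blade", "LB": "Long Biased"}

# Build a single capture group such as "(RB|LB)" from the dictionary keys.
pattern = "(" + "|".join(re.escape(code) for code in labels) + ")"

# expand=False returns a Series holding the first matched code per row (NaN if none).
codes = df["Name"].str.extract(pattern, expand=False)

# Translate each code to its label, then append the common suffix.
df["Example"] = codes.map(labels) + " category"

print(df)
```

Rows without any known code end up as NaN in Example, which makes unmatched names easy to spot.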

edited Jan 8, 2020 at 20:00

answered Jan 8, 2020 at 19:52


2 Comments

AMC Over a year ago

Isn't extracting and manipulating the strings more awkward than just having a function which checks the conditions and returns the correct category? This only works because the category name contains the extract string we're checking for, which seems quite fragile and hacky.

Hryhorii Pavlenko Over a year ago

@AMC, I thought str.extract was the easiest solution given khouzam's input data (exact matches). Sure, if there are no exact matches, str.contains would work better
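
The function-based approach AMC describes could look roughly like the sketch below. The names CATEGORIES and categorize, the "Uncategorized" fallback, and the extra "plain name" row are all hypothetical and only serve the illustration. The function compares whole whitespace-separated tokens, so a code such as "RB" is not matched inside a longer word.

```python
import pandas as pd

# Hypothetical mapping from code to category label.
CATEGORIES = {
    "RB": "Round Blade category",
    "LB": "Long Biased category",
}


def categorize(name: str) -> str:
    """Return the label of the first known code that appears as a token in name."""
    tokens = name.split()
    for code, label in CATEGORIES.items():
        if code in tokens:
            return label
    # Fallback label for names that contain none of the known codes.
    return "Uncategorized"


df = pd.DataFrame({
    "Name": [
        "name first RB LA a",
        "name LB second",
        "RB name third",
        "name LB fourth",
        "plain name",  # hypothetical row with no known code
    ]
})

# Apply the function row by row to build the new column.
df["Example"] = df["Name"].apply(categorize)

print(df)
```

This decouples the label text from the matched code, at the cost of a Python-level loop over the rows, which is slower than the vectorized string methods on large frames.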
