
One column of my DataFrame holds names. Each name contains a pattern that I want to detect and use to fill a category in another column of the same DataFrame. For example:

Name 

name first RB LA a 
name LB second
RB name third
name LB fourth 

I want names that share a pattern to fall into the same category, shown in the other column.

What I want:

       Name                  Example          

name first RB LA a          Round Blade category
name LB second              Long Biased category
RB name third               Round Blade category
name LB fourth              Long Biased category

I have a DataFrame, not a list, and it has several other columns. There are also more than two categories.

What I have tried:

df.loc[df['Name']=="RB", 'Example'] = "RB category"

But it does not work, since == requires an exact match.

Another attempt:

if df[df['Name'].str.contains("RB")]:
    (...)

But it gives me this error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried adding .bool() or .any(), but either the error persists or nothing happens when I run the line.

Thank you.

  • "But it does not work since it must be an exact match" Why not use .str.contains(), then? Commented Jan 8, 2020 at 22:28
  • str.contains() does find the matches I am looking for, but I don't know how to use it to create another column with the category I want, hence this question. I tried an if conditional, but it gave me the error shown above. Commented Jan 9, 2020 at 12:44
  • You need to use .loc[], like you did in the first snippet. Commented Jan 9, 2020 at 14:38
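
A minimal sketch of what the last comment suggests, assuming the two pattern/category pairs from the question (the `patterns` dict is a name made up for illustration): build a boolean mask with .str.contains() and assign through .loc[] instead of testing it with `if`.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["name first RB LA a", "name LB second", "RB name third", "name LB fourth"]
})

# Hypothetical pattern -> category mapping; add more entries for more categories.
patterns = {
    "RB": "Round Blade category",
    "LB": "Long Biased category",
}

# Start with an empty column so rows that match nothing stay None.
df["Example"] = None

for pattern, category in patterns.items():
    # mask is a boolean Series with one value per row.
    mask = df["Name"].str.contains(pattern, regex=False, na=False)
    df.loc[mask, "Example"] = category
```

If a name contains more than one pattern, the pattern listed last in the dict wins, so the order of the entries matters.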

1 Answer


You could use pandas.Series.str.extract to achieve the desired output:


import pandas as pd


df = pd.DataFrame({
    "Name": ["name first RB LA a", "name LB second", "RB name third", "name LB fourth"]
})
# Extract the first "LB" or "RB" found in each name, then append " category".
df["Example"] = df["Name"].str.extract("(LB|RB)")[0] + " category"

    Name                Example
0   name first RB LA a  RB category
1   name LB second      LB category
2   RB name third       RB category
3   name LB fourth      LB category

Edit

To change the category names in the Example column, use .str.replace:

df["Example"] = (df["Example"]
 .str.replace("RB", "Round Blade")
 .str.replace("LB", "Long Biased")
)
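
As a possible variation, the extracted code can be mapped to its label through a dict, so the label does not need to contain the code. This is only a sketch: the `categories` dict and its labels are assumptions taken from the question's example.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Name": ["name first RB LA a", "name LB second", "RB name third", "name LB fourth"]
})

# Hypothetical code -> label mapping; extend it for more categories.
categories = {
    "RB": "Round Blade category",
    "LB": "Long Biased category",
}

# Build one capture group such as "(RB|LB)" from the dict keys.
pattern = "(" + "|".join(re.escape(code) for code in categories) + ")"

# expand=False returns a Series; names with no match become NaN and stay NaN after map.
df["Example"] = df["Name"].str.extract(pattern, expand=False).map(categories)
```

Here the first code found in each name decides the category.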

2 Comments

Isn't extracting and manipulating the strings more awkward than just having a function which checks the conditions and returns the correct category? This only works because the category name contains the extract string we're checking for, which seems quite fragile and hacky.
@AMC, I thought str.extract was the easiest solution given khouzam's input data (exact matches). Sure, if there are no exact matches, str.contains would work better.
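
For reference, the function-based approach raised in the first comment could look roughly like this. The `categorize` function and its `rules` list are hypothetical names, and the labels come from the question's example.

```python
import pandas as pd


def categorize(name):
    """Return the category for a name, or None when no pattern matches."""
    # Hypothetical rules, checked in order; the first match wins.
    rules = [
        ("RB", "Round Blade category"),
        ("LB", "Long Biased category"),
    ]
    for pattern, category in rules:
        if pattern in name:
            return category
    return None


df = pd.DataFrame({
    "Name": ["name first RB LA a", "name LB second", "RB name third", "name LB fourth"]
})

# apply calls categorize once per row value.
df["Example"] = df["Name"].apply(categorize)
```

This is a plain Python loop per row, so it will likely be slower than the vectorized string methods on large frames, but each condition can be as specific as needed.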
