0

I have a pandas dataframe df:

    df = 
          category        str_column
           cat1            str1
           cat2            str2
            ...             ...

and a

   parent_df =
           category        parent_category
           cat1             parent_of_cat1
           cat2             parent_of_cat2

and a list

  faulty_categories =
     ['cat1', ...]

I need to get all the categories in df which are present in faulty_categories and replace them by their parent:

So, expected output:

       df = 

         category       str_column
         parent_of_cat1   str1
         cat2             str2

How to achieve this?

I tried:

       df.loc[df['category'].isin(faulty_categories), 'category'] = parents_df.loc[parents_df['category'].isin(faulty_categories), 'parent']

But, this replaced category by NaN.

3 Answers 3

1

You could use loc and map:

df.loc[df['category'].isin(faulty_categories), 'category'] = df['category'].map(parent_df.set_index('category')['parent_category'])

Or mask and map:

df['category'] = (df['category'].mask(df['category'].isin(faulty_categories),
                                      df['category'].map(parent_df.set_index('category')['parent_category'])
                                     )

Output:

         category str_column
0  parent_of_cat1       str1
1            cat2       str2
Sign up to request clarification or add additional context in comments.

Comments

1

Use df.merge and numpy.where:

In [2226]: import numpy as np

In [2227]: x = df.merge(parent_df)
In [2230]: x['category'] = np.where(x.category.isin(faulty_categories), x.parent_category, x.category)

In [2232]: x.drop('parent_category', 1, inplace=True)

In [2233]: x
Out[2233]: 
         category str_column
0  parent_of_cat1       str1
1            cat2       str2

Comments

1

df

    category    str_column
0   cat1         str1
1   cat2         str2
2   cat3         str3

    category    parent_category
0   cat1        parent_of_cat1
1   cat3        parent_of_cat3
2   cat2        parent_of_cat2

faulty_cat = ['cat1', 'cat3']

Solution:-

for i in range(len(df)):
    if df.loc[i,'category'] in faulty_cat:
        df.loc[i,'category'] = parent_df['parent_category'][parent_df['category'] == df.loc[i,'category']].item()

Output df:-

      category          str_column
0   parent_of_cat1      str1
1   cat2                str2
2   parent_of_cat3      str3

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.