3

I would like to create another column in a dataframe.

The dataframe is like the following, sub_id is part of the id, say id is the 'parent' for sub_id, it includes id itself and some items included in id.

id has no name but sub_id has corresponding name

I would like to check id with sub_id's name, and then create id's name

df = pd.DataFrame({'id':[1,1,1,2,2],
                    'sub_id':[12,1,13,23,2],
                    'name':['pear','fruit','orange','cat','animal']})
   id  sub_id    name
0   1      12    pear
1   1       1   fruit
2   1      13  orange
3   2      23     cat
4   2       2  animal

I would like to create another column id_name, to get:

   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

I have no idea how it could be achieved efficiently, I only think of to merge the dataframe twice, but I think there is a better way.

2 Answers 2

2

If replace not matched id with sub_id to misisng values in Series.where then GroupBy.transform with first working, because return first non missing values:

df['id_name'] = (df['name'].where(df['id'].eq(df['sub_id']))
                           .groupby(df['id'])
                           .transform('first'))

Or filter rows by mask and mapping helper Series by Series.map:

s = df[df['id'].eq(df['sub_id'])].set_index('id')['name']
df['id_name'] = df['id'].map(s)
print (df)
   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

Details:

print (df['name'].where(df['id'].eq(df['sub_id'])))
0       NaN
1     fruit
2       NaN
3       NaN
4    animal
Name: name, dtype: object


print (s)
id
1     fruit
2    animal
Name: name, dtype: object
Sign up to request clarification or add additional context in comments.

Comments

1

Are your IDs unique?

You use GroupBy.transform to get the min id per group and map this to the existing id:

df['id_name'] = (df.groupby('id')['sub_id'].transform('min')
                   .map(df.set_index('sub_id')['name'])
                )

output:

   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.