python dataframe create one column based on another column

Question

I would like to create another column in a dataframe.

The dataframe is like the following, sub_id is part of the id, say id is the 'parent' for sub_id, it includes id itself and some items included in id.

id has no name but sub_id has corresponding name

I would like to check id with sub_id's name, and then create id's name

df = pd.DataFrame({'id':[1,1,1,2,2],
                    'sub_id':[12,1,13,23,2],
                    'name':['pear','fruit','orange','cat','animal']})
   id  sub_id    name
0   1      12    pear
1   1       1   fruit
2   1      13  orange
3   2      23     cat
4   2       2  animal

I would like to create another column id_name, to get:

   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

I have no idea how it could be achieved efficiently, I only think of to merge the dataframe twice, but I think there is a better way.

jezrael · Accepted Answer · 2022-05-03 09:57:28Z

2

If replace not matched id with sub_id to misisng values in Series.where then GroupBy.transform with first working, because return first non missing values:

df['id_name'] = (df['name'].where(df['id'].eq(df['sub_id']))
                           .groupby(df['id'])
                           .transform('first'))

Or filter rows by mask and mapping helper Series by Series.map:

s = df[df['id'].eq(df['sub_id'])].set_index('id')['name']
df['id_name'] = df['id'].map(s)
print (df)
   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

Details:

print (df['name'].where(df['id'].eq(df['sub_id'])))
0       NaN
1     fruit
2       NaN
3       NaN
4    animal
Name: name, dtype: object


print (s)
id
1     fruit
2    animal
Name: name, dtype: object

edited May 3, 2022 at 9:57

answered May 3, 2022 at 9:50

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

Comments

mozway · Accepted Answer · 2022-05-03 09:54:58Z

1

Are your IDs unique?

You use GroupBy.transform to get the min id per group and map this to the existing id:

df['id_name'] = (df.groupby('id')['sub_id'].transform('min')
                   .map(df.set_index('sub_id')['name'])
                )

output:

   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

answered May 3, 2022 at 9:54

mozway

267k13 gold badges56 silver badges106 bronze badges

Collectives™ on Stack Overflow

python dataframe create one column based on another column

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related