Adding a column to a DataFrame with pandas, conditional on NaN

Question

I'm new to Python and currently working with pandas. I have a DataFrame with two columns, id and parent:

id   | parent
1    | A
2    | B
3    | C
4    | A
5    | A
6    | C
A    | NaN
B    | NaN
C    | NaN

And the expected output is like the table given below:

id   | parent | child
1    | A      | NaN
2    | B      | NaN
3    | C      | NaN
4    | A      | NaN
5    | A      | NaN
6    | C      | NaN
A    | NaN    | 1 ; 4 ; 5
B    | NaN    | 2 
C    | NaN    | 3 ; 6

I tried using the fillna() function, but couldn't get the expected result.

giser_yugang · Accepted Answer · 2017-05-30 09:16:40Z

I think you should use groupby and merge for this. Starting from your data:

print(df1)

  id parent
0  1      A
1  2      B
2  3      C
3  4      A
4  5      A
5  6      C
6  A    NaN
7  B    NaN
8  C    NaN
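The question doesn't show how df1 was built, so here is a minimal sketch for reproducing it. It assumes the numeric ids are plain integers and the parent labels are strings, which is what the printed lists further down suggest; if your data comes from a CSV, the ids may well be strings instead.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data:
# numeric ids are children, letter ids are the parents themselves.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, "A", "B", "C"],
    "parent": ["A", "B", "C", "A", "A", "C", np.nan, np.nan, np.nan],
})
print(df1)
```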

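Then collect the children of each parent. Rows whose parent is NaN are skipped by groupby, so only A, B and C show up as groups: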

df2 = df1.groupby('parent').agg({'id': lambda x: x.tolist()}).reset_index()
print(df2)

  parent         id
0      A  [1, 4, 5]
1      B        [2]
2      C     [3, 6]
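If you'd rather have the aggregated column named directly, named aggregation is one option. This is only a sketch: it assumes pandas 0.25 or later, rebuilds df1 with the same assumed sample data, and the name child is just the label used in the question.

```python
import numpy as np
import pandas as pd

# Assumed sample data matching the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, "A", "B", "C"],
    "parent": ["A", "B", "C", "A", "A", "C", np.nan, np.nan, np.nan],
})

# Named aggregation: output column "child" is built from column "id" using list().
# Rows with a NaN parent are dropped by groupby's default behaviour.
df2 = df1.groupby("parent").agg(child=("id", list)).reset_index()
print(df2)
```

With this variant only the parent column still needs renaming to id before the merge.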

Finally, merge the result back into df1:

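# Rename so that the parent label lines up with df1's id column
df2.columns = ['id', 'child']
# A left merge keeps every row of df1; ids without children get NaN
df3 = pd.merge(df1, df2, on='id', how='left')
print(df3)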
  id parent      child
0  1      A        NaN
1  2      B        NaN
2  3      C        NaN
3  4      A        NaN
4  5      A        NaN
5  6      C        NaN
6  A    NaN  [1, 4, 5]
7  B    NaN        [2]
8  C    NaN     [3, 6]
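The result above holds Python lists, while the question asked for strings like "1 ; 4 ; 5". One possible way to get that, sketched here with the same assumed sample data, is to join the ids inside the aggregation and then look them up with Series.map instead of merging. Treat it as an alternative spelling, not a replacement for the approach above.

```python
import numpy as np
import pandas as pd

# Assumed sample data matching the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, "A", "B", "C"],
    "parent": ["A", "B", "C", "A", "A", "C", np.nan, np.nan, np.nan],
})

# One string per parent; groupby ignores the rows whose parent is NaN.
children = df1.groupby("parent")["id"].agg(
    lambda ids: " ; ".join(str(i) for i in ids)
)

# Look each id up in the per-parent Series; ids that are nobody's parent stay NaN.
df1["child"] = df1["id"].map(children)
print(df1)
```

map works here because children is indexed by the parent label, which is exactly the value stored in id for the parent rows.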

edited May 30, 2017 at 9:16

answered May 30, 2017 at 9:10

giser_yugang
