Pandas df loop + merge

Question

Hello guys I need your wisdom,

I'm still new to Python and pandas, and I'm trying to achieve the following.

import pandas as pd

df = pd.DataFrame({'code': [125, 265, 128, 368, 4682, 12, 26, 12, 36, 46, 1, 2, 1, 3, 6], 'parent': [12, 26, 12, 36, 46, 1, 2, 1, 3, 6, 'a', 'b', 'a', 'c', 'f'], 'name': ['unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'g1', 'g2', 'g1', 'g3', 'g6']})

ds = pd.DataFrame({'code': [125, 265, 128,368,4682], 'name': ['Eagle','Cat','Koala','Panther','Dophin']})

I would like to add a new column to the ds dataframe containing the topmost parent of each code.

As an example, for the first row:

code | name | category
125 | Eagle | a

"a" is the result of following the chain between df.code and df.parent: 125 > 12 > 1 > a

Since the last parent is not a number but a letter, I think I must use a regex and then .merge from pandas to populate the ds['category'] column. Maybe also an apply function, but that seems a little bit above my current knowledge.

Could anyone help me with this?

Regards,

Your df is not defined correctly (ValueError: arrays must all be same length)
– SyntaxError, Commented Dec 18, 2020 at 11:31

Stef · Accepted Answer · 2020-12-18 12:30:33Z


The following is certainly not the fastest solution, but it works if your dataframes are not too big. First create a dictionary that maps each code in df to its parent, then look up each code in this dict repeatedly until you reach a value that has no parent.

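# Map each code to its parent (duplicated codes in df share the same parent here).
p = df[['code','parent']].set_index('code').to_dict()['parent']

def get_parent(code):
    # Keep replacing code with its parent until p has no entry for it.
    while par := p.get(code):
        code = par
    return code

# Resolve the topmost parent for every code in ds.
ds['category'] = ds.code.apply(get_parent)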

Result:

   code     name category
0   125    Eagle        a
1   265      Cat        b
2   128    Koala        a
3   368  Panther        c
4  4682   Dophin        f
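
To see why the first row ends up in category a, the lookup can be unrolled into the list of values it visits. This is only an illustrative sketch: trace_chain is a hypothetical helper that is not part of the answer, and the dictionary below is an assumed subset of p (the Eagle and Koala branch only) so that the snippet runs on its own.

```python
# Assumed subset of the parent mapping built from the question's df.
p = {125: 12, 128: 12, 12: 1, 1: 'a'}

def trace_chain(code):
    """Return every value visited while following parent links from code."""
    chain = [code]
    # Stop once the latest value has no entry in p, i.e. it has no parent.
    while chain[-1] in p:
        chain.append(p[chain[-1]])
    return chain

print(' > '.join(map(str, trace_chain(125))))  # 125 > 12 > 1 > a
```

The last element of the chain is the value that get_parent returns for the same code.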

PS: get_parent uses an assignment expression (Python >= 3.8). For older versions of Python you could use:

def get_parent(code):
    while True:
        par = p.get(code)
        if par:
            code = par
        else:
            return code
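
Both versions stop as soon as p.get(code) returns a falsy value, so a parent code of 0 would end the walk early, and a cycle in the data (a code that is eventually its own ancestor) would loop forever. Neither case occurs in the question's data. If it might in yours, a variant along these lines could help; get_root and the sample mapping are hypothetical, made up for illustration only.

```python
def get_root(code, parents):
    """Follow parent links from code until a value with no parent is reached."""
    seen = {code}
    # Test membership instead of truthiness so a parent of 0 is still followed.
    while code in parents:
        code = parents[code]
        if code in seen:
            raise ValueError(f"cycle detected at {code!r}")
        seen.add(code)
    return code

# Assumed sample data that includes a parent code of 0.
parents = {125: 12, 12: 0, 0: 'a'}
print(get_root(125, parents))  # a
```

With the question's frames this could be applied as ds['category'] = ds.code.apply(get_root, args=(p,)).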

edited Dec 18, 2020 at 12:30

answered Dec 18, 2020 at 12:22

Stef


1 Comment

MisterOo Over a year ago

Thank you very much, it works perfectly. I don't quite understand the while loop, so I'll have to look into it, though the second version seems easier to read ;)
