Hello guys, I need your wisdom.
I'm still new to Python and pandas, and I'm trying to achieve the following.
import pandas as pd
df = pd.DataFrame({'code': [125, 265, 128, 368, 4682, 12, 26, 12, 36, 46, 1, 2, 1, 3, 6], 'parent': [12, 26, 12, 36, 46, 1, 2, 1, 3, 6, 'a', 'b', 'a', 'c', 'f'], 'name': ['unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'unknow', 'g1', 'g2', 'g1', 'g3', 'g6']})
ds = pd.DataFrame({'code': [125, 265, 128, 368, 4682], 'name': ['Eagle', 'Cat', 'Koala', 'Panther', 'Dophin']})
I would like to add a new column to the ds dataframe containing the topmost parent of each code.
For example, for the first row:
code | name | category
125 | Eagle | a
"a" is the result of repeatedly looking up each code's parent in df (df.code to df.parent) until the parent is no longer a code: 125 > 12 > 1 > a
Since the last parent is a letter rather than a number, I think I need a regex and then .merge from pandas to populate the ds['category'] column. An apply function might also work, but that seems a little beyond my current knowledge.
Could anyone help me with this?
Regards,
df is not defined correctly (ValueError: arrays must all be same length)
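
For what it's worth, here is a minimal sketch of the chain lookup described in the question (125 > 12 > 1 > a), assuming df and ds as currently posted. The names parent_of and top_parent are made up for this illustration, not pandas functions. It builds a plain dict from code to parent and walks it until it reaches a value that is not itself a code, so no regex or merge is needed; whether that fits the real data is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'code': [125, 265, 128, 368, 4682, 12, 26, 12, 36, 46, 1, 2, 1, 3, 6],
    'parent': [12, 26, 12, 36, 46, 1, 2, 1, 3, 6, 'a', 'b', 'a', 'c', 'f'],
    'name': ['unknow'] * 10 + ['g1', 'g2', 'g1', 'g3', 'g6'],
})
ds = pd.DataFrame({
    'code': [125, 265, 128, 368, 4682],
    'name': ['Eagle', 'Cat', 'Koala', 'Panther', 'Dophin'],
})

# Hypothetical helper mapping: code -> parent.
# Duplicate codes (12 and 1 appear twice) keep the last parent seen.
parent_of = dict(zip(df['code'], df['parent']))


def top_parent(code):
    """Follow code -> parent links until the value is no longer a known code."""
    seen = set()  # guards against an accidental cycle in the data
    current = code
    while current in parent_of and current not in seen:
        seen.add(current)
        current = parent_of[current]
    return current


# Apply the walk to every code in ds.
ds['category'] = ds['code'].map(top_parent)
print(ds)
```

A code that does not appear in df.code at all is returned unchanged, so it may be worth checking for such rows afterwards.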