How to create a parent-child dataframe from an existing pandas dataframe of hierarchical structure and arbitrary shape?

Question

How do you convert a given dataframe with a hierarchical structure and arbitrary shape (say, similar to the one below) into a new dataframe with a parent and child column?

Edit: Note that a constraint is that a child cannot be its own parent.

data = {'level1': ['A', 'A', 'B', 'B', 'C'],
        'level2': ['James', 'Robert', 'Patricia', 'Patricia', 'John'],
        'level3': ['Stockholm', 'Denver', 'Moscow', 'Moscow', 'Palermo'],
        'level4': ['red', 'Denver', 'yellow', 'purple', 'blue']
        }

df = pd.DataFrame(data)

  level1    level2     level3  level4
0      A     James  Stockholm     red
1      A    Robert     Denver  Denver
2      B  Patricia     Moscow  yellow
3      B  Patricia     Moscow  purple
4      C      John    Palermo    blue

Desired output is something like this:

       parent      child
0           A      James
1           A     Robert
2           B   Patricia
3           C       John
4       James  Stockholm
5      Robert     Denver
6    Patricia     Moscow
7        John    Palermo
8   Stockholm        red
9      Moscow     yellow
10     Moscow     purple
11    Palermo       blue

Can you elaborate on the problem here? It would be best if you could provide the desired output too. It might help us to provide you with better answers. — TheFaultInOurStars
– TheFaultInOurStars, Commented Mar 23, 2022 at 21:17
@AmirhosseinKiani, yes, of course. I added the desired output on your suggestion. — user2962397
– user2962397, Commented Mar 23, 2022 at 21:29
The dataframe you have shown with the name df differs from the one used in the data variable. Please keep that in mind when you look at my answer and output. — TheFaultInOurStars
– TheFaultInOurStars, Commented Mar 23, 2022 at 21:40

TheFaultInOurStars · Accepted Answer · 2022-03-23 23:12:53Z

2

What I can think of is using a for loop over the columns of the dataframe:

columns = df.columns
length = len(columns)
parent = []
children = []
for i in range(length):
  if i != length - 1 :
    parent += df[columns[i]].to_list()
    children += df[columns[i+1]].to_list()
newDf = pd.DataFrame({"parent":parent, "children":children}).drop_duplicates()
newDf[newDf["parent"] != newDf["children"]]

Output

       parent   children
0           A      James
1           A     Robert
2           B   Patricia
4           C       John
5       James  Stockholm
6      Robert     Denver
7    Patricia     Moscow
9        John    Palermo
10  Stockholm        red
12     Moscow     yellow
13     Moscow     purple
14    Palermo       blue

edited Mar 23, 2022 at 23:12

answered Mar 23, 2022 at 21:35

TheFaultInOurStars

3,6331 gold badge13 silver badges30 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

user2962397 Over a year ago

Row 11 in your output would be undesirable because a child can't be its own parent. What would be a way such that we exclude that type of row from the output?

TheFaultInOurStars Over a year ago

Thanks for your comment, @user2962397 . I have edited the answer as per your comment.

Collectives™ on Stack Overflow

How to create a parent-child dataframe from an existing pandas dataframe of hierarchical structure and arbitrary shape?

1 Answer 1

Output

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Output

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related