How do you convert a given dataframe with a hierarchical structure and arbitrary shape (say, similar to the one below) into a new dataframe with a parent and child column?

Edit: Note that a constraint is that a child cannot be its own parent.

import pandas as pd

data = {'level1': ['A', 'A', 'B', 'B', 'C'],
        'level2': ['James', 'Robert', 'Patricia', 'Patricia', 'John'],
        'level3': ['Stockholm', 'Denver', 'Moscow', 'Moscow', 'Palermo'],
        'level4': ['red', 'Denver', 'yellow', 'purple', 'blue']
        }

df = pd.DataFrame(data)

  level1    level2     level3  level4
0      A     James  Stockholm     red
1      A    Robert     Denver  Denver
2      B  Patricia     Moscow  yellow
3      B  Patricia     Moscow  purple
4      C      John    Palermo    blue

Desired output is something like this:

       parent      child
0           A      James
1           A     Robert
2           B   Patricia
3           C       John
4       James  Stockholm
5      Robert     Denver
6    Patricia     Moscow
7        John    Palermo
8   Stockholm        red
9      Moscow     yellow
10     Moscow     purple
11    Palermo       blue
  • Can you elaborate on the problem here? It would be best if you could provide the desired output too. It might help us to provide you with better answers. Commented Mar 23, 2022 at 21:17
  • @AmirhosseinKiani, yes, of course. I added the desired output on your suggestion. Commented Mar 23, 2022 at 21:29
  • The dataframe you have shown with the name df differs from the one used in the data variable. Please keep that in mind when you look at my answer and output. Commented Mar 23, 2022 at 21:40
  • @AmirhosseinKiani Yes, you're right. Revised. Thanks. Commented Mar 23, 2022 at 21:55

1 Answer

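One approach is to loop over the columns of the dataframe, pairing each column with the one to its right, then drop duplicate pairs and any row where the parent equals the child: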

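columns = df.columns
parent = []
children = []
# Pair each column with its right-hand neighbour:
# (level1, level2), (level2, level3), (level3, level4)
for i in range(len(columns) - 1):
    parent += df[columns[i]].to_list()
    children += df[columns[i + 1]].to_list()
# Remove repeated parent/child pairs
newDf = pd.DataFrame({"parent": parent, "children": children}).drop_duplicates()
# Remove rows where a value would be its own parent
newDf[newDf["parent"] != newDf["children"]]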

Output

       parent   children
0           A      James
1           A     Robert
2           B   Patricia
4           C       John
5       James  Stockholm
6      Robert     Denver
7    Patricia     Moscow
9        John    Palermo
10  Stockholm        red
12     Moscow     yellow
13     Moscow     purple
14    Palermo       blue
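
The same idea can also be written without accumulating Python lists. The following is only a sketch: the function name `to_parent_child` is made up for illustration, it assumes the dataframe has at least two columns ordered from the top level to the bottom level, and it names the second column `child` and resets the index so the result lines up with the desired output in the question.

```python
import pandas as pd


def to_parent_child(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: stack each adjacent column pair into parent/child rows."""
    cols = list(df.columns)
    # One small frame per adjacent pair of columns (assumes at least two columns)
    pairs = [
        pd.DataFrame({"parent": df[left].to_numpy(), "child": df[right].to_numpy()})
        for left, right in zip(cols, cols[1:])
    ]
    edges = pd.concat(pairs, ignore_index=True).drop_duplicates()
    # A child cannot be its own parent
    edges = edges[edges["parent"] != edges["child"]]
    return edges.reset_index(drop=True)


data = {'level1': ['A', 'A', 'B', 'B', 'C'],
        'level2': ['James', 'Robert', 'Patricia', 'Patricia', 'John'],
        'level3': ['Stockholm', 'Denver', 'Moscow', 'Moscow', 'Palermo'],
        'level4': ['red', 'Denver', 'yellow', 'purple', 'blue']}

result = to_parent_child(pd.DataFrame(data))
```

On the question's data, `result` holds the same twelve parent/child pairs as the output above, re-indexed from 0 to 11.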
2 Comments

Row 11 in your output would be undesirable because a child can't be its own parent. What would be a way such that we exclude that type of row from the output?
Thanks for your comment, @user2962397 . I have edited the answer as per your comment.
