1

I have 2 pandas dataframes (df1, df2) that I am trying to extract data from and create a 3rd dataframe (df3)

df1 has 2 columns (an id column and another column that hold the name of the columns in the 2nd dataframe (df2)

df1 looks like:
===============
id1      name
---      ----
1        df2_column1_name
5        df2_column1_name
33       df2_column3_name
...
... and so on

df2 looks like:
===============
id2  df2_column1_name   df2_column2_name   df2_column2_name .... and so on
---  ----------------   ----------------   ----------------
12   Jimmy              male               25               .... 
16   Becky              female             30               ....
75   Mike               male               80               ....
....
.... and so on


I am trying to create df3 to look like:
=======================================
column1  Column2  Column3
-------  -------  -------
1        12       Jimmy    
5        12       male 
33       12       25
.
.
1        16       Becky
5        16       female
33       16       30
.
.
1        75       Mike
5        75       male
33       75       80
.
.
.

The dataframes can be quite large. I am trying to figure out the most efficient way to do this without double looping if possible. please advise best way to do this. Thank you

1 Answer 1

1

stack and a merge gets you most of the way there:

In [11]: df2.set_index("id2").stack().reset_index(name='value')
Out[11]:
   id2           level_1   value
0   12  df2_column1_name   Jimmy
1   12  df2_column2_name    male
2   12  df2_column3_name      25
3   16  df2_column1_name   Becky
4   16  df2_column2_name  female
5   16  df2_column3_name      30
6   75  df2_column1_name    Mike
7   75  df2_column2_name    male
8   75  df2_column3_name      80

In [12]: df2.set_index("id2").stack().reset_index(name='value').merge(df1, right_on="name", left_on="level_1")
Out[12]:
   id2           level_1   value  id1              name
0   12  df2_column1_name   Jimmy    1  df2_column1_name
1   16  df2_column1_name   Becky    1  df2_column1_name
2   75  df2_column1_name    Mike    1  df2_column1_name
3   12  df2_column2_name    male    5  df2_column2_name
4   16  df2_column2_name  female    5  df2_column2_name
5   75  df2_column2_name    male    5  df2_column2_name
6   12  df2_column3_name      25   33  df2_column3_name
7   16  df2_column3_name      30   33  df2_column3_name
8   75  df2_column3_name      80   33  df2_column3_name

Lastly you have to select only the columns you want and sort:

In [13]: df2.set_index("id2").stack().reset_index(name='value').merge(df1, right_on="name", left_on="level_1")[["id1", "id2", "value"]].sort_v
     ...: alues("id2")
Out[13]:
   id1  id2   value
0    1   12   Jimmy
3    5   12    male
6   33   12      25
1    1   16   Becky
4    5   16  female
7   33   16      30
2    1   75    Mike
5    5   75    male
8   33   75      80
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you. With minor changes, I was able to get it to work for my need.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.