Multiindex data.frame from two data.frames join by column headers

Question

There are a dozens similar sounding questions here, I think I've searched them all and could not find a solution to my problem:

I have 2 df: df_c:

        CAN-01  CAN-02  CAN-03
CE                            
ce1     0.84     0.73     0.50
ce2     0.06     0.13     0.05

And df_z:

        CAN-01  CAN-02  CAN-03
marker                        
cell1     0.29     1.5       7
cell2     1.00     3.0       1

I want to join for each 'marker' + 'CE' combination over their column names

Example: cell1 + ce1:

[[0.29, 0.84],[1.5,0.73],[7,0.5], ...]

(Continuing for cell1 + ce2, cell2 + ce1, cell2 + ce2)

I have a working example using two loops and .loc twice, but it takes forever on the full data set.

I think the best to build is a multiindex DF with some merge/join/concat magic:

                CAN-01  CAN-02  CAN-03
     Source                            
0    CE           0.84    0.73    0.50
     Marker       0.29    1.5     7
1    CE           ...     
     Marker       ...

Sample Code

dc = [['ce1', 0.84, 0.73, 0.5],['c2', 0.06,0.13,0.05]] 
dat_c = pd.DataFrame(dc, columns=['CE', 'CAN-01', 'CAN-02', 'CAN-03']) 
dat_c.set_index('CE',inplace=True)

dz = [['cell1', 0.29, 1.5, 7],['cell2', 1, 3, 1]]
dat_z = pd.DataFrame(dz, columns=['marker', "CAN-01", "CAN-02", "CAN-03"])
dat_z.set_index('marker',inplace=True)

Bad/Slow Solution

for ci, c_row in dat_c.iterrows(): # for each CE in CE table

    tmp = []

    for j,colz in enumerate(dat_z.columns[1:]):

        if not colz in dat_c:
            continue
        entry_c = c_row.loc[colz]
    
        if len(entry_c.shape) > 0:
            continue
        
        tmp.append([dat_z.loc[marker,colz],entry_c])

Anurag Dabas · Accepted Answer · 2021-08-12 12:36:35Z

1

IIUC:

use append()+groupby():

dat_c.index=[f"cell{x+1}" for x in range(len(dat_c))]
df=dat_c.append(dat_z).groupby(level=0).agg(list)

output of df:

         CAN-01         CAN-02          CAN-03
cell1   [0.84, 0.29]    [0.73, 1.5]     [0.5, 7.0]
cell2   [0.06, 1.0]     [0.13, 3.0]     [0.05, 1.0]

If needed list:

dat_c.index=[f"cell{x+1}" for x in range(len(dat_c))]
lst=dat_c.append(dat_z).groupby(level=0).agg(list).to_numpy().tolist()

output of lst:

[[[0.84, 0.29], [0.73, 1.5], [0.5, 7.0]],
 [[0.06, 1.0], [0.13, 3.0], [0.05, 1.0]]]

edited Aug 12, 2021 at 12:36

answered Aug 12, 2021 at 12:25

Anurag Dabas

24.3k9 gold badges25 silver badges41 bronze badges

Sign up to request clarification or add additional context in comments.

8 Comments

mnzl Over a year ago

Then I still have to do another lookup to combine cell1 and ce2 e.g.: [0.29, 0.84]

Anurag Dabas Over a year ago

@mnzl can you pls post your expected output?

mnzl Over a year ago

Final output: [[0.29, 0.84],[1.5,0.73],[7,0.5], ...]

Anurag Dabas Over a year ago

@mnzl updated answer...kindly have a look :)

Anurag Dabas Over a year ago

@mnzl Since sir you unaccepted answer...are you facing any problem sir?

|

Collectives™ on Stack Overflow

Multiindex data.frame from two data.frames join by column headers

Sample Code

Bad/Slow Solution

1 Answer 1

8 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

Sample Code

Bad/Slow Solution

1 Answer 1

8 Comments

Your Answer

Sign up or log in

Post as a guest

Related