0

There are a dozens similar sounding questions here, I think I've searched them all and could not find a solution to my problem:

I have 2 df: df_c:

        CAN-01  CAN-02  CAN-03
CE                            
ce1     0.84     0.73     0.50
ce2     0.06     0.13     0.05

And df_z:

        CAN-01  CAN-02  CAN-03
marker                        
cell1     0.29     1.5       7
cell2     1.00     3.0       1

I want to join for each 'marker' + 'CE' combination over their column names

Example: cell1 + ce1:

[[0.29, 0.84],[1.5,0.73],[7,0.5], ...]

(Continuing for cell1 + ce2, cell2 + ce1, cell2 + ce2)

I have a working example using two loops and .loc twice, but it takes forever on the full data set.

I think the best to build is a multiindex DF with some merge/join/concat magic:

                CAN-01  CAN-02  CAN-03
     Source                            
0    CE           0.84    0.73    0.50
     Marker       0.29    1.5     7
1    CE           ...     
     Marker       ...     

Sample Code

dc = [['ce1', 0.84, 0.73, 0.5],['c2', 0.06,0.13,0.05]] 
dat_c = pd.DataFrame(dc, columns=['CE', 'CAN-01', 'CAN-02', 'CAN-03']) 
dat_c.set_index('CE',inplace=True)

dz = [['cell1', 0.29, 1.5, 7],['cell2', 1, 3, 1]]
dat_z = pd.DataFrame(dz, columns=['marker', "CAN-01", "CAN-02", "CAN-03"])
dat_z.set_index('marker',inplace=True)

Bad/Slow Solution

for ci, c_row in dat_c.iterrows(): # for each CE in CE table

    tmp = []

    for j,colz in enumerate(dat_z.columns[1:]):

        if not colz in dat_c:
            continue
        entry_c = c_row.loc[colz]
    
        if len(entry_c.shape) > 0:
            continue
        
        tmp.append([dat_z.loc[marker,colz],entry_c])

1 Answer 1

1

IIUC:

use append()+groupby():

dat_c.index=[f"cell{x+1}" for x in range(len(dat_c))]
df=dat_c.append(dat_z).groupby(level=0).agg(list)

output of df:

         CAN-01         CAN-02          CAN-03
cell1   [0.84, 0.29]    [0.73, 1.5]     [0.5, 7.0]
cell2   [0.06, 1.0]     [0.13, 3.0]     [0.05, 1.0]

If needed list:

dat_c.index=[f"cell{x+1}" for x in range(len(dat_c))]
lst=dat_c.append(dat_z).groupby(level=0).agg(list).to_numpy().tolist()

output of lst:

[[[0.84, 0.29], [0.73, 1.5], [0.5, 7.0]],
 [[0.06, 1.0], [0.13, 3.0], [0.05, 1.0]]]
Sign up to request clarification or add additional context in comments.

8 Comments

Then I still have to do another lookup to combine cell1 and ce2 e.g.: [0.29, 0.84]
@mnzl can you pls post your expected output?
Final output: [[0.29, 0.84],[1.5,0.73],[7,0.5], ...]
@mnzl updated answer...kindly have a look :)
@mnzl Since sir you unaccepted answer...are you facing any problem sir?
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.