There are a dozens similar sounding questions here, I think I've searched them all and could not find a solution to my problem:
I have 2 df: df_c:
CAN-01 CAN-02 CAN-03
CE
ce1 0.84 0.73 0.50
ce2 0.06 0.13 0.05
And df_z:
CAN-01 CAN-02 CAN-03
marker
cell1 0.29 1.5 7
cell2 1.00 3.0 1
I want to join for each 'marker' + 'CE' combination over their column names
Example: cell1 + ce1:
[[0.29, 0.84],[1.5,0.73],[7,0.5], ...]
(Continuing for cell1 + ce2, cell2 + ce1, cell2 + ce2)
I have a working example using two loops and .loc twice, but it takes forever on the full data set.
I think the best to build is a multiindex DF with some merge/join/concat magic:
CAN-01 CAN-02 CAN-03
Source
0 CE 0.84 0.73 0.50
Marker 0.29 1.5 7
1 CE ...
Marker ...
Sample Code
dc = [['ce1', 0.84, 0.73, 0.5],['c2', 0.06,0.13,0.05]]
dat_c = pd.DataFrame(dc, columns=['CE', 'CAN-01', 'CAN-02', 'CAN-03'])
dat_c.set_index('CE',inplace=True)
dz = [['cell1', 0.29, 1.5, 7],['cell2', 1, 3, 1]]
dat_z = pd.DataFrame(dz, columns=['marker', "CAN-01", "CAN-02", "CAN-03"])
dat_z.set_index('marker',inplace=True)
Bad/Slow Solution
for ci, c_row in dat_c.iterrows(): # for each CE in CE table
tmp = []
for j,colz in enumerate(dat_z.columns[1:]):
if not colz in dat_c:
continue
entry_c = c_row.loc[colz]
if len(entry_c.shape) > 0:
continue
tmp.append([dat_z.loc[marker,colz],entry_c])