Creating a multiindexed dataframe from other dataframe values with different indicies

Question

I have a dataframe (dfA) composed from several files. dfA has three things, a date value which matches other date values in dfA, a name from a list of an unknown number of names that will be the same as other names in the dfA, and a concentration value which is unique. I want to create a new dataframe (dfB), where the rows are multiindexed by date, the columns are the names, and the values are the concentration from said name value. I've attempted to do this using the code below:

for name in nameList:
    dfB[str(name)] = dfA[dfA['Sample Name'] == str(name)]['Calculated Concentration']

However, I am returned with a dataframe where only the first row is filled, which I presume is due to the index values being different than those for the other column values:

              '5/0.5 uM'  '10/1 uM'  '15/1.5 uM'
 083021 14     4.7886      NaN        NaN
        15     4.5374      NaN        NaN 
        ... 
 090721 14     5.2840      NaN        NaN
        15     5.3050      NaN        NaN
        ...
 083121 57     5.2132      NaN        NaN
        58     4.8929      NaN        NaN
        ...

In the above output, the leftmost column is the multiindexed column of the values by date. The next column contains the indexes of the original dfA. Then '5/0.5 uM' is a value from 'name list' which contains values found from the original dfA with the 'Sample Name' '5/0.5 uM'.

How can I create dfB where the other columns have the correct values, or change the index values from dfA where they will match for each column?

@butterflyknife Those are the index values of the values found from dfA that have the name ' 5/0.5 uM' — pstumps
– pstumps, Commented Oct 28, 2021 at 16:58

butterflyknife · Accepted Answer · 2021-10-28 21:44:26Z

You may find it easier down the line to reformat your dates to be datetime variables, rather than strings. With that said, set up test:

dfA = pd.DataFrame({
    "date":["090721","083021","090721","083021","083121","083021","083121","083021","083121","083121"], 
    "name":["15/1.5 uM","15/1.5 uM","15/1.5 uM","15/1.5 uM","5/0.5 uM","5/0.5 uM","15/1.5 uM","5/0.5 uM","10/1 uM","10/1 uM"],
    "concentration":[0.430896003263987,0.354506173988912,0.679630142573769,0.816498937365697,0.134105578748551,0.364977945903234,0.602557291467607,0.633223692609049,2.59697726598929E-02,0.206455687221982]
})
dfA["index"] = dfA.index # copy index over to a column called index, for reasons that will be clearer soon.
dfA

Note that there isn't necessarily a row for every combination of (date, name). To get what you've described:

pd.pivot_table(dfA, values="concentration", index=["date", "index"], columns=["name"],aggfunc=np.mean)

Output:

You can see there are NaNs. These correspond to the "gaps" in the starting dataframe as described earlier. Note that we used an aggregation function, np.mean, but that doesn't make any difference, because we have chosen our re-indexing such that there will only be one value of concentration to "aggregate" in each row.

You didn't ask for this, but you might also find it useful to take the mean of rows which have the same date and name? Simply drop the "index" from the multi-index specification:

pd.pivot_table(dfA, values="concentration", index=["date"], columns=["name"],aggfunc=np.mean)

Output:

Collectives™ on Stack Overflow

Creating a multiindexed dataframe from other dataframe values with different indicies

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related