1

I have a dataframe (dfA) composed from several files. dfA has three things, a date value which matches other date values in dfA, a name from a list of an unknown number of names that will be the same as other names in the dfA, and a concentration value which is unique. I want to create a new dataframe (dfB), where the rows are multiindexed by date, the columns are the names, and the values are the concentration from said name value. I've attempted to do this using the code below:

for name in nameList:
    dfB[str(name)] = dfA[dfA['Sample Name'] == str(name)]['Calculated Concentration']

However, I am returned with a dataframe where only the first row is filled, which I presume is due to the index values being different than those for the other column values:

              '5/0.5 uM'  '10/1 uM'  '15/1.5 uM'
 083021 14     4.7886      NaN        NaN
        15     4.5374      NaN        NaN 
        ... 
 090721 14     5.2840      NaN        NaN
        15     5.3050      NaN        NaN
        ...
 083121 57     5.2132      NaN        NaN
        58     4.8929      NaN        NaN
        ...

In the above output, the leftmost column is the multiindexed column of the values by date. The next column contains the indexes of the original dfA. Then '5/0.5 uM' is a value from 'name list' which contains values found from the original dfA with the 'Sample Name' '5/0.5 uM'.

How can I create dfB where the other columns have the correct values, or change the index values from dfA where they will match for each column?

2
  • 1
    What is the column in red that starts (14, 15, ...)? Commented Oct 27, 2021 at 23:58
  • @butterflyknife Those are the index values of the values found from dfA that have the name ' 5/0.5 uM' Commented Oct 28, 2021 at 16:58

1 Answer 1

1

You may find it easier down the line to reformat your dates to be datetime variables, rather than strings. With that said, set up test:

dfA = pd.DataFrame({
    "date":["090721","083021","090721","083021","083121","083021","083121","083021","083121","083121"], 
    "name":["15/1.5 uM","15/1.5 uM","15/1.5 uM","15/1.5 uM","5/0.5 uM","5/0.5 uM","15/1.5 uM","5/0.5 uM","10/1 uM","10/1 uM"],
    "concentration":[0.430896003263987,0.354506173988912,0.679630142573769,0.816498937365697,0.134105578748551,0.364977945903234,0.602557291467607,0.633223692609049,2.59697726598929E-02,0.206455687221982]
})
dfA["index"] = dfA.index # copy index over to a column called index, for reasons that will be clearer soon.
dfA

starting test dataframe

Note that there isn't necessarily a row for every combination of (date, name). To get what you've described:

pd.pivot_table(dfA, values="concentration", index=["date", "index"], columns=["name"],aggfunc=np.mean)

Output:

pivotted but not aggregated

You can see there are NaNs. These correspond to the "gaps" in the starting dataframe as described earlier. Note that we used an aggregation function, np.mean, but that doesn't make any difference, because we have chosen our re-indexing such that there will only be one value of concentration to "aggregate" in each row.

You didn't ask for this, but you might also find it useful to take the mean of rows which have the same date and name? Simply drop the "index" from the multi-index specification:

pd.pivot_table(dfA, values="concentration", index=["date"], columns=["name"],aggfunc=np.mean)

Output:

enter image description here

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.