1

I'm new to pandas and working with indices, especially MultiIndex. I have a DataFrame as such:

df = pd.DataFrame({
    'ID':[1,2,1,2],
    'Measurement': ['ScanA', 'ScanA', 'ScanB', 'ScanB'],
    'Result':[0.1,0.2,0.5,0.7],
    'ResultType':['A','B','C','B']})
piv = df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'ResultType'])

This creates two indices Result and Type, but I'd like to modify the values in the Result index based on the Type value.

For example, if Type == 'C' then I want the corresponding Result to be -1.

Another example would be if Type in ('A', 'B') then I want to set Result to be 0 if < 0.5 else 1

How can I programatically do this without looping through each row/column?

Output for piv.Result would look like:

Measurement     ScanA   ScanB
ID      
1               0.0     -1
2               0.0     1
1
  • What is your expected output dataframe? Commented Feb 26, 2019 at 20:38

3 Answers 3

3

You can first modify the dataframe and then pivot

df.loc[df['Type'] == 'C', 'Result'] = -1
df.loc[(df['Type'].isin(['A', 'B'])) & (df['Result'] < 0.5), 'Result'] = 0
df.loc[(df['Type'].isin(['A', 'B'])) & (df['Result'] >= 0.5), 'Result'] = 1

df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'Type'])

            Result          Type
Measurement ScanA   ScanB   ScanA   ScanB
ID              
1           0       1       A       A
2           0       -1      B       C

Edit: An alternative using np.select as @Wen-Ben suggested

cond = [df['Type'] == 'C', (df['Type'].isin(['A', 'B'])) & (df['Result'] < 0.5), (df['Type'].isin(['A', 'B'])) & (df['Result'] >= 0.5)]
choice = [-1, 0, 1]
df['Result'] = np.select(cond, choice)

df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'Type'])
Sign up to request clarification or add additional context in comments.

2 Comments

May check with np.select
I like the use np.select since these conditions may need to be adjusted with tuning over time.
3

As an alternative to what Vaishali suggested, you can use numpy.where to modify piv after it is generated.

t = piv['Type']  
r = piv['Result'].astype(float)

piv.loc[:, 'Result'] = np.where(
    t == 'A', np.where(r < 0.5, 0, 1), np.where(t == 'C', -1, r))
piv

            Result        Type      
Measurement  ScanA ScanB ScanA ScanB
ID                                  
1              0.0   1.0     A     A
2              0.2  -1.0     B     C

Comments

0

There is a method in the pandas library called loc() which allows the user to find the values of the row through calling its label. This function, along with the conditional selection offered by pandas, allows the user to find the row where certain conditions are true (similar to the WHERE functionality found in SQL) and set values for other columns in that row. Using these methods, I have added a portion to your code that sets the 'Result' column to -1 when 'Type' == 'A'. You can apply the same structure to alter other 'Result' rows based on 'Type' values.

df = pd.DataFrame({
    'ID':[1,2,1,2],
    'Measurement': ['ScanA', 'ScanA', 'ScanB', 'ScanB'],
    'Result':[0.1,0.2,0.5,0.7],
    'Type':['A','B','A','C']})
piv = df.pivot(index = 'ID', columns = 'Measurement', values = ['Result', 'Type'])
df.loc[df['Type'] == 'C', ['Result']] = -1
print(df)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.