2

df have:

    A   B  C
a   1   2  3
b   2   1  4
c   1   1  1

df want:

    A   B  C
a   1   2  3
b   2   1  4
c   1   1  1
d   1  -1  1

I am able to get df want by using:

df.loc['d']=df.loc['b']-df.loc['a']

However, my actual df has 'a','b','c' rows for multiple IDs 'X', 'Y' etc.

        A   B  C
  X a   1   2  3
    b   2   1  4
    c   1   1  1
  Y a   1   2  3
    b   2   2  4
    c   1   1  1

How can I create the same output with multiple IDs? My original method:

df.loc['d']=df.loc['b']-df.loc['a']

fails with KeyError: 'b'

Desired output:

        A   B  C
  X a   1   2  3
    b   2   1  4
    c   1   1  1
    d   1  -1  1
  Y a   1   2  3
    b   2   2  4
    c   1   1  1
    d   1   0  1
  • It's a multi-index. You need to provide both the higher level and the lower level of the index to resolve the row. (Commented Aug 9, 2019 at 17:17)
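To make the comment concrete, here is a minimal sketch of label lookup on a two-level index. The frame is rebuilt from the question's desired output (so the Y/b row is assumed to be 2, 2, 4), and `row_xb`, `all_b` and `diff` are illustrative names, not anything from the original post.

```python
import pandas as pd

# Sample frame assumed from the question's desired output (minus the 'd' rows).
index = pd.MultiIndex.from_product([["X", "Y"], ["a", "b", "c"]])
df = pd.DataFrame(
    [[1, 2, 3], [2, 1, 4], [1, 1, 1],
     [1, 2, 3], [2, 2, 4], [1, 1, 1]],
    index=index,
    columns=["A", "B", "C"],
)

# A bare label is looked up in the outer level ('X', 'Y'), so 'b' is not found.
try:
    df.loc["b"]
except KeyError as exc:
    print("KeyError:", exc)

# Supplying both levels as a tuple resolves exactly one row.
row_xb = df.loc[("X", "b")]

# xs() picks one inner label across every outer ID at once,
# so the per-ID difference is an ordinary subtraction.
all_b = df.xs("b", level=1)
diff = all_b - df.xs("a", level=1)
print(diff)
```

`diff` is indexed by ID only; it still needs an inner label such as 'd' before it can be combined with the original frame.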

3 Answers

1

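If I understand correctly, you can loop over each ID and add the new row one group at a time:

for i, sub in df.groupby(df.index.get_level_values(0)):
    # b - a for this ID, stored under the new inner label 'd'
    df.loc[(i, 'd'), :] = sub.loc[(i, 'b')] - sub.loc[(i, 'a')]

print(df.sort_index())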

Or, without an explicit loop, build the 'd' rows with groupby/apply and concatenate them:

k = (
    df.groupby(df.index.get_level_values(0), as_index=False)
      .apply(lambda s: pd.DataFrame(
          [s.loc[(s.name, 'b')].values - s.loc[(s.name, 'a')].values],
          columns=s.columns,
          index=pd.MultiIndex(levels=[[s.name], ['d']], codes=[[0], [0]]),
      ))
      .reset_index(drop=True, level=0)
)

pd.concat([k, df]).sort_index()

1

Reshaping is a useful trick when you want to operate on one level of a MultiIndex: move that level's labels into the columns, compute the new column, then reshape back. See the code below:

result = (df.unstack(0).T
            .assign(d=lambda x:x.b-x.a)
            .stack()
            .unstack(0))
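The chain above can be hard to follow in one go, so here is a step-by-step sketch of the same reshape. The sample frame is assumed from the question's desired output, and `wide` and `result` are illustrative names.

```python
import pandas as pd

# Sample frame assumed from the question's desired output (minus the 'd' rows).
index = pd.MultiIndex.from_product([["X", "Y"], ["a", "b", "c"]])
df = pd.DataFrame(
    [[1, 2, 3], [2, 1, 4], [1, 1, 1],
     [1, 2, 3], [2, 2, 4], [1, 1, 1]],
    index=index,
    columns=["A", "B", "C"],
)

# Step 1: move the ID level into the columns, then transpose, so that
# 'a', 'b', 'c' become ordinary columns and each row is a (column, ID) pair.
wide = df.unstack(0).T

# Step 2: the new row is now a plain column-wise subtraction.
wide = wide.assign(d=wide["b"] - wide["a"])

# Step 3: undo the reshape. stack() folds a..d back into the index,
# unstack(0) turns A, B, C back into columns.
result = wide.stack().unstack(0)
print(result)
```

Printing `wide` after each step is an easy way to check which level sits where before reversing the reshape.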

0

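Use pd.IndexSlice to select the 'a' and 'b' rows for every ID, call diff(), keep only the 'b' rows (which now hold b - a), and rename 'b' to 'd'. Finally, concatenate the result with the original df using pd.concat (DataFrame.append no longer exists as of pandas 2.0).

idx = pd.IndexSlice
df1 = df.loc[idx[:, ['a', 'b']], :].diff().loc[idx[:, 'b'], :].rename({'b': 'd'})
df2 = pd.concat([df, df1]).sort_index().astype(int)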

Out[106]:
     A  B  C
X a  1  2  3
  b  2  1  4
  c  1  1  1
  d  1 -1  1
Y a  1  2  3
  b  2  2  4
  c  1  1  1
  d  1  0  1
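One caveat with the diff approach, sketched below: diff() subtracts each row from the row directly above it, so it relies on 'a' and 'b' being adjacent and in that order within every ID. The sample frame is assumed from the question's desired output; `ab`, `steps`, `d_rows` and `result` are illustrative names.

```python
import pandas as pd

# Sample frame assumed from the question's desired output (minus the 'd' rows).
index = pd.MultiIndex.from_product([["X", "Y"], ["a", "b", "c"]])
df = pd.DataFrame(
    [[1, 2, 3], [2, 1, 4], [1, 1, 1],
     [1, 2, 3], [2, 2, 4], [1, 1, 1]],
    index=index,
    columns=["A", "B", "C"],
).sort_index()  # slicing with IndexSlice expects a sorted index

idx = pd.IndexSlice
ab = df.loc[idx[:, ["a", "b"]], :]  # rows a and b for every ID
steps = ab.diff()                   # each row minus the row above it

# Only the 'b' rows hold b - a. The 'a' rows hold NaN (first row)
# or a meaningless cross-ID difference, so they are discarded.
d_rows = steps.loc[idx[:, "b"], :].rename({"b": "d"})

# diff() produces floats, hence the cast back to int at the end.
result = pd.concat([df, d_rows]).sort_index().astype(int)
print(result)
```

If the rows were not sorted, or if another label sat between 'a' and 'b', the 'b' rows of `steps` would no longer equal b - a.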
