I have a multi-indexed dataframe like so:

                          Value
Source       Partner              
USA          DEU          20
             CHN          10
             MEX          5
DEU          USA          12
             CHN          6
             MEX          2
CHN          USA          1
             DEU          2
             MEX          3

I want to add rows to represent an aggregate of some core countries, say USA, DEU and CHN, by 'Source' as well as 'Partner'. The intended output of new rows is the following:

                          Value
Source       Partner              
CORE         USA          13
             DEU          22
             CHN          16
USA          CORE         30
CHN          CORE         3
DEU          CORE         18

Source and Partner are the two levels of the MultiIndex.

Is there a tidy and quick way to generate the second output dataframe? In the actual application I have many more countries, of course.

  • Hi, from what source are you adding the new rows? Another multi-index dataframe? If not, how do you "generate" them? Commented Jun 11, 2022 at 18:03
  • @Laurent I manually create a list of countries that I call 'CORE', e.g. I could input the list ['USA','DEU','CHN']. Then I use the original df to create the new rows: for source 'CORE' and partner 'USA', I sum the values of all sources in the CORE group that have partner 'USA'. In the example above, this is 12 + 1 = 13. Commented Jul 5, 2022 at 8:28

1 Answer

With your initial dataframe:

import pandas as pd

data = {
    ("USA", "DEU"): 20,
    ("USA", "CHN"): 10,
    ("USA", "MEX"): 5,
    ("DEU", "USA"): 12,
    ("DEU", "CHN"): 6,
    ("DEU", "MEX"): 2,
    ("CHN", "USA"): 1,
    ("CHN", "DEU"): 2,
    ("CHN", "MEX"): 3,
}

df = pd.DataFrame(list(data.values()), index=data.keys(), columns=["Value"])
df.index.names = ["Source", "Partner"]

Here is one way to do it:

CORE = ["USA", "DEU", "CHN"]

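# Build first part of the expected dataframe:
# keep rows whose source and partner are both core countries,
# relabel the source as "CORE", then sum per partner
df1 = df.reset_index()
df1 = (
    df1.loc[df1["Source"].isin(CORE) & df1["Partner"].isin(CORE), :]
    .assign(Source="CORE")
    .groupby(["Source", "Partner"])
    .sum()
    .sort_values("Partner", ascending=False)
)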

print(df1)
# Output
#                 Value
# Source Partner
# CORE   USA         13
#        DEU         22
#        CHN         16

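# Build second part:
# keep rows whose partner is a core country,
# relabel the partner as "CORE", then sum per source
df2 = df.reset_index()
df2 = (
    df2.loc[df2["Partner"].isin(CORE), :]
    .assign(Partner="CORE")
    .groupby(["Source", "Partner"])
    .sum()
    # Keep only the core sources, in the order of the expected output
    .reindex(
        index=pd.MultiIndex.from_tuples(
            [("USA", "CORE"), ("CHN", "CORE"), ("DEU", "CORE")],
            names=["Source", "Partner"],
        )
    )
)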

print(df2)
# Output
#                 Value
# Source Partner
# USA    CORE        30
# CHN    CORE         3
# DEU    CORE        18

And then:

# Build final dataframe
new_df = pd.concat([df1, df2])
print(new_df)
# Output
#                 Value
# Source Partner
# CORE   USA         13
#        DEU         22
#        CHN         16
# USA    CORE        30
# CHN    CORE         3
# DEU    CORE        18
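
Since the real data has many more countries, it may help to wrap both steps in one function that takes the core list as an argument. The following is only a sketch: the function name `core_aggregate_rows` and the `label` parameter are made up for illustration, and it assumes the frame has a single "Value" column and index levels named "Source" and "Partner". It also assumes both aggregates should be limited to core sources and core partners, which is how I read the expected output.

```python
import pandas as pd


def core_aggregate_rows(df, core, label="CORE"):
    """Return the aggregate rows for `core`, indexed like `df` (hypothetical helper)."""
    flat = df.reset_index()
    # Assumption: only flows between two core countries enter the aggregates
    both_core = flat["Source"].isin(core) & flat["Partner"].isin(core)
    core_rows = flat[both_core]

    # Core as a single source: one row per core partner
    as_source = (
        core_rows.assign(Source=label)
        .groupby(["Source", "Partner"], sort=False)["Value"]
        .sum()
    )
    # Core as a single partner: one row per core source
    as_partner = (
        core_rows.assign(Partner=label)
        .groupby(["Source", "Partner"], sort=False)["Value"]
        .sum()
    )
    return pd.concat([as_source, as_partner]).to_frame()


# Same data as in the question
data = {
    ("USA", "DEU"): 20, ("USA", "CHN"): 10, ("USA", "MEX"): 5,
    ("DEU", "USA"): 12, ("DEU", "CHN"): 6, ("DEU", "MEX"): 2,
    ("CHN", "USA"): 1, ("CHN", "DEU"): 2, ("CHN", "MEX"): 3,
}
df = pd.DataFrame(
    {"Value": list(data.values())},
    index=pd.MultiIndex.from_tuples(list(data), names=["Source", "Partner"]),
)

extra = core_aggregate_rows(df, ["USA", "DEU", "CHN"])
# Append the aggregate rows to the original frame
combined = pd.concat([df, extra])
```

The rows come back in order of first appearance rather than the exact order shown in the question, so sort or reindex afterwards if the order matters. If non-core sources should also get a "CORE" partner row, filter `as_partner` on the partner alone instead of on both columns.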
