How to create new rows in multiindex DataFrames using existing data?

Question

I have a multi-indexed dataframe like so:

                          Value
Source       Partner              
USA          DEU          20
             CHN          10
             MEX          5
DEU          USA          12
             CHN          6
             MEX          2
CHN          USA          1
             DEU          2
             MEX          3

I want to add rows to represent an aggregate of some core countries, say USA, DEU and CHN, by 'Source' as well as 'Partner'. The intended output of new rows is the following:

                          Value
Source       Partner              
CORE         USA          13
             DEU          22
             CHN          16
USA          CORE         30
CHN          CORE         3
DEU          CORE         18

Source and Partner are the two levels of the MultiIndex.

Is there a tidy and quick way to generate the second output dataframe? In the actual application I have many more countries, of course.

Hi, from what source are you adding the new rows? Another multi-index dataframe? If not, how do you "generate" them?
– Laurent, Commented Jun 11, 2022 at 18:03
@Laurent I manually create a list of countries that I call 'CORE', e.g. ['USA', 'DEU', 'CHN']. Then I use the original df to create the new rows: for source 'CORE' and partner 'USA', I sum the values of all sources in the CORE group that have partner 'USA'. In the example above, this is 12 + 1 = 13.
– nutix, Commented Jul 5, 2022 at 8:28

Laurent · Accepted Answer · 2022-07-05 10:07:50Z

With your initial dataframe:

import pandas as pd

data = {
    ("USA", "DEU"): 20,
    ("USA", "CHN"): 10,
    ("USA", "MEX"): 5,
    ("DEU", "USA"): 12,
    ("DEU", "CHN"): 6,
    ("DEU", "MEX"): 2,
    ("CHN", "USA"): 1,
    ("CHN", "DEU"): 2,
    ("CHN", "MEX"): 3,
}

df = pd.DataFrame(list(data.values()), index=data.keys(), columns=["Value"])
df.index.names = ["Source", "Partner"]

Here is one way to do it:

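CORE = ["USA", "DEU", "CHN"]

# Build the first part of the expected dataframe: CORE as the source
df1 = df.reset_index()
df1 = (
    # Keep only flows between core countries, so non-core sources are not summed
    df1.loc[df1["Source"].isin(CORE) & df1["Partner"].isin(CORE), :]
    .assign(Source="CORE")
    .groupby(["Source", "Partner"])
    .sum()
    .sort_values("Partner", ascending=False)
)

print(df1)
# Output
#                 Value
# Source Partner
# CORE   USA         13
#        DEU         22
#        CHN         16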
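# Build the second part: CORE as the partner
df2 = df.reset_index()
df2 = (
    df2.loc[df2["Partner"].isin(CORE), :]
    .assign(Partner="CORE")
    .groupby(["Source", "Partner"])
    .sum()
    # Keep only the core sources, in the order of the expected output
    .reindex(
        index=pd.MultiIndex.from_tuples(
            [("USA", "CORE"), ("CHN", "CORE"), ("DEU", "CORE")],
            names=["Source", "Partner"],
        )
    )
)

print(df2)
# Output
#                 Value
# Source Partner
# USA    CORE        30
# CHN    CORE         3
# DEU    CORE        18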

And then:

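# Build the final dataframe by stacking both parts
new_df = pd.concat([df1, df2])

print(new_df)
# Output
#                 Value
# Source Partner
# CORE   USA         13
#        DEU         22
#        CHN         16
# USA    CORE        30
# CHN    CORE         3
# DEU    CORE        18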
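
Since the real data has many more countries, the two steps above can also be folded into one helper that works directly on the index levels. The following is a minimal sketch, not part of the original answer: the function name `core_aggregate_rows` and its `label` parameter are made up for illustration, and it assumes that only flows whose source and partner are both in the core list should be counted, as described in the asker's comment.

```python
import pandas as pd


def core_aggregate_rows(df, core, label="CORE"):
    """Return new rows that aggregate the `core` countries on each index level.

    Hypothetical helper; assumes `df` has a MultiIndex with levels
    named "Source" and "Partner".
    """
    source = df.index.get_level_values("Source")
    partner = df.index.get_level_values("Partner")
    # Assumption: keep only flows where both endpoints belong to the core group.
    inner = df[source.isin(core) & partner.isin(core)]

    # label -> partner: sum over core sources for each core partner.
    by_partner = inner.groupby(level="Partner").sum()
    by_partner.index = pd.MultiIndex.from_product(
        [[label], by_partner.index], names=["Source", "Partner"]
    )

    # source -> label: sum over core partners for each core source.
    by_source = inner.groupby(level="Source").sum()
    by_source.index = pd.MultiIndex.from_product(
        [by_source.index, [label]], names=["Source", "Partner"]
    )

    return pd.concat([by_partner, by_source])


# Example data from the question.
data = {
    ("USA", "DEU"): 20, ("USA", "CHN"): 10, ("USA", "MEX"): 5,
    ("DEU", "USA"): 12, ("DEU", "CHN"): 6, ("DEU", "MEX"): 2,
    ("CHN", "USA"): 1, ("CHN", "DEU"): 2, ("CHN", "MEX"): 3,
}
df = pd.DataFrame(
    {"Value": list(data.values())},
    index=pd.MultiIndex.from_tuples(list(data), names=["Source", "Partner"]),
)

new_rows = core_aggregate_rows(df, ["USA", "DEU", "CHN"])
# Append the aggregate rows to the original dataframe.
extended = pd.concat([df, new_rows])
print(new_rows)
```

Unlike the step-by-step version, the rows here come out in alphabetical order within each block, because `groupby` sorts its keys by default; reindex afterwards if a specific order matters.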
