Pandas Python create column by indexed assignment on frame with multiindex

Question

I cannot figure out how to simply and efficiently add a column to a DataFrame that has a MultiIndex, by assignment from another DataFrame that has a single-level index.

I can add a column to a dataframe that has a single level index as follows:

import pandas as pd

df = pd.DataFrame({'data':[1, 2, 5]},
                  index=pd.Index(['Alpha', 'Bravo', 'Echo'], name='item1_name'))
df

df_item1 = pd.DataFrame({'id':[101, 102, 103, 104, 105]}, 
                  index=pd.Index(['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'], name='item1_name'))
df_item1

df['item1_id']=df_item1['id']
df

What I can't figure out is how to do this on a single level of a MultiIndex DataFrame, e.g.

df_multi = pd.DataFrame({'data':[1, 2, 5, 11, 12, 15]},
                  index=pd.MultiIndex.from_tuples(
                      [('Alpha', 'X'), ('Alpha', 'Y'), ('Bravo', 'X'),
                       ('Bravo', 'Y'), ('Echo', 'X'), ('Echo', 'Y')],
                      names=['item1_name', 'item2_name']))

df_multi['item1_id']=df_item1['id']

df_multi

I just get NaNs because the indexes aren't aligning. My big-picture problem is that I am receiving data keyed by string names and I need to be able to replace them with integer ids for both levels, item1_name and item2_name.

I have long solutions using unstack/stack/reindex etc., but it all seems a very long way around and I feel I ought to be able to join on the index. If my second lookup frame is this:

df_item2 = pd.DataFrame({'id':[201, 202]}, 
                  index=pd.Index(['X', 'Y'], name='item2_name'))

df_item2

what I want to end up with is

jezrael · Accepted Answer · 2021-03-18 11:55:12Z


Use DataFrame.join with rename here. It works by matching the index name of each lookup frame against the level names of the MultiIndex:

df = (df_multi.join(df_item1.rename(columns={'id':'item1_id'}))
              .join(df_item2.rename(columns={'id':'item2_id'})))
print (df)
                       data  item1_id  item2_id
item1_name item2_name                          
Alpha      X              1       101       201
           Y              2       101       202
Bravo      X              5       102       201
           Y             11       102       202
Echo       X             12       105       201
           Y             15       105       202
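As an alternative that is not part of the answer above, the same columns can probably be built without a join by mapping each index level through the lookup Series. This is only a minimal sketch; the frame `out` and the column names `item1_id` and `item2_id` are chosen here for illustration.

```python
import pandas as pd

# Rebuild the frames from the question.
df_multi = pd.DataFrame(
    {'data': [1, 2, 5, 11, 12, 15]},
    index=pd.MultiIndex.from_product(
        [['Alpha', 'Bravo', 'Echo'], ['X', 'Y']],
        names=['item1_name', 'item2_name']))
df_item1 = pd.DataFrame(
    {'id': [101, 102, 103, 104, 105]},
    index=pd.Index(['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'],
                   name='item1_name'))
df_item2 = pd.DataFrame(
    {'id': [201, 202]},
    index=pd.Index(['X', 'Y'], name='item2_name'))

out = df_multi.copy()
# Pull one level of the MultiIndex out as a flat Index, then look each
# label up in the corresponding lookup Series.
out['item1_id'] = out.index.get_level_values('item1_name').map(df_item1['id'])
out['item2_id'] = out.index.get_level_values('item2_name').map(df_item2['id'])
print(out)
```

Unlike join, this does not depend on the lookup frame's index name, only on its labels; names that are missing from the lookup would come back as NaN.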

If the index name of the lookup frame does not match any level name of the MultiIndex, you get an error:

df_item2 = pd.DataFrame({'id':[201, 202]},
                  index=pd.Index(['X', 'Y'], name='aaa'))  # <- changed name


df = df_multi.join(df_item2.rename(columns={'id':'item2_id'}))
print (df)

ValueError: cannot join with no overlapping index names
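If the lookup frame arrives with a different index name, one way to make the join work again is to rename its index first. The sketch below assumes the mismatched name `'aaa'` from the example above and uses `rename_axis`; the variable names `lookup` and `fixed` are illustrative only.

```python
import pandas as pd

df_multi = pd.DataFrame(
    {'data': [1, 2, 5, 11, 12, 15]},
    index=pd.MultiIndex.from_product(
        [['Alpha', 'Bravo', 'Echo'], ['X', 'Y']],
        names=['item1_name', 'item2_name']))

# Lookup frame whose index name does not match any level of df_multi.
df_item2 = pd.DataFrame(
    {'id': [201, 202]},
    index=pd.Index(['X', 'Y'], name='aaa'))

# Give the index the name of the level it should align with, and rename
# the value column so the result is self-describing.
lookup = df_item2.rename_axis('item2_name').rename(columns={'id': 'item2_id'})
fixed = df_multi.join(lookup)
print(fixed)
```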

edited Mar 18, 2021 at 11:55

answered Mar 18, 2021 at 11:51

jezrael


3 Comments

JohnnieL Over a year ago

Thank you - I had in fact tried join but it was blowing up, so I guess driver error at this end - thank you!

JohnnieL Over a year ago

One supplementary question: the join column is replicated when I use join. Is the output you've posted here omitting the duplicated join columns for clarity, or is there some pandas magic I'm missing? The pandas join docs suggest that duplication of the join column is expected behaviour. Is there a smarter way to suppress it than giving it a suffix during the join and then dropping it?

jezrael Over a year ago

@JohnnieL - I think not, unfortunately. The suffix is added to avoid duplicated column names. E.g. here, if you do not use rename and give no suffix, pandas would have to create two columns both named id, which is a problem (because selecting df['id'] would return both columns).
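To illustrate the suffix discussion, here is a small sketch, not taken from the thread, of what joining both lookup frames without rename appears to do. As far as I know, `DataFrame.join` raises a `ValueError` on overlapping column names when no suffix is given, and `rsuffix` resolves the clash. The suffix `'_item2'` and the variable names are arbitrary.

```python
import pandas as pd

df_multi = pd.DataFrame(
    {'data': [1, 2, 5, 11, 12, 15]},
    index=pd.MultiIndex.from_product(
        [['Alpha', 'Bravo', 'Echo'], ['X', 'Y']],
        names=['item1_name', 'item2_name']))
df_item1 = pd.DataFrame(
    {'id': [101, 102, 103, 104, 105]},
    index=pd.Index(['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'],
                   name='item1_name'))
df_item2 = pd.DataFrame(
    {'id': [201, 202]},
    index=pd.Index(['X', 'Y'], name='item2_name'))

# Without rename or suffixes, the second join collides on the 'id' column.
try:
    df_multi.join(df_item1).join(df_item2)
except ValueError as err:
    overlap_error = str(err)
    print(overlap_error)

# rsuffix only touches the clashing column coming from the right frame.
suffixed = df_multi.join(df_item1).join(df_item2, rsuffix='_item2')
print(suffixed.columns.tolist())

# Renaming afterwards gives the same column names as renaming up front.
renamed = suffixed.rename(columns={'id': 'item1_id', 'id_item2': 'item2_id'})
```

Renaming before the join, as in the answer, avoids the intermediate suffixed names altogether.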
