I cannot figure out how to simply and efficiently add a column to a dataframe that has a MultiIndex by assigning from another dataframe that has a single-level index.

I can add a column to a dataframe that has a single level index as follows:

import pandas as pd

df = pd.DataFrame({'data':[1, 2, 5]},
                  index=pd.Index(['Alpha', 'Bravo', 'Echo'], name='item1_name'))
df

df_item1 = pd.DataFrame({'id':[101, 102, 103, 104, 105]}, 
                  index=pd.Index(['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'], name='item1_name'))
df_item1

df['item1_id']=df_item1['id']
df

What I can't figure out is how to do this on a single level of a MultiIndex dataframe, e.g.

df_multi = pd.DataFrame({'data':[1, 2, 5, 11, 12, 15]},
                  index=pd.MultiIndex.from_tuples(
                      [('Alpha', 'X'), ('Alpha', 'Y'), ('Bravo', 'X'),
                       ('Bravo', 'Y'), ('Echo', 'X'), ('Echo', 'Y')],
                      names=['item1_name', 'item2_name']))

df_multi['item1_id']=df_item1['id']

df_multi

I just get NaNs because the indexes aren't aligning. My big-picture problem is that I am receiving data with string names and I need to be able to replace them with integer ids for both levels, item1_name and item2_name.

I have long solutions using unstack/stack/reindex etc., but it all seems a very long way around and I feel I ought to be able to join on the index. If my second lookup frame is this:

df_item2 = pd.DataFrame({'id':[201, 202]}, 
                  index=pd.Index(['X', 'Y'], name='item2_name'))

df_item2

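What I want to end up with is df_multi with two additional integer columns: the id from df_item1 matched on item1_name, and the id from df_item2 matched on item2_name.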

1 Answer

Use DataFrame.join with rename here; it works by matching the index names against the MultiIndex level names:

df = (df_multi.join(df_item1.rename(columns={'id':'item1_id'}))
              .join(df_item2.rename(columns={'id':'item2_id'})))
print(df)
                       data  item1_id  item2_id
item1_name item2_name
Alpha      X              1       101       201
           Y              2       101       202
Bravo      X              5       102       201
           Y             11       102       202
Echo       X             12       105       201
           Y             15       105       202
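
If plain column assignment is preferred, as in the single-level example from the question, a possible alternative is to map one index level through the lookup Series. This is only a sketch and not the join approach shown above; the column names item1_id and item2_id are chosen here for illustration.

```python
import pandas as pd

df_multi = pd.DataFrame(
    {'data': [1, 2, 5, 11, 12, 15]},
    index=pd.MultiIndex.from_tuples(
        [('Alpha', 'X'), ('Alpha', 'Y'), ('Bravo', 'X'),
         ('Bravo', 'Y'), ('Echo', 'X'), ('Echo', 'Y')],
        names=['item1_name', 'item2_name']))
df_item1 = pd.DataFrame(
    {'id': [101, 102, 103, 104, 105]},
    index=pd.Index(['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'],
                   name='item1_name'))
df_item2 = pd.DataFrame(
    {'id': [201, 202]},
    index=pd.Index(['X', 'Y'], name='item2_name'))

# Look up each row's level value in the matching lookup Series.
# The mapped values are assigned by position, so no index alignment is involved.
df_multi['item1_id'] = df_multi.index.get_level_values('item1_name').map(df_item1['id'])
df_multi['item2_id'] = df_multi.index.get_level_values('item2_name').map(df_item2['id'])
print(df_multi)
```

Unlike join, this does not depend on the lookup frame's index name matching the level name, because the level is picked explicitly with get_level_values.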

If the index names do not match, you get an error:

df_item2 = pd.DataFrame({'id':[201, 202]}, 
                  index=pd.Index(['X', 'Y'], name='aaa'))  # <- changed name


df = df_multi.join(df_item2.rename(columns={'id':'item2_id'}))
print(df)

ValueError: cannot join with no overlapping index names
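
One way around this error, sketched here under the assumption that the lookup frame really does describe the item2_name level, is to rename its index with rename_axis before joining. The column name item2_id is illustrative.

```python
import pandas as pd

df_multi = pd.DataFrame(
    {'data': [1, 2, 5, 11, 12, 15]},
    index=pd.MultiIndex.from_tuples(
        [('Alpha', 'X'), ('Alpha', 'Y'), ('Bravo', 'X'),
         ('Bravo', 'Y'), ('Echo', 'X'), ('Echo', 'Y')],
        names=['item1_name', 'item2_name']))

# Lookup frame whose index name does not match any MultiIndex level name.
df_item2 = pd.DataFrame({'id': [201, 202]},
                        index=pd.Index(['X', 'Y'], name='aaa'))

# Give the lookup index the same name as the level it should match, then join.
lookup = df_item2.rename_axis('item2_name').rename(columns={'id': 'item2_id'})
df = df_multi.join(lookup)
print(df)
```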

3 Comments

Thank you - I had in fact tried join but it was blowing up, so I guess driver error at this end. Thank you!
One supplementary question: the join column is replicated when I use join. Is the output that you've posted here not showing the duplicated join columns for clarity, or is there some pandas magic I'm missing here? The pandas join docs suggest that duplication of the join column is expected behaviour. Is there a smarter way to suppress that other than giving it a suffix during the join and then dropping it?
@JohnnieL - I think not, unfortunately. The suffix is added to avoid duplicated column names; e.g. here, if rename is not used, then without a suffix pandas would have to create two columns both named id, which is a problem (because selecting df['id'] would return both columns).
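
The suffix behaviour discussed in these comments can be tried with a small sketch. It assumes both lookup frames keep their original id column, so the second join runs into a name clash; the suffix _item2 is an arbitrary choice for illustration.

```python
import pandas as pd

df_multi = pd.DataFrame(
    {'data': [1, 2, 5, 11, 12, 15]},
    index=pd.MultiIndex.from_tuples(
        [('Alpha', 'X'), ('Alpha', 'Y'), ('Bravo', 'X'),
         ('Bravo', 'Y'), ('Echo', 'X'), ('Echo', 'Y')],
        names=['item1_name', 'item2_name']))
df_item1 = pd.DataFrame(
    {'id': [101, 102, 103, 104, 105]},
    index=pd.Index(['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'],
                   name='item1_name'))
df_item2 = pd.DataFrame(
    {'id': [201, 202]},
    index=pd.Index(['X', 'Y'], name='item2_name'))

step1 = df_multi.join(df_item1)  # adds a column named 'id'

try:
    step1.join(df_item2)  # a second 'id' column would clash with the first
    clash_error = None
except ValueError as exc:
    clash_error = exc
print(clash_error)

# Supplying a suffix for the right-hand frame resolves the clash.
df = step1.join(df_item2, rsuffix='_item2')
print(df.columns.tolist())
```

Renaming the columns before joining, as in the answer, avoids the need for a suffix altogether.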
