112

I would like to add a column to the second level of a multiindex column dataframe.

In [151]: df
Out[151]: 
first        bar                 baz           
second       one       two       one       two 
A       0.487880 -0.487661 -1.030176  0.100813 
B       0.267913  1.918923  0.132791  0.178503
C       1.550526 -0.312235 -1.177689 -0.081596 

The usual trick of direct assignment does not work:

In [152]: df['bar']['three'] = [0, 1, 2]

In [153]: df
Out[153]: 
first        bar                 baz           
second       one       two       one       two 
A       0.487880 -0.487661 -1.030176  0.100813
B       0.267913  1.918923  0.132791  0.178503
C       1.550526 -0.312235 -1.177689 -0.081596

How can I add the third row to under "bar"?

1
  • 7
    I guess the OP means to add the third column. Commented Aug 12, 2019 at 13:43

4 Answers 4

127

It's actually pretty simple (FWIW, I originally thought to do it your way):

df['bar', 'three'] = [0, 1, 2]
df = df.sort_index(axis=1)
print(df)

        bar                        baz          
        one       two  three       one       two
A -0.212901  0.503615      0 -1.660945  0.446778
B -0.803926 -0.417570      1 -0.336827  0.989343
C  3.400885 -0.214245      2  0.895745  1.011671
Sign up to request clarification or add additional context in comments.

7 Comments

Thanks. I must say it is totally not obvious (to me) why the new column shows up only after using sort_index.
Oh sorry that's not part of the answer, just me being picky. It will actually show up as soon as you call df['bar', 'three'] = [0, 1, 2]. By default pandas will put it at the end of the DataFrame (after [baz, two]). I just wanted to see it with the other bars.
This appends the new column 'three' to the subtable 'bar'. But what if you want to insert (instead of append) this new column in subtable 'bar', e.g. insert 'three' in between 'one' and 'two'?
The order of columns doesn't really matter here. If you wanted to reorder them so they displayed "one, three, two" you could do that by using df.loc[:, XX] where XX has tuples ("bar", "one"), ("bar", "three"), etc.
Is it possible to generalize this to adding a third column to every sub-index? (i.e. in this case to have the three column added both for bar and for baz?
|
28

If we want to add a multi-level column:

Source DF:

In [221]: df
Out[221]:
first        bar                 baz
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370

Option 1: adding result of division: bar / baz as a new foo column

In [222]: df = df.join(
     ...:     df[['bar']].div(df['baz']).rename(columns={'bar':'foo'}))

In [223]: df
Out[223]:
first        bar                 baz                 foo
second       one       two       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740 -2.317647  1.424980
B       0.488875  0.428836  1.413451 -0.683677  0.345873 -0.627250
C      -0.243064 -0.069446 -0.911166  0.478370  0.266761 -0.145172

Option 2: adding multi-level column with three "sub-columns":

In [235]: df = df.join(pd.DataFrame(
     ...:     np.random.rand(3,3),
     ...:     columns=pd.MultiIndex.from_product([['new'], ['one','two','three']]),
     ...:     index=df.index))

In [236]: df
Out[236]:
first        bar                 baz                 new
second       one       two       one       two       one       two     three
A      -1.089798  2.053026  0.470218  1.440740  0.274291  0.636257  0.091048
B       0.488875  0.428836  1.413451 -0.683677  0.668157  0.456931  0.227568
C      -0.243064 -0.069446 -0.911166  0.478370  0.333824  0.363060  0.949672

1 Comment

And how to appen and independiente column? I tried with: df = df.join(pd.DataFrame(np.random.rand(3, 1), columns = pd.MultiIndex.from_product([['new']]), index = df.index)) Is the right way?
5

If you want to add multiple columns to a multiindex column dataframe, you can try

  1. All same value for columns
df[[("foo", "bar1"), ("foo", "bar2")]] = 2
        bar                 baz            foo
        one       two       one       two bar1 bar2
0  0.487880 -0.487661 -1.030176  0.100813    2    2
1  0.267913  1.918923  0.132791  0.178503    2    2
2  1.550526 -0.312235 -1.177689 -0.081596    2    2
  1. Same value for each column
df[[("foo", "bar1"), ("foo", "bar2")]] = [2, 3]
        bar                 baz            foo
        one       two       one       two bar1 bar2
0  0.487880 -0.487661 -1.030176  0.100813    2    3
1  0.267913  1.918923  0.132791  0.178503    2    3
2  1.550526 -0.312235 -1.177689 -0.081596    2    3
  1. Same value for each row
df[[("foo", "bar1"), ("foo", "bar2")]] = np.array([range(len(df)) for _ in range(2)]).T
        bar                 baz            foo
        one       two       one       two bar1 bar2
0  0.487880 -0.487661 -1.030176  0.100813    0    0
1  0.267913  1.918923  0.132791  0.178503    1    1
2  1.550526 -0.312235 -1.177689 -0.081596    2    2
  1. Different value for each cell
df[[("foo", "bar1"), ("foo", "bar2")]] = [[1, 2],
                                          [3, 4],
                                          [5, 6]] # shape is (3, 2) where 3 is index length and 2 is new added column length
        bar                 baz            foo
        one       two       one       two bar1 bar2
0  0.487880 -0.487661 -1.030176  0.100813    1    2
1  0.267913  1.918923  0.132791  0.178503    3    4
2  1.550526 -0.312235 -1.177689 -0.081596    5    6

Another usecase is that we have a single index dataframe, and we want to concat it to the multi index dataframe

        bar                 baz
       one       two       one       two     concat to      bar1  bar2
0  0.487880 -0.487661 -1.030176  0.100813   <---------  0     1     2
1  0.267913  1.918923  0.132791  0.178503               1     3     4
2  1.550526 -0.312235 -1.177689 -0.081596               2     5     6
  1. Generate a list of tuples for columns
df[[("foo", col) for col in single_index_df.columns]] = single_index_df
        bar                 baz            foo
        one       two       one       two bar1 bar2
0  0.487880 -0.487661 -1.030176  0.100813    1    2
1  0.267913  1.918923  0.132791  0.178503    3    4
2  1.550526 -0.312235 -1.177689 -0.081596    5    6
  1. Create a new multi index columns dataframe from the single index dataframe as Option 2 of MaxU - stop genocide of UA
df = df.join(pd.DataFrame(single_index_df.values,
                          columns=pd.MultiIndex.from_product([['foo'], single_index_df.columns]),
                          index=single_index_df.index))
  1. Create a multi index dataframe from single index dataframe with pd.concat({'foo': single_index_df}, axis=1)
df = pd.concat([df, pd.concat({'foo': single_index_df}, axis=1)], axis=1)
# or
df = df.join(pd.concat({'foo': single_index_df}, axis=1))

3 Comments

These are nice solutions. I like the first one because it won't create duplicates if accidentally executed more than once (as can happen when using notebooks for example). The last one does. Not sure about 2.
Actually, I just implemented method 1 on a large data frame and got the following repeated warning: "PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()"
@Bill Sorry for the late reply, I remember I have noticed your comment, but busy with my work and forgot to reply. I'm not really sure which method you refer when saying method 1, I have two method 1 in different usecases. I'm not really sure what fragment is, but from the warning message, I think for both usecases, the last method which begins with 'Create a multi index dataframe from single index dataframe' is possible a right way to solve the warning.
1

If you want to insert (instead of append at the end of the DF) do this:

df.insert(0, ('bar', 'three'), [0, 1, 2])

The second item has to be hashable, so a list will not work.

1 Comment

df.insert(loc=df.columns.get_loc(('bar', 'two'))+1, column=('bar', 'three'), value=[0, 1, 2])

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.