Pandas: add a column to a multiindex column dataframe

Question

I would like to add a column to the second level of a multiindex column dataframe.

In [151]: df
Out[151]: 
first        bar                 baz           
second       one       two       one       two 
A       0.487880 -0.487661 -1.030176  0.100813 
B       0.267913  1.918923  0.132791  0.178503
C       1.550526 -0.312235 -1.177689 -0.081596

The usual trick of direct assignment does not work:

In [152]: df['bar']['three'] = [0, 1, 2]

In [153]: df
Out[153]: 
first        bar                 baz           
second       one       two       one       two 
A       0.487880 -0.487661 -1.030176  0.100813
B       0.267913  1.918923  0.132791  0.178503
C       1.550526 -0.312235 -1.177689 -0.081596

How can I add the third row under "bar"?

I guess the OP means to add the third column.

– Qaswed, Commented Aug 12, 2019 at 13:43
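A minimal sketch of why the direct assignment in the question has no visible effect, assuming a recent pandas version and random stand-in data: df['bar'] builds a separate DataFrame that holds only the bar sub-columns, so a column added to that object never reaches df. The variable name sub is illustrative, and depending on the pandas version the assignment may also emit a SettingWithCopyWarning.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random)
columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]],
                                     names=["first", "second"])
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=columns)

# df["bar"] returns a separate DataFrame with only the "bar" sub-columns
sub = df["bar"]
sub["three"] = [0, 1, 2]  # the new column is added to sub, not to df

print("three" in sub.columns)          # True
print(("bar", "three") in df.columns)  # False: df itself is unchanged
```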

spencerlyon2 · Accepted Answer · 2013-04-18 17:11:37Z

127

It's actually pretty simple (FWIW, I originally thought to do it your way):

df['bar', 'three'] = [0, 1, 2]
df = df.sort_index(axis=1)
print(df)

        bar                        baz          
        one       two  three       one       two
A -0.212901  0.503615      0 -1.660945  0.446778
B -0.803926 -0.417570      1 -0.336827  0.989343
C  3.400885 -0.214245      2  0.895745  1.011671
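A self-contained version of the answer, with random stand-in data: the tuple key ('bar', 'three') addresses both column levels at once, so the assignment modifies df directly and the new column is appended at the end. The order inside the bar group after sort_index can depend on how the pandas version sorts level values, so the sketch only checks the top level.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=columns)

# A tuple key sets a column on df itself, covering both levels
df["bar", "three"] = [0, 1, 2]
last_col = df.columns[-1]
print(last_col)  # ('bar', 'three'): new columns are appended at the end

# Optional: sort the columns so the new column sits with the other "bar" columns
df = df.sort_index(axis=1)
print(df.columns.get_level_values(0).tolist())  # ['bar', 'bar', 'bar', 'baz', 'baz']
```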

answered Apr 18, 2013 at 17:11

spencerlyon2



7 Comments

user1642513 Over a year ago

Thanks. I must say it is totally not obvious (to me) why the new column shows up only after using sort_index.

spencerlyon2 Over a year ago

Oh, sorry, that's not part of the answer; I was just being picky. The column shows up as soon as you call df['bar', 'three'] = [0, 1, 2]. By default pandas puts it at the end of the DataFrame (after [baz, two]); I just wanted to see it next to the other bar columns.

Joris Kinable Over a year ago

This appends the new column 'three' to the subtable 'bar'. But what if you want to insert (instead of append) this new column in subtable 'bar', e.g. insert 'three' in between 'one' and 'two'?

spencerlyon2 Over a year ago

The order of columns doesn't really matter here. If you wanted to reorder them so they displayed "one, three, two" you could do that by using df.loc[:, XX] where XX has tuples ("bar", "one"), ("bar", "three"), etc.
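A short sketch of the reordering described in the comment above, with random stand-in data: list the desired order explicitly as (first, second) tuples and pass that list to df.loc. The variable name order is illustrative.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=columns)
df["bar", "three"] = [0, 1, 2]  # appended at the end

# Spell out the desired column order as (first, second) tuples
order = [("bar", "one"), ("bar", "three"), ("bar", "two"),
         ("baz", "one"), ("baz", "two")]
df = df.loc[:, order]
print(df.columns.tolist())
```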

SO_tourist Over a year ago

Is it possible to generalize this to add a third column under every top-level label (i.e., in this case, to add the three column under both bar and baz)?
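One possible way to do this, sketched under the assumption that every group gets the same values: loop over the unique first-level labels and assign with a tuple key for each. The unique labels are computed once before the loop, so adding columns inside it is safe; sort_index then regroups the new columns with their siblings.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=columns)

# Add a "three" sub-column under every first-level label
for top in df.columns.get_level_values(0).unique():
    df[top, "three"] = [0, 1, 2]

df = df.sort_index(axis=1)  # regroup each new column with its siblings
print(df.columns.tolist())
```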


wjandrea · Accepted Answer · 2023-07-12 17:08:35Z

28

If we want to add a multi-level column:

Source DF:

In [221]: df
Out[221]:
first        bar                 baz
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370

Option 1: add the result of dividing bar by baz as a new foo column

In [222]: df = df.join(
     ...:     df[['bar']].div(df['baz']).rename(columns={'bar':'foo'}))

In [223]: df
Out[223]:
first        bar                 baz                 foo
second       one       two       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740 -2.317647  1.424980
B       0.488875  0.428836  1.413451 -0.683677  0.345873 -0.627250
C      -0.243064 -0.069446 -0.911166  0.478370  0.266761 -0.145172
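For comparison, a self-contained sketch that builds the same kind of foo = bar / baz block in two explicit steps. This is an alternative to the answer's code, not its own method: divide the two flat-column sub-frames, then lift the result under a new top-level label with pd.MultiIndex.from_product before joining. The data is random.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=columns)

# df["bar"] and df["baz"] both have flat columns "one" and "two", so they align
ratio = df["bar"] / df["baz"]

# Put the result under a new top-level label "foo", then join on the index
ratio.columns = pd.MultiIndex.from_product([["foo"], ratio.columns])
df = df.join(ratio)
print(df.columns.tolist())
```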

Option 2: adding multi-level column with three "sub-columns":

In [235]: df = df.join(pd.DataFrame(
     ...:     np.random.rand(3,3),
     ...:     columns=pd.MultiIndex.from_product([['new'], ['one','two','three']]),
     ...:     index=df.index))

In [236]: df
Out[236]:
first        bar                 baz                 new
second       one       two       one       two       one       two     three
A      -1.089798  2.053026  0.470218  1.440740  0.274291  0.636257  0.091048
B       0.488875  0.428836  1.413451 -0.683677  0.668157  0.456931  0.227568
C      -0.243064 -0.069446 -0.911166  0.478370  0.333824  0.363060  0.949672

edited Jul 12, 2023 at 17:08

wjandrea


answered Jul 19, 2017 at 11:37

MaxU - stand with Ukraine


1 Comment

Chacho Fuva Over a year ago

And how do I append an independent column? I tried:

df = df.join(pd.DataFrame(np.random.rand(3, 1),
                          columns=pd.MultiIndex.from_product([['new']]),
                          index=df.index))

Is this the right way?
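One common approach, sketched under the assumption that a blank second-level label is acceptable: give the standalone column the key ('new', ''). Unlike pd.MultiIndex.from_product([['new']]), which creates a single-level MultiIndex, this keeps both levels populated, so the columns stay a two-level MultiIndex. The label new and the values are illustrative.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=columns)

# Use an empty string as the second-level label for a "top-level only" column
df["new", ""] = np.random.rand(3)

print(df.columns.nlevels)  # 2
print(df.columns[-1])      # ('new', '')
```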

Ynjxsjmh · Accepted Answer · 2024-07-13 11:04:14Z

5

If you want to add multiple columns to a multiindex column dataframe, you can try one of the following, depending on the shape of the values.

One scalar for all new cells

df[[("foo", "bar1"), ("foo", "bar2")]] = 2

        bar                 baz            foo
        one       two       one       two bar1 bar2
0  0.487880 -0.487661 -1.030176  0.100813    2    2
1  0.267913  1.918923  0.132791  0.178503    2    2
2  1.550526 -0.312235 -1.177689 -0.081596    2    2

One value per column

df[[("foo", "bar1"), ("foo", "bar2")]] = [2, 3]

        bar                 baz            foo
        one       two       one       two bar1 bar2
0  0.487880 -0.487661 -1.030176  0.100813    2    3
1  0.267913  1.918923  0.132791  0.178503    2    3
2  1.550526 -0.312235 -1.177689 -0.081596    2    3

One value per row

df[[("foo", "bar1"), ("foo", "bar2")]] = np.array([range(len(df)) for _ in range(2)]).T

        bar                 baz            foo
        one       two       one       two bar1 bar2
0  0.487880 -0.487661 -1.030176  0.100813    0    0
1  0.267913  1.918923  0.132791  0.178503    1    1
2  1.550526 -0.312235 -1.177689 -0.081596    2    2

A different value for each cell

df[[("foo", "bar1"), ("foo", "bar2")]] = [[1, 2],
                                          [3, 4],
                                          [5, 6]]  # shape (3, 2): 3 = length of the index, 2 = number of new columns

        bar                 baz            foo
        one       two       one       two bar1 bar2
0  0.487880 -0.487661 -1.030176  0.100813    1    2
1  0.267913  1.918923  0.132791  0.178503    3    4
2  1.550526 -0.312235 -1.177689 -0.081596    5    6
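A self-contained sketch that runs the four cases above against copies of one frame, assuming a recent pandas (roughly 1.3 or later) where assigning to a list of not-yet-existing columns is supported. The helper with_new_cols is hypothetical and only exists to keep each case independent.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(np.random.randn(3, 4), columns=columns)  # default index 0, 1, 2
new_cols = [("foo", "bar1"), ("foo", "bar2")]

def with_new_cols(values):
    """Hypothetical helper: return a copy of df with the two new columns set."""
    out = df.copy()
    out[new_cols] = values
    return out

scalar = with_new_cols(2)                  # one scalar for all new cells
per_column = with_new_cols([2, 3])         # one value per new column
per_row = with_new_cols(np.array([range(len(df)) for _ in range(2)]).T)  # (3, 2)
per_cell = with_new_cols([[1, 2], [3, 4], [5, 6]])  # 3 rows x 2 new columns

print(per_cell)
```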

Another use case: we have a DataFrame with single-level columns and want to concatenate it to the multi index dataframe.

        bar                 baz
        one       two       one       two    concat to      bar1  bar2
0  0.487880 -0.487661 -1.030176  0.100813   <---------  0     1     2
1  0.267913  1.918923  0.132791  0.178503               1     3     4
2  1.550526 -0.312235 -1.177689 -0.081596               2     5     6

Generate a list of tuples for the new columns:

df[[("foo", col) for col in single_index_df.columns]] = single_index_df

        bar                 baz            foo
        one       two       one       two bar1 bar2
0  0.487880 -0.487661 -1.030176  0.100813    1    2
1  0.267913  1.918923  0.132791  0.178503    3    4
2  1.550526 -0.312235 -1.177689 -0.081596    5    6

Create a new DataFrame with MultiIndex columns from the single-index DataFrame, as in Option 2 of MaxU's answer:

df = df.join(pd.DataFrame(single_index_df.values,
                          columns=pd.MultiIndex.from_product([['foo'], single_index_df.columns]),
                          index=single_index_df.index))

Create a multi index dataframe from the single index dataframe with pd.concat({'foo': single_index_df}, axis=1), which turns the dict key into the new top level:

df = pd.concat([df, pd.concat({'foo': single_index_df}, axis=1)], axis=1)
# or
df = df.join(pd.concat({'foo': single_index_df}, axis=1))
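A self-contained sketch that runs the three approaches above side by side and checks that they produce the same columns and values. The frame single_index_df is a hypothetical stand-in with the same index as df; note that the tuple-list assignment matches the new columns to single_index_df's columns by position.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(np.random.randn(3, 4), columns=columns)
# Hypothetical frame with flat (single-level) columns and the same index as df
single_index_df = pd.DataFrame({"bar1": [1, 3, 5], "bar2": [2, 4, 6]})

# (a) Assign through a list of (top, sub) tuples
a = df.copy()
a[[("foo", col) for col in single_index_df.columns]] = single_index_df

# (b) Rebuild the frame with explicit MultiIndex columns, then join on the index
b = df.join(pd.DataFrame(single_index_df.values,
                         columns=pd.MultiIndex.from_product([["foo"], single_index_df.columns]),
                         index=single_index_df.index))

# (c) Let pd.concat turn the dict key "foo" into the new top level
c = df.join(pd.concat({"foo": single_index_df}, axis=1))
print(c.columns.tolist())
```

Per the comment below, (a) overwrites the same columns if re-run, while (b) and (c) add duplicates each time they are executed.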

edited Jul 13, 2024 at 11:04

answered Aug 1, 2022 at 16:53

Ynjxsjmh


3 Comments

Bill Over a year ago

These are nice solutions. I like the first one because it won't create duplicates if accidentally executed more than once (as can happen when using notebooks for example). The last one does. Not sure about 2.

Bill Over a year ago

Actually, I just implemented method 1 on a large data frame and got the following repeated warning: "PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()"

Ynjxsjmh Over a year ago

@Bill Sorry for the late reply; I noticed your comment but was busy with work and forgot to respond. I'm not sure which method you mean by "method 1", since each of the two use cases has its own first method. I'm not really sure what fragmentation is, but judging from the warning message, I think that for both use cases the last method (the one using pd.concat({'foo': single_index_df}, axis=1)) is probably the right way to avoid the warning.
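A minimal sketch of the batching the warning itself suggests, assuming many new sub-columns are being added in a loop: collect them in a dict keyed by (top, sub) tuples, build one DataFrame from it (tuple keys become MultiIndex columns), and concatenate once instead of inserting column by column. The labels foo and colN and the count n_new are hypothetical.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=columns)

n_new = 200  # hypothetical number of new columns

# Collect the new columns first instead of inserting them one at a time
new_data = {("foo", f"col{i}"): np.full(len(df), i) for i in range(n_new)}
new_block = pd.DataFrame(new_data, index=df.index)  # tuple keys -> MultiIndex columns

# One concat instead of many single-column inserts
df = pd.concat([df, new_block], axis=1)

# For a frame that is already fragmented, the warning suggests: df = df.copy()
```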

dopexxx · Accepted Answer · 2023-03-20 14:21:25Z

1

If you want to insert the column at a given position (instead of appending it at the end of the DataFrame), do this:

df.insert(0, ('bar', 'three'), [0, 1, 2])

The column label (the second argument) must be hashable, so pass a tuple; a list will not work.

answered Mar 20, 2023 at 14:21

dopexxx


1 Comment

warem Over a year ago

df.insert(loc=df.columns.get_loc(('bar', 'two'))+1, column=('bar', 'three'), value=[0, 1, 2])
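A self-contained sketch of the comment above, with random stand-in data: for a unique, complete key, df.columns.get_loc returns an integer position, so adding 1 places the new tuple-labelled column directly after ('bar', 'two').

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=columns)

# Position right after ("bar", "two"); get_loc gives an int for a unique full key
pos = df.columns.get_loc(("bar", "two")) + 1

# The column label must be hashable, so it is passed as a tuple, not a list
df.insert(loc=pos, column=("bar", "three"), value=[0, 1, 2])
print(df.columns.tolist())
```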
