2

I am sorry for asking but I did not get the still existing answers. I simply glued two data frames with the same column names.

|    |   X |   Y |   X |   Y |
|---:|----:|----:|----:|----:|
|  0 |   1 |   3 |   9 |   7 |
|  1 |   2 |   4 |   8 |   6 |

What I want is

|    |    FOO    |    BAR    |
|    |   X |   Y |   X |   Y |
|---:|----:|----:|----:|----:|
|  0 |   1 |   3 |   9 |   7 |
|  1 |   2 |   4 |   8 |   6 |

I tried pd.MultiIndex.from_product([c.columns, ['FOO', 'BAR']]) but this results in

MultiIndex([('X', 'FOO'),
            ('X', 'BAR'),
            ('Y', 'FOO'),
            ('Y', 'BAR'),
            ('X', 'FOO'),
            ('X', 'BAR'),
            ('Y', 'FOO'),
            ('Y', 'BAR')],
           )

But I need

MultiIndex([('X', 'FOO'),
            ('Y', 'FOO'),
            ('X', 'BAR'),
            ('Y', 'BAR')],
           )

This is an MWE

#!/usr/bin/env python3
import pandas as pd

a = pd.DataFrame({'X': [1,2], 'Y': [3, 4]})
b = pd.DataFrame({'X': [9,8], 'Y': [7, 6]})

c = pd.concat([a, b], axis=1)

# throws a ValueError: Length mismatch: Expected axis has 4 elements, new values have 8 elements
c.columns = pd.MultiIndex.from_product([c.columns, ['FOO', 'BAR']])

Would it help to do something to the two separate DataFrames before I concat() them?

2
  • 3
    pd.MultiIndex.from_product([df.columns.unique(), ['FOO', 'BAR']])? Commented Oct 15, 2021 at 14:27
  • c.columns = pd.MultiIndex.from_arrays((pd.Index(['FOO', 'BAR']).repeat(2), c.columns)) Commented Oct 15, 2021 at 15:44

3 Answers 3

3

You could simply add the extra level using numpy.repeat and pandas.MultiIndex:

import numpy as np

extra = ['FOO', 'BAR']
c.columns = pd.MultiIndex.from_arrays([np.repeat(extra, len(c.columns)//len(extra)),
                                       c.columns])

output:

  FOO    BAR   
    X  Y   X  Y
0   1  3   9  7
1   2  4   8  6

NB. If the columns are shuffled, sort the columns and use np.tile instead:

c = c.sort_index(axis=1)

extra = ['FOO', 'BAR']
c.columns = pd.MultiIndex.from_arrays([np.tile(extra, len(c.columns)//len(extra)),
                                       c.columns])

output:

  FOO BAR FOO BAR
    X   X   Y   Y
0   1   9   3   7
1   2   8   4   6
Sign up to request clarification or add additional context in comments.

Comments

3

Use the keys argument of the pd.concat function:

pd.concat([a, b], axis = 1, keys = ["FOO", "BAR"])

Comments

0

If you are working with numpy array

import pandas as pd
import numpy as np
_names=['FOO','BAR','FOO','BAR']

_idx=['X','Y','X','Y']

X=np.random.rand(4,len(_names))
columns = pd.MultiIndex.from_arrays([_names, _idx])
df=pd.DataFrame(data=X, columns=columns)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.