
I have the following DataFrame:

df = pd.DataFrame([[1,2,3], [11,22,33]], columns = ['A', 'B', 'C'])
df.set_index(['A', 'B'], inplace=True)

        C
A  B     
1  2    3
11 22  33

How can I add an additional text column that is a string combination of the MultiIndex levels, without removing my index?

For example:

        C    D
A  B            
1  2    3    1_2
11 22  33  11_22

4 Answers


Perhaps a simple list comprehension might help, i.e.:

df['new'] = ['_'.join(map(str,i)) for i in df.index.tolist()]

        C    new
A  B            
1  2    3    1_2
11 22  33  11_22

Use:

df['new'] = df.index.map('{0[0]}_{0[1]}'.format)

Output:

        C    new
A  B            
1  2    3    1_2
11 22  33  11_22
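
As a minimal sketch of why this works: the format string '{0[0]}_{0[1]}' indexes into its first positional argument, so the bound .format method can be called with a single tuple, and the index's map method hands each MultiIndex entry to it as a tuple. The names fmt, single and labels are only illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [11, 22, 33]], columns=['A', 'B', 'C'])
df = df.set_index(['A', 'B'])

# '{0[0]}' reads element 0 of the first positional argument,
# '{0[1]}' reads element 1, so one tuple argument is enough.
fmt = '{0[0]}_{0[1]}'.format
single = fmt((1, 2))

# Each entry of the MultiIndex is passed to fmt as a tuple.
labels = list(df.index.map(fmt))
print(single, labels)
```

Note that this format string is tied to exactly two levels; an index with more levels would need more placeholders.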


With so many elegant methods it is not clear which one to choose. So, here is a performance comparison of the methods provided in the other answers, plus an alternative one (Method 1), for two cases: 1) the MultiIndex consists of integers; 2) the MultiIndex consists of strings.

Jezrael's method (f_3) wins in both cases. Dark's method (f_2), however, is the slowest in the second case. Method 1 (f_1) performs very poorly with integers because of the type conversion step, but is as fast as f_3 with strings.

Case 1:

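import numpy as np
import pandas as pd
from numpy.random import randint

# Example size only (an assumption); the comparison varied this from 0.5M to 30M rows.
num_rows = 1_000_000

df = pd.DataFrame({'A': randint(1, 10, num_rows), 'B': randint(10, 20, num_rows), 'C': randint(20, 30, num_rows)})
df.set_index(['A', 'B'], inplace=True)

# Method 1: convert each level to str, then concatenate the levels
def f_1(df):
    df['D'] = df.index.get_level_values(0).astype('str') + '_' + df.index.get_level_values(1).astype('str')
    return df

# Method 2: join the stringified elements of each index tuple
def f_2(df):
    df['D'] = ['_'.join(map(str,i)) for i in df.index.tolist()]
    return df

# Method 3: f-string over the unpacked index tuples (Python 3.6+)
def f_3(df):
    df['D'] = [f'{i}_{j}' for i, j in df.index]
    return df

# Method 4: map a bound str.format over the index
def f_4(df):
    df['new'] = df.index.map('{0[0]}_{0[1]}'.format)
    return df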

[Figure: timing comparison of the methods for Case 1 (image not preserved)]

Case 2:

alpha = list("abcdefghijklmnopqrstuvwxyz")
df = pd.DataFrame({'A': np.random.choice(alpha, size=num_rows),
                   'B': np.random.choice(alpha, size=num_rows),
                   'C': randint(20, 30, num_rows)})
df.set_index(['A', 'B'], inplace=True)

# Method 1
def f_1(df): 
    df['D'] = df.index.get_level_values(0) + '_' + df.index.get_level_values(1)
    return df

[Figure: timing comparison of the methods for Case 2 (image not preserved)]
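
The timing code itself is not shown above, so here is a rough sketch of how such a comparison could be run with the standard timeit module for the integer case. The row count, the number of repetitions, the fixed seed and the names base and results are all assumptions made for illustration; this is not the setup that produced the figures.

```python
import timeit

import numpy as np
import pandas as pd

num_rows = 10_000  # assumed small size so the sketch runs quickly
rng = np.random.default_rng(0)  # fixed seed, also an assumption

base = pd.DataFrame({
    'A': rng.integers(1, 10, num_rows),
    'B': rng.integers(10, 20, num_rows),
    'C': rng.integers(20, 30, num_rows),
}).set_index(['A', 'B'])


def f_1(df):
    # Convert each level to str, then concatenate the levels.
    lvl0 = df.index.get_level_values(0).astype(str)
    lvl1 = df.index.get_level_values(1).astype(str)
    df['D'] = lvl0 + '_' + lvl1
    return df


def f_2(df):
    # Join the stringified elements of each index tuple.
    df['D'] = ['_'.join(map(str, i)) for i in df.index.tolist()]
    return df


def f_3(df):
    # f-string over the unpacked two-level index tuples.
    df['D'] = [f'{i}_{j}' for i, j in df.index]
    return df


def f_4(df):
    # Map a bound str.format over the index (column named 'D' here).
    df['D'] = df.index.map('{0[0]}_{0[1]}'.format)
    return df


methods = {'f_1': f_1, 'f_2': f_2, 'f_3': f_3, 'f_4': f_4}

# Each call gets a fresh copy so one method's new column cannot affect another.
# The copy cost is included in every measurement equally.
results = {
    name: timeit.timeit(lambda f=f: f(base.copy()), number=3)
    for name, f in methods.items()
}

for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f'{name}: {seconds:.4f} s for 3 runs')
```

Absolute numbers depend heavily on the machine, the pandas version and the data size, so only the relative ordering at a given size is meaningful.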

3 Comments

What if there are 5 or 6 index levels?
@KRKirov - And what about graphs like this? People love them... :)
@jezrael I have added a graphical comparison for data frames with sizes between 0.5M and 30M rows.

Solution in Python 3.6+:

df['new'] = [f'{i}_{j}' for i, j in df.index]
print(df)
        C    new
A  B            
1  2    3    1_2
11 22  33  11_22

And for versions below 3.6:

df['new'] = ['{}_{}'.format(i,j) for i, j in df.index]

2 Comments

I thought the same, but I doubt there will be only 2 levels. There might be more levels too, right?
@Dark - I think generally it is no problem, because a MultiIndex with more than 4 levels is rarely used. Mostly it has 2 levels :)
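
For the case raised in these comments, where the index has more than two levels, the tuple unpacking in `for i, j in df.index` would fail. A small sketch of a version that does not depend on the level count, using the same join-based list comprehension as the first answer; the three-level frame and the extra level E are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with three index levels instead of two.
df = pd.DataFrame(
    [[1, 2, 'x', 3], [11, 22, 'y', 33]],
    columns=['A', 'B', 'E', 'C'],
).set_index(['A', 'B', 'E'])

# Joining every element of each index tuple works for any number of levels.
df['new'] = ['_'.join(map(str, levels)) for levels in df.index]
print(df)
```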
