How to have multi index on both rows and columns of a dataframe without using tuples?

Question

Is there a way to create a dataframe with a multi-index on both rows and columns without typing out tuples? My labels are too long to enter manually as tuples (96 countries and 26 sectors per country).

I tried:

df_data.columns=label_df 

df_data_w = pd.concat([label_df, data],axis=1,ignore_index=False)

This added the label df as the first two columns, but didn't turn it into an index.

Here is some code to use:

import numpy as np
import pandas as pd

a = np.random.randint(low=0, high=10,size=9)
b = np.random.randint(low=0, high=10,size=9)
c = np.random.randint(low=0, high=10,size=9)
d = np.random.randint(low=0, high=10,size=9)
e = np.random.randint(low=0, high=10,size=9)
f = np.random.randint(low=0, high=10,size=9)
g = np.random.randint(low=0, high=10,size=9)
h = np.random.randint(low=0, high=10,size=9)
i = np.random.randint(low=0, high=10,size=9)

df = pd.DataFrame(data=[a,b,c,d,e,f,g,h,i])

Continent = ['Africa','Africa','Africa','North America', 'North America', 'North America', 'Europe','Europe','Europe']

Sectors = ['Agriculture','Industry','Domestic','Agriculture','Industry','Domestic','Agriculture','Industry','Domestic']

label_df = pd.DataFrame(data=[Continent, Sectors])

df.columns=label_df  

df_w_labels = pd.concat([label_df, df],axis=1,ignore_index=False)

This gives me the labels as headers in my df, but I need them as columns as well, so I tried concat, which added the label df to the first two columns, but didn't index it.

Welcome to SO. Please provide a minimal reproducible example. That means no links, no images, just text in your question. Good luck!
– jpp, Commented Mar 30, 2018 at 16:04
Thanks @jpp - my first SO post. Have edited to hopefully be more helpful.
– rochellemarch, Commented Mar 30, 2018 at 18:32
To clarify, while you have a lot of labels you have only two levels, correct? "Country" and "sector"?
– Ajean, Commented Mar 30, 2018 at 19:36

Scott Boston · Accepted Answer · 2018-03-30 20:44:32Z

You can use zip and list with pd.MultiIndex:

import numpy as np
import pandas as pd

a = np.random.randint(low=0, high=10,size=9)
b = np.random.randint(low=0, high=10,size=9)
c = np.random.randint(low=0, high=10,size=9)
d = np.random.randint(low=0, high=10,size=9)
e = np.random.randint(low=0, high=10,size=9)
f = np.random.randint(low=0, high=10,size=9)
g = np.random.randint(low=0, high=10,size=9)
h = np.random.randint(low=0, high=10,size=9)
i = np.random.randint(low=0, high=10,size=9)

df = pd.DataFrame(data=[a,b,c,d,e,f,g,h,i])

Continent = ['Africa','Africa','Africa','North America', 'North America', 'North America', 'Europe','Europe','Europe']
Sectors = ['Agriculture','Industry','Domestic','Agriculture','Industry','Domestic','Agriculture','Industry','Domestic']

indx = pd.MultiIndex.from_tuples(list(zip(Continent,Sectors)))

df.index = indx
df.columns = indx

print(df)
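If the goal is to avoid building tuples at all, one possible variation is `pd.MultiIndex.from_arrays`, which takes the two parallel label lists directly. The sketch below reuses the `Continent` and `Sectors` lists from the question; the level names `'Continent'` and `'Sector'` and the seeded random data are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

Continent = ['Africa', 'Africa', 'Africa',
             'North America', 'North America', 'North America',
             'Europe', 'Europe', 'Europe']
Sectors = ['Agriculture', 'Industry', 'Domestic'] * 3

# One array per level; no zip() or tuples needed.
# The level names are illustrative, not from the original answer.
indx = pd.MultiIndex.from_arrays([Continent, Sectors],
                                 names=['Continent', 'Sector'])

# Seeded 9x9 random integers standing in for the real data
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(9, 9)),
                  index=indx, columns=indx)

print(df)
```

When every first-level label is paired with every second-level label, as in this example, `pd.MultiIndex.from_product([['Africa', 'North America', 'Europe'], ['Agriculture', 'Industry', 'Domestic']])` should produce the same labels from the short lists of unique values.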

rochellemarch Over a year ago

Thanks Scott Boston, that’s helpful. I think my main issue is turning the columns in my .csv file into tuples, as the labels in my real data set are too long to enter manually (96 countries with 26 sectors each). Am going to try the xlrd package described here and report on results: stackoverflow.com/questions/37403460/…
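For labels that already live in the first two columns of a .csv file, a minimal sketch is to let `pd.read_csv` build the row index via `index_col` and then reuse that index for the columns. The file layout here is an assumption (two label columns followed by a square block of values), and the inline text and the column names `c0`..`c3` are hypothetical stand-ins for the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: two label columns, then values
csv_text = """Continent,Sector,c0,c1,c2,c3
Africa,Agriculture,1,2,3,4
Africa,Industry,5,6,7,8
Europe,Agriculture,9,10,11,12
Europe,Industry,13,14,15,16
"""

# index_col=[0, 1] turns the first two columns into a two-level row index
df = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])

# The table is square, so the same labels can be reused for the columns
df.columns = df.index

print(df)
```

With a real file, the `io.StringIO(csv_text)` argument would be replaced by the file path. This only works as written when the number of data columns equals the number of rows.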
