Let's say that I want to create a dataframe with both a multi-level index and multi-level columns:

                          X         Y
Planet Continent Country  A    B    C     D 
Earth     Europe England  0.3  0.5  0.6   0.8
          Europe Italy    0.1  0.2  0.4   1.2 
Mars      Tempe  Sirtys   3.2  4.5  2.3   4.2 

I want to build it iteratively, collecting one row of the dataframe at a time, for example:

row1 =  np.array(['Earth', 'Europe', 'England', 0.3, 0.5, 0.6, 0.8])
row2 =  np.array(['Earth', 'Europe', 'Italy', 0.1, 0.2, 0.4, 1.2])

I know how, starting with rows, I can create a multi-column dataframe, and I know how I can create a multi-index one. But how can I create both? Thanks

  • df.reset_index().to_numpy() ? Commented Apr 29, 2020 at 13:34
  • how do you start? do you already have the multiindex index and columns in an empty dataframe? Commented Apr 29, 2020 at 13:35
  • I think OP wants to go in the other direction. Commented Apr 29, 2020 at 13:35
  • I can start in any way, to be honest. What's important is that at some point I have those rows and I need to build a dataframe out of them, using the first x elements as the index and the rest as values under multi-level columns. Also yes, I want to go from numpy to pandas :) Edit: Ben, if I understand your question, I have the column names and the names of the index levels, but not all the possible index values. Commented Apr 29, 2020 at 13:36
  • It also depends on how you want/need to create your dataframe. Do you need to update the rows one-by-one? Or do you have all the rows and want to create the dataframe at once? Commented Apr 29, 2020 at 13:41

2 Answers

If you start from an empty dataframe defined with a MultiIndex for both the index and the columns (whose names you said you already know):

import pandas as pd

# empty frame: three named index levels with no entries yet, two-level columns
df = pd.DataFrame(index=pd.MultiIndex(levels=[[]]*3,
                                      codes=[[]]*3,
                                      names=['Planet','Continent','Country']),
                  columns=pd.MultiIndex.from_tuples([('X','A'), ('X','B'),
                                                     ('Y','C'), ('Y','D')]))

Then you can add each row, using its first three elements as the index key:

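# np.array stored every element of the row as a string, so convert the values to float
df.loc[tuple(row1[:3]), :] = row1[3:].astype(float)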
print (df)
                            X         Y     
                            A    B    C    D
Planet Continent Country                    
Earth  Europe    England  0.3  0.5  0.6  0.8

and then the same for the next row:

# same conversion as before: the row array holds strings, not floats
df.loc[tuple(row2[:3]), :] = row2[3:].astype(float)
print (df)
                            X         Y     
                            A    B    C    D
Planet Continent Country                    
Earth  Europe    England  0.3  0.5  0.6  0.8
                 Italy    0.1  0.2  0.4  1.2

But if you have many rows available at once, the answer from @Yo_Chris will be much easier.
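If the rows arrive one at a time but the dataframe is only needed at the end, a middle ground is to collect the index keys and values in plain lists and build the frame once. The following is a minimal sketch under that assumption; `incoming_rows`, `index_tuples`, `values` and `collected` are hypothetical names, and the Mars row is taken from the table in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical row source: each row is [planet, continent, country, a, b, c, d].
incoming_rows = [
    np.array(['Earth', 'Europe', 'England', 0.3, 0.5, 0.6, 0.8]),
    np.array(['Earth', 'Europe', 'Italy', 0.1, 0.2, 0.4, 1.2]),
    np.array(['Mars', 'Tempe', 'Sirtys', 3.2, 4.5, 2.3, 4.2]),
]

index_tuples = []  # one (Planet, Continent, Country) key per row
values = []        # the four numeric values per row

for row in incoming_rows:
    # The first three elements form the index key.
    index_tuples.append(tuple(str(x) for x in row[:3]))
    # A mixed np.array holds strings, so convert the rest back to floats.
    values.append(row[3:].astype(float))

index = pd.MultiIndex.from_tuples(
    index_tuples, names=['Planet', 'Continent', 'Country'])
columns = pd.MultiIndex.from_tuples(
    [('X', 'A'), ('X', 'B'), ('Y', 'C'), ('Y', 'D')])

# Build the frame in one step from the stacked value rows.
collected = pd.DataFrame(np.vstack(values), index=index, columns=columns)
print(collected)
```

Appending to Python lists is cheap, so this avoids enlarging the dataframe on every iteration while keeping the row-by-row collection loop.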

1 Comment

I chose the other one as it matched better with how my code was already structured. Thanks nonetheless!
import numpy as np
import pandas as pd

row1 = np.array(['Earth', 'Europe', 'England', 0.3, 0.5, 0.6, 0.8])
row2 = np.array(['Earth', 'Europe', 'Italy', 0.1, 0.2, 0.4, 1.2])
# create a data frame and use the first three columns as the index
df = pd.DataFrame([row1, row2]).set_index([0, 1, 2])
# set the index names
df.index.names = ['Planet', 'Continent', 'Country']
# create a multi-index and assign it to the columns
df.columns = pd.MultiIndex.from_tuples([('X', 'A'), ('X', 'B'), ('Y', 'C'), ('Y', 'D')])
# np.array stored every element as a string, so convert the values back to floats
df = df.astype(float)

                            X         Y     
                            A    B    C    D
Planet Continent Country                    
Earth  Europe    England  0.3  0.5  0.6  0.8
                 Italy    0.1  0.2  0.4  1.2
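To check that the result really has both levels, a few lookups can help. This is only a sketch: the frame is rebuilt directly from literal values so the block runs on its own, and `x_block`, `italy` and `cell` are hypothetical names.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('Earth', 'Europe', 'England'), ('Earth', 'Europe', 'Italy')],
    names=['Planet', 'Continent', 'Country'])
columns = pd.MultiIndex.from_tuples(
    [('X', 'A'), ('X', 'B'), ('Y', 'C'), ('Y', 'D')])
df = pd.DataFrame([[0.3, 0.5, 0.6, 0.8],
                   [0.1, 0.2, 0.4, 1.2]],
                  index=index, columns=columns)

# Select everything under the top-level column 'X' (leaves columns A and B).
x_block = df['X']

# Select rows by one index level; the 'Country' level is dropped from the result.
italy = df.xs('Italy', level='Country')

# Address a single cell with a full row key and a full column key.
cell = df.loc[('Earth', 'Europe', 'Italy'), ('Y', 'D')]
print(x_block, italy, cell, sep='\n\n')
```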

1 Comment

Both answers are great, but this fits perfectly with how my code was designed already. Thank you!
