Working with multi-index pandas dataframe

Question

I am working with a multi-index data frame but I am having a few problems while trying to filter/update its values.

What I need:

Change 'Name 1', 'Name 2' and the others to upper case
Get all the names with value 1 in {Group 1+ A} for example
Get the list of the names in the previous step after selection (NAME 1, NAME 2, etc)

If I could also convert this MultiIndex data frame into a "normal" data frame it would be fine too.

A sample code:

import pandas as pd

sample_file = '.../Sample.xlsx'

excel_file = pd.ExcelFile(sample_file)
df = excel_file.parse(header=[0, 1], index_col=[0], sheet_name=0)

# Upper case columns
c_cols = df.columns.get_level_values(0).str.upper()
s_cols = df.columns.get_level_values(1).str.upper()
df.columns = pd.MultiIndex.from_arrays([c_cols, s_cols])

# TODO: step 1

# Step 2
valid = df[df[('GROUP 1', 'A')] == 1]

# TODO: Step 3

This is the sample file I am using: Sample file

This is a sample picture of a data frame:

Thank you!

Can you add code to create your input dataframe? This way we can understand your exact dataframe structure. Also, add expected output from this dataframe.
– Scott Boston, Commented Aug 5, 2019 at 14:23

Scott Boston · Accepted Answer · 2019-08-08 12:57:25Z

Using your Excel file:

df = pd.read_excel('Downloads/Sample.xlsx', header=[0,1], index_col=0)
df

Output:

Lists  Group 1                                         ... Group 2                                         
Name        AR   AZ   CA   CO  CT   FL  GA   IL IN KY  ...      SC  SD   TN   TX   UT   VA WA   WI  WV   WY
Name 1     NaN  1.0  1.0  1.0 NaN  1.0 NaN  NaN  1  1  ...       1 NaN  1.0  1.0  1.0  1.0  1  1.0 NaN  1.0
Name 2     NaN  NaN  NaN  NaN NaN  1.0 NaN  1.0  1  1  ...       1 NaN  1.0  NaN  NaN  1.0  1  NaN NaN  NaN
Name 3     NaN  NaN  NaN  NaN NaN  NaN NaN  1.0  1  1  ...       1 NaN  NaN  NaN  NaN  NaN  1  NaN NaN  NaN

[3 rows x 72 columns]

To Do #1

df.index = df.index.str.upper()
df

Output:

Lists  Group 1                                         ... Group 2                                         
Name        AR   AZ   CA   CO  CT   FL  GA   IL IN KY  ...      SC  SD   TN   TX   UT   VA WA   WI  WV   WY
NAME 1     NaN  1.0  1.0  1.0 NaN  1.0 NaN  NaN  1  1  ...       1 NaN  1.0  1.0  1.0  1.0  1  1.0 NaN  1.0
NAME 2     NaN  NaN  NaN  NaN NaN  1.0 NaN  1.0  1  1  ...       1 NaN  1.0  NaN  NaN  1.0  1  NaN NaN  NaN
NAME 3     NaN  NaN  NaN  NaN NaN  NaN NaN  1.0  1  1  ...       1 NaN  NaN  NaN  NaN  NaN  1  NaN NaN  NaN

[3 rows x 72 columns]

To Do #2

df[df.loc[:, ('Group 1', 'AZ')] == 1].index.to_list()

Output:

['NAME 1']

To Do #3

df[df.loc[:, ('Group 1', 'IL')] == 1].index.to_list()

Output:

['NAME 2', 'NAME 3']
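
The question also asks about converting the frame into a "normal" data frame, which the steps above do not cover. Here is a minimal sketch of one way to do that, run end to end on a small mock frame. The mock data, the joined column names such as "Group 1 AZ", and the variable names `selected`, `names` and `flat` are assumptions made for illustration, not the contents of the real Sample.xlsx.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Sample.xlsx: two column levels, names as the index.
columns = pd.MultiIndex.from_product(
    [["Group 1", "Group 2"], ["AZ", "IL"]], names=["Lists", "Name"]
)
df = pd.DataFrame(
    [[1, np.nan, 1, 1], [np.nan, 1, 1, np.nan], [np.nan, 1, np.nan, np.nan]],
    index=["Name 1", "Name 2", "Name 3"],
    columns=columns,
)

# Step 1: upper-case the row labels.
df.index = df.index.str.upper()

# Steps 2 and 3: keep rows where ('Group 1', 'IL') equals 1, then list their names.
selected = df[df[("Group 1", "IL")] == 1]
names = selected.index.to_list()

# Optional: collapse the two column levels into single strings and move the
# row labels into an ordinary column, giving a flat ("normal") data frame.
flat = df.copy()
flat.columns = [f"{group} {state}" for group, state in flat.columns]
flat = flat.reset_index().rename(columns={"index": "Name"})
```

Joining the levels with a space is only one possible naming choice; any separator that keeps the resulting column names unique would work.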

McLovvin · Accepted Answer · 2019-08-05 15:49:46Z

I can only assume what you're trying to achieve since you did not provide an input sample.

If you're trying to select and modify a specific row with a MultiIndex, you can use the .loc operator and the corresponding tuple that you specified in the MultiIndex, e.g.

df.loc['Name1', ('GROUP 1', 'A')]

Let's mock some data...

import string

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                    names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                      names=['subject', 'type'])
# 24 lowercase letters arranged as 4 rows x 6 columns
data = np.array(list(string.ascii_lowercase))[:24].reshape((4, 6))

df = pd.DataFrame(
    columns=columns,
    index=index,
    data=data
)

Here's our MultiIndex DataFrame:

subject    Bob      Guido      Sue     
type        HR Temp    HR Temp  HR Temp
year visit                             
2013 1       a    b     c    d   e    f
     2       g    h     i    j   k    l
2014 1       m    n     o    p   q    r
     2       s    t     u    v   w    x

Let's select the first row and change the letters to uppercase...

df.loc[(2013, 1)].str.upper()

...and likewise for the first column...

df.loc[:, ('Bob', 'HR')].str.upper()

...and finally we pick a specific cell

df.loc[(2014, 1), ('Guido', 'HR')].upper()

which returns

'O'

I hope that gives you an idea of how to use the .loc operator....
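
Note that `.str.upper()` and `.upper()` return new values and leave the frame itself unchanged. A minimal sketch of writing the results back, assuming the same mock data as above (rebuilt here so the block runs on its own):

```python
import string

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=["year", "visit"])
columns = pd.MultiIndex.from_product(
    [["Bob", "Guido", "Sue"], ["HR", "Temp"]], names=["subject", "type"]
)
data = np.array(list(string.ascii_lowercase))[:24].reshape((4, 6))
df = pd.DataFrame(data, index=index, columns=columns)

# Upper-case one whole column and store the result back in the frame.
df[("Bob", "HR")] = df[("Bob", "HR")].str.upper()

# Upper-case a single cell and store it back.
df.loc[(2014, 1), ("Guido", "HR")] = df.loc[(2014, 1), ("Guido", "HR")].upper()
```

Everything that was not assigned back, such as the ('Bob', 'Temp') column, keeps its lowercase values.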


2 Comments

briba Over a year ago

nice! haha thanks man! Helped a lot! I also updated my post with some more details. Super thanks!

briba Over a year ago

Let's say your index is like this: "index = pd.MultiIndex.from_product([['year 2013', 'year 2014'], [1, 2]], names=['year', 'visit'])". In this case, how can I change it to upper case?
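
For the index described in the comment above, one possible approach is `DataFrame.rename` with its `level` argument, which applies a function to a single level of the row MultiIndex. This is a sketch only; the one-column frame and its `value` column are hypothetical placeholders.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["year 2013", "year 2014"], [1, 2]], names=["year", "visit"]
)
# Hypothetical data, only there to give the index something to label.
df = pd.DataFrame({"value": range(4)}, index=index)

# Upper-case only the 'year' level; the integer 'visit' level is left alone.
df = df.rename(index=str.upper, level="year")
```

Restricting the rename to the 'year' level matters here because `str.upper` would fail on the integer labels of the 'visit' level.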
