I am working with a multi-index data frame but I am having a few problems while trying to filter/update its values.

What I need:

  1. Change 'Name 1', 'Name 2' and the others to upper case
  2. Get all the names with value 1 in {Group 1+ A} for example
  3. Get the list of the names in the previous step after selection (NAME 1, NAME 2, etc)

If I could also convert this MultiIndex data frame into a "normal" data frame it would be fine too.

Sample code:

import pandas as pd

sample_file = '.../Sample.xlsx'

excel_file = pd.ExcelFile(sample_file)
df = excel_file.parse(header=[0, 1], index_col=[0], sheet_name=0)

# Upper case columns (both levels of the column MultiIndex)
c_cols = df.columns.get_level_values(0).str.upper()
s_cols = df.columns.get_level_values(1).str.upper()
df.columns = pd.MultiIndex.from_arrays([c_cols, s_cols])

# TODO: step 1

# Step 2
valid = df[df[('GROUP 1', 'A')] == 1]

# TODO: Step 3

This is the sample file I am using: Sample file

[Image: sample picture of the data frame]

Thank you!

    Can you add code to create your input dataframe? This way we can understand your exact dataframe structure. Also, add expected output from this dataframe. Commented Aug 5, 2019 at 14:23

2 Answers

Using your Excel file:

df = pd.read_excel('Downloads/Sample.xlsx', header=[0,1], index_col=0)
df

Output:

Lists  Group 1                                         ... Group 2                                         
Name        AR   AZ   CA   CO  CT   FL  GA   IL IN KY  ...      SC  SD   TN   TX   UT   VA WA   WI  WV   WY
Name 1     NaN  1.0  1.0  1.0 NaN  1.0 NaN  NaN  1  1  ...       1 NaN  1.0  1.0  1.0  1.0  1  1.0 NaN  1.0
Name 2     NaN  NaN  NaN  NaN NaN  1.0 NaN  1.0  1  1  ...       1 NaN  1.0  NaN  NaN  1.0  1  NaN NaN  NaN
Name 3     NaN  NaN  NaN  NaN NaN  NaN NaN  1.0  1  1  ...       1 NaN  NaN  NaN  NaN  NaN  1  NaN NaN  NaN

[3 rows x 72 columns]

To Do #1

df.index = df.index.str.upper()
df

Output:

Lists  Group 1                                         ... Group 2                                         
Name        AR   AZ   CA   CO  CT   FL  GA   IL IN KY  ...      SC  SD   TN   TX   UT   VA WA   WI  WV   WY
NAME 1     NaN  1.0  1.0  1.0 NaN  1.0 NaN  NaN  1  1  ...       1 NaN  1.0  1.0  1.0  1.0  1  1.0 NaN  1.0
NAME 2     NaN  NaN  NaN  NaN NaN  1.0 NaN  1.0  1  1  ...       1 NaN  1.0  NaN  NaN  1.0  1  NaN NaN  NaN
NAME 3     NaN  NaN  NaN  NaN NaN  NaN NaN  1.0  1  1  ...       1 NaN  NaN  NaN  NaN  NaN  1  NaN NaN  NaN

[3 rows x 72 columns]

To Do #2

df[df.loc[:, ('Group 1', 'AZ')] == 1].index.to_list()

Output:

['NAME 1']

To Do #3

df[df.loc[:, ('Group 1', 'IL')] == 1].index.to_list()

Output:

['NAME 2', 'NAME 3']
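
For readers without the Excel file, here is a minimal sketch of the same three steps on a small mock frame, plus one way to get a "normal" single-level data frame. The mock data, the variable names (selected, names, flat) and the flattened column naming are assumptions for illustration, not taken from Sample.xlsx.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Sample.xlsx: two column levels, names as row labels.
columns = pd.MultiIndex.from_product(
    [["Group 1", "Group 2"], ["AZ", "IL"]], names=["Lists", "Name"]
)
df = pd.DataFrame(
    [[1, np.nan, 1, 1],
     [np.nan, 1, np.nan, 1],
     [np.nan, 1, 1, np.nan]],
    index=["Name 1", "Name 2", "Name 3"],
    columns=columns,
)

# Step 1: upper-case the row labels.
df.index = df.index.str.upper()

# Step 2: keep the rows whose ('Group 1', 'IL') cell equals 1.
selected = df[df[("Group 1", "IL")] == 1]

# Step 3: the matching names as a plain Python list.
names = selected.index.tolist()

# Optional: flatten the two column levels into single strings and
# move the names from the index into an ordinary column.
flat = df.copy()
flat.columns = [" ".join(col) for col in flat.columns]
flat = flat.rename_axis("Name").reset_index()
```

Joining the two levels with a space is only one possible naming scheme; any separator that keeps the flattened column names unique would do.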

1 Comment

You are brilliant!

I can only assume what you're trying to achieve since you did not provide an input sample.

If you're trying to select a specific value in a DataFrame with a MultiIndex, you can use .loc with the row label and the corresponding column tuple from the MultiIndex, e.g.

df.loc['Name1', ('GROUP 1', 'A')]

Let's mock some data...

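import string

import numpy as np
import pandas as pd

# Two-level row index (year, visit) and two-level column index (subject, type)
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                    names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                      names=['subject', 'type'])
# Fill the 4 x 6 frame with the letters 'a' through 'x'
data = np.array(list(string.ascii_lowercase))[:24].reshape((4, 6))

df = pd.DataFrame(
    columns=columns,
    index=index,
    data=data
)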

Here's our MultiIndex DataFrame:

subject    Bob      Guido      Sue     
type        HR Temp    HR Temp  HR Temp
year visit                             
2013 1       a    b     c    d   e    f
     2       g    h     i    j   k    l
2014 1       m    n     o    p   q    r
     2       s    t     u    v   w    x

Let's select the first row and change the letters to uppercase...

df.loc[(2013, 1)].str.upper()

...and likewise for the first column...

df.loc[:, ('Bob', 'HR')].str.upper()

...and finally we pick a specific cell

df.loc[(2014, 1), ('Guido', 'HR')].upper()

which returns

'O'

I hope that gives you an idea of how to use the .loc operator.
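
Note that .str.upper() returns a new object and leaves df unchanged. A minimal sketch of how the result could be kept, rebuilding the mock frame from above; the name upper_all is introduced here only for illustration.

```python
import string

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=["year", "visit"])
columns = pd.MultiIndex.from_product([["Bob", "Guido", "Sue"], ["HR", "Temp"]],
                                     names=["subject", "type"])
data = np.array(list(string.ascii_lowercase))[:24].reshape((4, 6))
df = pd.DataFrame(data, index=index, columns=columns)

# Assign the upper-cased column back so the change is stored in df.
df[("Bob", "HR")] = df[("Bob", "HR")].str.upper()

# Upper-case every column at once; this builds a new frame and leaves df as is.
upper_all = df.apply(lambda col: col.str.upper())
```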

2 Comments

nice! haha thanks man! Helped a lot! I also updated my post with some more details. Super thanks!
Let's say your index is like this: "index = pd.MultiIndex.from_product([['year 2013', 'year 2014'], [1, 2]], names=['year', 'visit'])". In this case, how can I change it to upper case?
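
One possible approach for the index in the comment above, sketched under the assumption that only the string level ('year') needs changing: upper-case that level's labels and put them back with MultiIndex.set_levels. The 'value' column is a hypothetical placeholder.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["year 2013", "year 2014"], [1, 2]], names=["year", "visit"]
)
# Hypothetical single-column frame, just to carry the index.
df = pd.DataFrame({"value": [0, 1, 2, 3]}, index=index)

# Upper-case the labels of level 0 ('year'); the integer 'visit' level is untouched.
upper_years = df.index.levels[0].str.upper()
df.index = df.index.set_levels(upper_years, level=0)
```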
