How do I create sub-headers in python dataframe?

Question

Say I have a dataframe:

Example:


import pandas as pd
df = pd.DataFrame({'Item': ['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'],
                   'col1': [0, 50, 50, 0, 10, 11, 14],
                   'col2': [0, 50, 40, 0, 15, 13, 15]})
Output:

         Item  col1  col2
0  California     0     0
1       2012%    50    50
2       2013%    50    40
3     Arizona     0     0
4       2012%    10    15
5       %2019    11    13
6    Janu%ary    14    15

I want the values that do not contain "%" (like "California" and "Arizona") to be treated as headers and prepended to their respective sub-header rows. One idea is to iterate down the rows: a row without "%" is a header, a row with "%" is a sub-header, and each sub-header row gets the most recently found header prepended to it.
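The row-by-row idea described above can be sketched directly as a plain loop. This is a minimal sketch, assuming the first row is always a header (otherwise the first sub-headers would get "None" prepended) and that "%" is the only marker; `rows` and `header` are illustrative names only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Item': ['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'],
                   'col1': [0, 50, 50, 0, 10, 11, 14],
                   'col2': [0, 50, 40, 0, 15, 13, 15]})

rows = []
header = None  # most recently seen header value (assumes row 0 is a header)
for record in df.to_dict('records'):
    item = record['Item']
    if '%' not in item:
        # no '%': this row is a header, remember it and drop the row
        header = item
    else:
        # '%': this row is a sub-header, prepend the last header seen
        # (all other columns are copied unchanged)
        rows.append({**record, 'Item': f'{header} {item}'})

result = pd.DataFrame(rows)
print(result)
```

The vectorized answer below does the same thing without a Python-level loop, which is usually preferable on larger frames.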

Expected output:

               Item  col1  col2
0  California 2012%    50    50
1  California 2013%    50    40
2     Arizona 2012%    10    15
3     Arizona %2019    11    13
4  Arizona Janu%ary    14    15

The provided input does not match the text input; you have to clarify. – mozway, Commented Mar 16, 2022 at 12:38
There is a syntax error in the input data. The correct input data for the example provided could be the following: {"Item": ["California%", "2012%", "2013%", "Arizona", "2012%", "2019%", "January%"], "col1": [0, 20, 50, 40, 0, 10, 11], "col2": [0, 20, 50, 40, 0, 15, 13]} – gremur, Commented Mar 16, 2022 at 12:58

mozway · Accepted Answer · 2022-03-16 12:51:16Z

IIUC, you could use a mask and perform boolean masking/indexing:

# does the name contain '%'? (you could use other conditions)
m = df['Item'].str.contains('%')
# mask and ffill the "header", then concatenate
df['Item'] = df['Item'].mask(m).ffill() + ' ' + df['Item']

# drop the former header rows
df = df.loc[m]

output:

               Item  col1  col2
1  California 2012%    50    50
2  California 2013%    50    40
4     Arizona 2012%    10    15
5     Arizona %2019    11    13
6  Arizona Janu%ary    14    15
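To see why the mask/ffill trick works, it can help to look at each intermediate Series. The sketch below is a variation of the answer's code, not the answer itself: it passes `regex=False` so "%" is matched literally, and it uses `df.assign(...)` so the original frame is left unmodified. The names `headers`, `filled` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'],
                   'col1': [0, 50, 50, 0, 10, 11, 14],
                   'col2': [0, 50, 40, 0, 15, 13, 15]})

# 1. Boolean mask: True for sub-header rows (value contains a literal '%')
m = df['Item'].str.contains('%', regex=False)

# 2. Series.mask sets values to NaN where m is True, so only headers remain
headers = df['Item'].mask(m)

# 3. ffill propagates the last non-NaN header down over the NaN gaps
filled = headers.ffill()

# 4. Prefix every value with its header, then keep only the sub-header rows
out = df.assign(Item=filled + ' ' + df['Item']).loc[m]

print(pd.DataFrame({'mask': m, 'masked': headers, 'ffilled': filled}))
print(out)
```

Note that `.loc[m]` keeps the original index labels (1, 2, 4, 5, 6); chain `.reset_index(drop=True)` if a fresh 0-based index is preferred.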

Alternatively, to keep the header as a real index:

m = df['Item'].str.contains('%')
df['index'] = df['Item'].mask(m).ffill()

df = df.loc[m].set_index('index')

output:

                Item  col1  col2
index                           
California     2012%    50    50
California     2013%    50    40
Arizona        2012%    10    15
Arizona        %2019    11    13
Arizona     Janu%ary    14    15
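If both levels should stay separate, i.e. literal headers and sub-headers, a two-level index is another option. This is a sketch of a variation on the "real index" idea, not part of the accepted answer; the level name `header` is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'],
                   'col1': [0, 50, 50, 0, 10, 11, 14],
                   'col2': [0, 50, 40, 0, 15, 13, 15]})

m = df['Item'].str.contains('%', regex=False)
# Forward-filled header for every row, as in the answer
header = df['Item'].mask(m).ffill()

# Keep only sub-header rows and index them by (header, sub-header)
out = (df.assign(header=header)
         .loc[m]
         .set_index(['header', 'Item']))
print(out)

# Select all sub-header rows that belong to one header
print(out.xs('Arizona', level='header'))
```

Keeping the two parts in separate index levels avoids having to split the concatenated string again later when filtering by header.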

edited Mar 16, 2022 at 12:51

answered Mar 16, 2022 at 12:41

mozway

2 Comments

Shruthi Ravishankar Over a year ago

Not just at the end: the "%" can be anywhere. Let me edit the question.

mozway Over a year ago

@ShruthiRavishankar Then use str.contains; see the update.
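The difference between an end-of-string check and `str.contains` is easy to see on the question's values. The `str.endswith('%')` call here is an assumption about the kind of check used before the update (the earlier code is not shown). `regex=False` is passed so that "%" is matched as a literal string rather than as a regular expression.

```python
import pandas as pd

items = pd.Series(['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'])

# Only True when '%' is the last character
ends = items.str.endswith('%')
# True when '%' appears anywhere in the value (literal match)
anywhere = items.str.contains('%', regex=False)

print(pd.DataFrame({'Item': items, 'endswith': ends, 'contains': anywhere}))
```

With `endswith`, "%2019" and "Janu%ary" would wrongly be treated as headers, which is why the updated answer uses `str.contains`.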
