
I have a dataframe, for example:


import pandas as pd
df = pd.DataFrame({'Item': ['California', '2012%', '2013%','Arizona','2012%','%2019','Janu%ary'], 
                   'col1': [0,50, 50,0,10,11,14],'col2': [0, 50, 40,0,15,13,15]})
Output=
         Item  col1  col2
0  California     0     0
1       2012%    50    50
2       2013%    50    40
3     Arizona     0     0
4       2012%    10    15
5       %2019    11    13
6    Janu%ary    14    15

I want values like "California" and "Arizona" (the ones in the Item column that do not contain "%") to be treated as headers and prepended to their respective sub-headers. For example, iterate down the rows and apply a pattern: a row without '%' is a header, a row with '%' is a sub-header, and each sub-header row gets the last header found above it added in front.

Expected output=
               Item  col1  col2
1  California 2012%    50    50
2  California 2013%    50    40
3     Arizona 2012%    10    15
4     Arizona %2019    11    13
5  Arizona Janu%ary    14    15
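
The row-by-row idea described above can be sketched as a plain loop. This is only an illustrative sketch, not necessarily the best approach: the names `rows`, `current_header` and `result` are made up for the example, and it assumes the first row of the frame is always a header (otherwise the prefix would be `None`).

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'],
    'col1': [0, 50, 50, 0, 10, 11, 14],
    'col2': [0, 50, 40, 0, 15, 13, 15],
})

rows = []
current_header = None  # last "header" value seen so far (hypothetical helper name)
for item, col1, col2 in df.itertuples(index=False):
    if '%' not in item:
        # no '%': treat this row as a header and remember it
        current_header = item
    else:
        # '%' present: sub-header row, prefix it with the last header
        rows.append({'Item': f'{current_header} {item}', 'col1': col1, 'col2': col2})

result = pd.DataFrame(rows)
print(result)
```

A loop like this is easy to follow, but it is usually slower than a vectorized solution on large frames.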

  • the provided input does not match the text input, you have to clarify Commented Mar 16, 2022 at 12:38
  • @mozway I have corrected it Commented Mar 16, 2022 at 12:39
  • Also the California/California% Commented Mar 16, 2022 at 12:39
  • @mozway Yup! I fixed it Commented Mar 16, 2022 at 12:41
  • There is a syntax error in the input data. The correct input data for example provided could be the following: {"Item": ["California%", "2012%", "2013%", "Arizona", "2012%", "2019%", "January%"], "col1": [0, 20, 50, 40, 0, 10, 11], "col2": [0, 20, 50, 40, 0, 15, 13]} Commented Mar 16, 2022 at 12:58

1 Answer


IIUC, you could use a mask and perform boolean masking/indexing:

# does the name contain '%'? (you could use other conditions)
m = df['Item'].str.contains('%')
# mask and ffill the "header", then concatenate
df['Item'] = df['Item'].mask(m).ffill() + ' ' + df['Item']

# drop the former header rows
df = df.loc[m]

output:

               Item  col1  col2
1  California 2012%    50    50
2  California 2013%    50    40
4     Arizona 2012%    10    15
5     Arizona %2019    11    13
6  Arizona Janu%ary    14    15
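
Note that the filtered frame keeps the original row labels (1, 2, 4, 5, 6). If a fresh consecutive index is wanted, as the numbering in the question suggests, a possible variant is the following sketch. It uses `assign` so the original `df` is left untouched, and `reset_index(drop=True)` to renumber from 0; the name `out` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'],
    'col1': [0, 50, 50, 0, 10, 11, 14],
    'col2': [0, 50, 40, 0, 15, 13, 15],
})

# True for sub-header rows (those containing '%')
m = df['Item'].str.contains('%')

out = (
    # blank out sub-headers, forward-fill the headers, then prepend them
    df.assign(Item=df['Item'].mask(m).ffill() + ' ' + df['Item'])
      .loc[m]                   # keep only the sub-header rows
      .reset_index(drop=True)   # renumber rows 0..n-1
)
print(out)
```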

Alternative, to have a real index:

m = df['Item'].str.contains('%')
df['index'] = df['Item'].mask(m).ffill()

df = df.loc[m].set_index('index')

output:

                Item  col1  col2
index                           
California     2012%    50    50
California     2013%    50    40
Arizona        2012%    10    15
Arizona        %2019    11    13
Arizona     Janu%ary    14    15
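
One reason to prefer the index variant is that the header then works as a lookup and grouping key. A small sketch of that, where `tmp`, `indexed`, `arizona` and `totals` are illustrative names and the data is the example frame from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'],
    'col1': [0, 50, 50, 0, 10, 11, 14],
    'col2': [0, 50, 40, 0, 15, 13, 15],
})

m = df['Item'].str.contains('%')
tmp = df.copy()                               # work on a copy, keep df intact
tmp['index'] = tmp['Item'].mask(m).ffill()    # header for every row
indexed = tmp.loc[m].set_index('index')

# all sub-header rows that belong to one header
arizona = indexed.loc['Arizona']

# per-header totals of the numeric columns
totals = indexed.groupby(level='index')[['col1', 'col2']].sum()
print(arizona)
print(totals)
```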

2 Comments

Not just ends with %, the "%" can be anywhere. Let me edit the question.
@ShruthiRavishankar then use str.contains, see update
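
To illustrate the difference discussed in these comments, here is a small comparison of a suffix test and a "contains" test on the example values. It is only a sketch; `regex=False` is added on the assumption that '%' should be matched as a literal character rather than as a regular expression.

```python
import pandas as pd

items = pd.Series(['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'])

# True only when '%' is the last character
ends = items.str.endswith('%')

# True when '%' appears anywhere in the string
anywhere = items.str.contains('%', regex=False)

print(pd.DataFrame({'Item': items, 'endswith': ends, 'contains': anywhere}))
```

With the suffix test, '%2019' and 'Janu%ary' would be misclassified as headers, which is why the "contains" test fits the edited question.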
