Structuring a dataframe with pandas

Question

I am reading an Excel file where the product names and other labels (production per day, per month, etc.) are in the same column. I would like to create a new column that carries the product name on every row related to that product. Can anybody help? Thanks in advance! :)

how it is:

8HP70 
Production/Day
Production/Month
Cum.Production
8HP70X 
Production/Day
Production/Month
Cum.Production
8HP75 
Production/Day
Production/Month
Cum.Production

**how I expect:**

Column A | Column B

8HP70 | Production/Day
8HP70 | Production/Month
8HP70 | Cum.Production
8HP70X | Production/Day
8HP70X | Production/Month
8HP70X | Cum.Production
8HP75 | Production/Day
8HP75 | Production/Month
8HP75 | Cum.Production

Hello @EdekiOkoh, yes, in this report there are always 3 rows after the product name.
– Thiago Araujo, Commented Jul 22, 2019 at 23:02

rrcal · Accepted Answer · 2019-07-23 17:01:46Z


One example of how this could be handled:

import pandas as pd
l = [
    ['8HP70'],
    ['Production/Day'],
    ['Production/Month'],
    ['Cum.Production'],
    ['8HP70X'],
    ['Production/Day'],
    ['Production/Month'],
    ['Cum.Production'],
    ['8HP75'],
    ['Production/Day'],
    ['Production/Month'],
    ['Cum.Production'],
]

df = pd.DataFrame(l, columns=['Column B'])

## selecting the product label rows (every 4th row, starting at row 0)
products = df[df['Column B'].index % 4 == 0]

## repeating each product label 4 times to fill a new column
df['Column A'] = products.values.repeat(4)

## removing the product duplication
df = df[df['Column A']!=df['Column B']]

Out[3]: 
            Column B Column A
1     Production/Day    8HP70
2   Production/Month    8HP70
3     Cum.Production    8HP70
5     Production/Day   8HP70X
6   Production/Month   8HP70X
7     Cum.Production   8HP70X
9     Production/Day    8HP75
10  Production/Month    8HP75
11    Cum.Production    8HP75
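
If the three metric labels are known up front, the same result can be reached without counting rows. The following is only a minimal sketch under that assumption: the names `metric_labels`, `is_metric` and `result` are made up for illustration, and it assumes every row that is not one of those three labels is a product name (so no noisy rows).

```python
import pandas as pd

rows = [
    '8HP70', 'Production/Day', 'Production/Month', 'Cum.Production',
    '8HP70X', 'Production/Day', 'Production/Month', 'Cum.Production',
    '8HP75', 'Production/Day', 'Production/Month', 'Cum.Production',
]
df = pd.DataFrame({'Column B': rows})

# Assumption: these three labels are the only non-product rows in the column
metric_labels = ['Production/Day', 'Production/Month', 'Cum.Production']
is_metric = df['Column B'].isin(metric_labels)

# Keep the product names, blank out the metric rows,
# then carry each product name down to the rows below it
df['Column A'] = df['Column B'].where(~is_metric).ffill()

# Keep only the metric rows, each now tagged with its product
result = df.loc[is_metric, ['Column A', 'Column B']].reset_index(drop=True)
print(result)
```

Because the product rows are found by their content rather than by position, this variant should also cope with products that have fewer or more than three rows, as long as the label list is complete.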

EDIT

Added some more logic as requested. If there are noisy rows before the first product label, we can set them aside, apply the same logic to the remaining rows, and then re-append them (assuming we know the first product label):

## prepending three noisy rows (placeholder values) to reproduce the situation
df = pd.DataFrame([['noise']] * 3 + l, columns=['Column B'])


## Identify product starting location
prod_label = '8HP70'

## Get the index where the first product appears
prod_indic = df[df['Column B'] == prod_label].index[0]

## create a temp df only with product info
only_prod_df = df[df.index>=prod_indic].reset_index(drop=True)
products = only_prod_df[only_prod_df['Column B'].index % 4 == 0]

## replicating to a new column
only_prod_df['Column A'] = products.values.repeat(4)

## removing the product duplication
only_prod_df = only_prod_df[only_prod_df['Column A']!=only_prod_df['Column B']]

## append back to noisy rows
final_df = pd.concat([df[df.index<prod_indic], only_prod_df],
                     axis=0, sort=False, ignore_index=True)

            Column B Column A
0              noise      NaN
1              noise      NaN
2              noise      NaN
3     Production/Day    8HP70
4   Production/Month    8HP70
5     Cum.Production    8HP70
6     Production/Day   8HP70X
7   Production/Month   8HP70X
8     Cum.Production   8HP70X
9     Production/Day    8HP75
10  Production/Month    8HP75
11    Cum.Production    8HP75

It is also important to note that this code relies on a sequential numeric index (0, 1, 2, ...), since the product rows are located with index % 4.
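
If the index is not sequential (for example after earlier filtering), one option is to work with row positions instead of index labels. This is a rough sketch rather than a definitive fix: the index values, `block_size`, `pos` and `label_pos` are hypothetical names and data chosen for illustration, and it still assumes exactly three rows after each product name.

```python
import numpy as np
import pandas as pd

rows = ['8HP70', 'Production/Day', 'Production/Month', 'Cum.Production',
        '8HP70X', 'Production/Day', 'Production/Month', 'Cum.Production']
# A deliberately non-sequential index (hypothetical values)
df = pd.DataFrame({'Column B': rows}, index=[10, 11, 12, 13, 20, 21, 22, 23])

block_size = 4  # one product label plus three metric rows, as in the report

# Position of each row counted from 0, independent of the index labels
pos = np.arange(len(df))

# Position of the product label that opens each block of four rows
label_pos = (pos // block_size) * block_size
df['Column A'] = df['Column B'].to_numpy()[label_pos]

# Drop the product label rows themselves
result = df[pos % block_size != 0]
print(result)
```

The original index labels are preserved on the remaining rows, so the result can still be matched back to the source data if needed.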

edited Jul 23, 2019 at 17:01

answered Jul 22, 2019 at 23:06

rrcal


4 Comments

Thiago Araujo Over a year ago

Hello @calestini, first of all, thanks for your support. I am facing some errors, which I believe occur because I have some lines/information before the first product starts. Is there any way to start the count from the first product without knowing the position where it starts? Let's say, when the first '8HP70' is found in that column, then apply the code above as you mentioned. Just to mention: your code above is perfect and works fine. I am just trying to improve my knowledge by doing the same without dropping rows from the dataframe.

rrcal Over a year ago

Hi @ThiagoAraujo, absolutely. Is there a specific pattern to the first product? Would you know the label, or use a pattern to find out where it is?

Thiago Araujo Over a year ago

Hey @Calestini, yes, I always know the first product. Your support was very very helpful. Thank you again.

rrcal Over a year ago

Anytime @ThiagoAraujo, there are possibly more efficient ways of achieving the same thing, so I'd recommend that you keep tinkering with pandas and its main methods. Glad it helped.
