
I am reading an Excel file in which the product names and other labels (production per day, per month, etc.) are in the same column. I would like to create a new column that carries the product name on every row related to that product. Can anybody help? Thanks in advance! :)

How it is:

8HP70 
Production/Day
Production/Month
Cum.Production
8HP70X 
Production/Day
Production/Month
Cum.Production
8HP75 
Production/Day
Production/Month
Cum.Production

**What I expect:**

Column A | Column B
8HP70 | Production/Day
8HP70 | Production/Month
8HP70 | Cum.Production
8HP70X | Production/Day
8HP70X | Production/Month
8HP70X | Cum.Production
8HP75 | Production/Day
8HP75 | Production/Month
8HP75 | Cum.Production
  • Please, provide the input in text format. Commented Jul 22, 2019 at 22:49
  • Is it always 3 afterwards? Commented Jul 22, 2019 at 22:51
  • Hello @EdekiOkoh, yes, in this report there are always 3 rows after the product name. Commented Jul 22, 2019 at 23:02
  • Hi @AlexandreB., I will provide it. Commented Jul 22, 2019 at 23:05

1 Answer

One example of how this could be handled:

import pandas as pd
l = [
    ['8HP70'],
    ['Production/Day'],
    ['Production/Month'],
    ['Cum.Production'],
    ['8HP70X'],
    ['Production/Day'],
    ['Production/Month'],
    ['Cum.Production'],
    ['8HP75'],
    ['Production/Day'],
    ['Production/Month'],
    ['Cum.Production'],
]

df = pd.DataFrame(l, columns=['Column B'])

## select the product label rows (every 4th row, starting at index 0)
products = df[df['Column B'].index % 4 == 0]

## repeat each product label 4 times and store it in a new column
df['Column A'] = products.values.repeat(4)

## drop the rows that only hold the product label itself
df = df[df['Column A']!=df['Column B']]

Out[3]: 
            Column B Column A
1     Production/Day    8HP70
2   Production/Month    8HP70
3     Cum.Production    8HP70
5     Production/Day   8HP70X
6   Production/Month   8HP70X
7     Cum.Production   8HP70X
9     Production/Day    8HP75
10  Production/Month    8HP75
11    Cum.Production    8HP75

EDIT

Added some more logic, as requested in the comments. If there are noisy rows before the first product label, we can set them aside, apply the same logic to the remaining rows, and then append the result back (assuming we know the first product label):

## sample data with three noisy rows in front (matches the output below)
df = pd.DataFrame([['noise']] * 3 + l, columns=['Column B'])

## label of the first product, assumed to be known
prod_label = '8HP70'

## index of the row where the first product appears
prod_indic = df[df['Column B'] == prod_label].index[0]

## temp df holding only the product rows, re-indexed from 0
only_prod_df = df[df.index>=prod_indic].reset_index(drop=True)
products = only_prod_df[only_prod_df['Column B'].index % 4 == 0]

## repeat each product label 4 times and store it in a new column
only_prod_df['Column A'] = products.values.repeat(4)

## drop the rows that only hold the product label itself
only_prod_df = only_prod_df[only_prod_df['Column A']!=only_prod_df['Column B']]

## append back to the noisy rows
final_df = pd.concat([df[df.index<prod_indic], only_prod_df],
                     axis=0, sort=False, ignore_index=True)

            Column B Column A
0              noise      NaN
1              noise      NaN
2              noise      NaN
3     Production/Day    8HP70
4   Production/Month    8HP70
5     Cum.Production    8HP70
6     Production/Day   8HP70X
7   Production/Month   8HP70X
8     Cum.Production   8HP70X
9     Production/Day    8HP75
10  Production/Month    8HP75
11    Cum.Production    8HP75

It is also important to note that this approach relies on a sequential numeric index starting at 0, because the product rows are located with index % 4.
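
If the index is not sequential, one possible alternative is to mark the product rows with a boolean mask and forward-fill the labels instead of using the modulo on the index. This is only a sketch under two assumptions that are not part of the answer above: the set of metric labels (`metric_labels`) is known in advance, and every row after the first product that is not a metric label is a product name. The sample data and the names `started`, `is_product` and `result` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with a non-sequential index and one noisy row
rows = ['noise',
        '8HP70', 'Production/Day', 'Production/Month', 'Cum.Production',
        '8HP70X', 'Production/Day', 'Production/Month', 'Cum.Production']
df = pd.DataFrame({'Column B': rows}, index=range(100, 109))

# Assumptions: the metric labels and the first product label are known
metric_labels = {'Production/Day', 'Production/Month', 'Cum.Production'}
first_product = '8HP70'

col = df['Column B']

# True from the first product row onwards, False for the noisy rows before it
started = (col == first_product).cummax()

# A product row is any row after the start that is not a metric label
is_product = started & ~col.isin(metric_labels)

# Keep the label only on product rows, then carry it down to the rows below
df['Column A'] = col.where(is_product).ffill()

# Drop the rows that only hold the product label itself
result = df[~is_product]
print(result)
```

Because the mask is built from the values rather than from the index, this sketch does not require a fixed number of rows per product either. Noisy rows before the first product keep NaN in Column A.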


4 Comments

Hello @calestini, first of all, thanks for your support. I am getting some errors, which I believe occur because I have some rows of other information before the first product. Is there any way to start the count from the first product without knowing the position where it starts? Let's say, when the first '8HP70' is found in that column, then apply the code above as you described. Just to mention: your code above is perfect and works fine. I am just trying to improve my knowledge by doing the same without dropping rows from the dataframe.
Hi @ThiagoAraujo, absolutely. Is there a specific pattern to the first product? Would you know the label, or could you use a pattern to find out where it is?
Hey @Calestini, yes, I always know the first product. Your support was very, very helpful. Thank you again.
Anytime @ThiagoAraujo. There are possibly more efficient ways of achieving the same thing, so I'd recommend you keep tinkering with pandas and its main methods. Glad it helped.
