2

I am new to pandas module. And have a quick question on data manipulation:

Suppose I have a table as follows:

Tool | WeekNumber | Status | Percentage
-----|------------|--------|------------
  M1 |     1      |   good |     85
  M1 |     4      |   bad  |     75
  M1 |     7      |   good |     90

Based on the condition in Status, I would like to add percentage.

For example:

  1. if the status is "good", then the following rows for subsequent week numbers should be all 100, i.e., next rows should be for weeks 2 and 3 with 100%

  2. if the status is 'bad', the percentage should be 0 for the following week numbers, i.e., 0 for the weeks 5 and 6.

I have some idea on how to approach with the condition, but no idea to add rows:

import os, re
import pandas as pd
df = pd.read_excel("test.xlsx")

add_rows = []
for elem in df.Status:
    if elem == "good":
        add_rows.append(100)
    if elem == "bad":
        add_rows.append(0)

df.Percent = pd.Series(add_rows)

However, this only gives me three values based on the condition, and changes the values of specific week numbers. But I want the following:

Tool | WeekNumber | Status | Percentage
-----|------------|--------|------------
  M1 |     1      |   good |     85
  M1 |     2      |   good |     100
  M1 |     3      |   good |     100
  M1 |     4      |   bad  |     75
  M1 |     5      |   bad  |      0
  M1 |     6      |   bad  |      0
  M1 |     7      |   good |     90
1
  • set_index\ and reindex Commented Sep 15, 2017 at 0:11

5 Answers 5

2

Here's another one

val = pd.DataFrame({'WeekNumber':np.arange(df['WeekNumber'].min(), df['WeekNumber'].max()+ 1, 1)})
new_df = df.merge(val, on='WeekNumber', how = 'outer').sort_values(by = 'WeekNumber').reset_index(drop = True)
new_df[['Tool', 'Status']] = new_df[['Tool', 'Status']].ffill()
new_df['Percentage'] = np.where((new_df['Status'] == 'good') & 
new_df['Percentage'].isnull(), 100, new_df['Percentage'])
new_df['Percentage'] = new_df['Percentage'].fillna(0)

You get

    Tool    WeekNumber  Status  Percentage
0   M1      1           good    85.0
1   M1      2           good    100.0
2   M1      3           good    100.0
3   M1      4           bad     75.0
4   M1      5           bad     0.0
5   M1      6           bad     0.0
6   M1      7           good    90.0
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for the help, Vaishali. It does work, but it is valid for one machine. What if we have multiple tools - M1, M2, can we include that?
Can you elaborate?
1

You can use .iterrows() to iterate through each row.

for index, row in df.iterrows():
    print row.Status

>>> good
>>> bad
>>> good

If I needed it to work using some rough code, I'd use my code:

new_index = 0
new_dict = {}
for index, row in df.iterrows():
    use_index = index + new_index

    new_row[use_index] = {}

    new_row[use_index]= {
        'Tool': row.Tool,
        'WeekNumber': row.WeekNumber,
        'Status': row.Status,
        'Percentage': row.Percentage,
    }

    if row.Percentage == 100:
        for n in range(2):
            add_index = index + 1 + new_index

            new_dict[add_index] = {}

            new_row[add_index]= {
                'Tool': 'M1',
                'WeekNumber': row.WeekNumber + n,
                'Status': 'good',
                'Percentage': 100,
            }

            new_index += 1

df = pd.DataFrame(new_dict)

Comments

1

Do your answer like this:

add_rows = []
for index, elem in enumerate(df.Status):
    if elem == "good":

        # assuming data is sorted by 'WeekNumber'
        add_rows.append({'Tool': 'M1', 'WeekNumber': index + 2}) # etc
        add_rows.append({'Tool': 'M1', 'WeekNumber': index + 3}) # etc

more_data = pd.DataFrame(add_rows)
df = pd.concat([df, more_data]).sort_values(by='WeekNumber')

Comments

0

Try this ?

df=df.set_index('WeekNumber').reindex(range(1,8))
df.Tool.fillna('M1',inplace=True)
df.Status=df.Status.ffill()
df.Percentage.fillna(0,inplace=True)
df.Percentage=np.where((df.Status=='good')&(df.Percentage==0),100,df.Percentage)
df.reset_index()


Out[80]: 
   WeekNumber Tool Status  Percentage
0           1   M1   good        85.0
1           2   M1   good       100.0
2           3   M1   good       100.0
3           4   M1    bad        75.0
4           5   M1    bad         0.0

Comments

0

You could do expand the dataframe first with set_index and reindex and fill the NaNs in Tool and Status

In [814]: dff = (df.set_index('WeekNumber')
                   .reindex(range(df.WeekNumber.min(), df.WeekNumber.max()+1))
                   .assign(Tool=lambda x: x.Tool.ffill(),
                           Status=lambda x: x.Status.ffill()))

In [815]: dff
Out[815]:
           Tool Status  Percentage
WeekNumber
1            M1   good        85.0
2            M1   good         NaN
3            M1   good         NaN
4            M1    bad        75.0
5            M1    bad         NaN
6            M1    bad         NaN
7            M1   good        90.0

Then, conditionally, fill Percentage the values

In [816]: dff.loc[(dff.Status == 'good') & dff.Percentage.isnull(), 'Percentage'] = 100

In [817]: dff.loc[(dff.Status == 'bad') & dff.Percentage.isnull(), 'Percentage'] = 0

Finally use reset_index()

In [818]: dff.reset_index()
Out[818]:
   WeekNumber Tool Status  Percentage
0           1   M1   good        85.0
1           2   M1   good       100.0
2           3   M1   good       100.0
3           4   M1    bad        75.0
4           5   M1    bad         0.0
5           6   M1    bad         0.0
6           7   M1   good        90.0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.