Creating a new column based on multiple conditional statements in a pandas DataFrame

Question

I have a machine dataset with the below details.

Sample df:

I need to create a new column called "Quality Match" that indicates whether the current shift's Planned Quality is the same as its Actual Quality.

Below are the conditions.

1.) First, check whether the Planned Quality is the same as the Actual Quality. If yes, set "Quality Match" to 0.

2.) If they are different from each other:

2.1 Check whether the previous shift's Actual Quality is the same as the current Actual Quality.

2.2 If not, look in the Planned Quality column for the last row where the previous shift's Actual Quality appears, collect all unique qualities from that row up to the current row, and check whether the current Actual Quality is among them.

If either condition 2.1 or 2.2 is satisfied, set "Quality Match" to -1.

3.) Otherwise, set "Quality Match" to 1.

Ex: Consider row 177. This shift's Planned Quality (A) and Actual Quality (B) are different. The previous shift's Actual Quality (C) is also not the current Actual Quality (B). Next, check whether the Planned Quality values before the current shift include the previous shift's Actual Quality (C): they do, and its last occurrence is at row 166. Then collect all the unique qualities from there up to the current row (167 to 176) and check whether that list contains the current Actual Quality (B). It does, so "Quality Match" is set to -1.

Final Expected Output:

sample dataset:

# import pandas library
import pandas as pd
from pandas import Timestamp
# dictionary with list object in values
details ={'Machine': {164: 'M22',
  165: 'M22',
  166: 'M22',
  167: 'M22',
  168: 'M22',
  169: 'M22',
  170: 'M22',
  171: 'M22',
  172: 'M22',
  173: 'M22',
  174: 'M22',
  175: 'M22',
  176: 'M22',
  177: 'M22',
  178: 'M22',
  179: 'M22'},
 'Start Time': {164: Timestamp('2021-05-31 07:00:00'),
  165: Timestamp('2021-05-31 08:11:12'),
  166: Timestamp('2021-05-31 08:46:12'),
  167: Timestamp('2021-05-31 12:00:00'),
  168: Timestamp('2021-05-31 19:00:00'),
  169: Timestamp('2021-06-01 07:00:00'),
  170: Timestamp('2021-06-01 19:00:00'),
  171: Timestamp('2021-06-02 07:00:00'),
  172: Timestamp('2021-06-02 19:00:00'),
  173: Timestamp('2021-06-02 19:00:00'),
  174: Timestamp('2021-06-03 07:00:00'),
  175: Timestamp('2021-06-03 19:00:00'),
  176: Timestamp('2021-06-04 07:00:00'),
  177: Timestamp('2021-06-04 14:38:42'),
  178: Timestamp('2021-06-04 14:39:27'),
  179: Timestamp('2021-06-04 19:00:00')},
 'End Time': {164: Timestamp('2021-05-31 08:11:12'),
  165: Timestamp('2021-05-31 08:46:12'),
  166: Timestamp('2021-05-31 12:00:00'),
  167: Timestamp('2021-05-31 19:00:00'),
  168: Timestamp('2021-06-01 07:00:00'),
  169: Timestamp('2021-06-01 19:00:00'),
  170: Timestamp('2021-06-02 07:00:00'),
  171: Timestamp('2021-06-02 19:00:00'),
  172: Timestamp('2021-06-02 19:00:00'),
  173: Timestamp('2021-06-03 07:00:00'),
  174: Timestamp('2021-06-03 19:00:00'),
  175: Timestamp('2021-06-04 07:00:00'),
  176: Timestamp('2021-06-04 14:38:42'),
  177: Timestamp('2021-06-04 14:39:27'),
  178: Timestamp('2021-06-04 19:00:00'),
  179: Timestamp('2021-06-05 07:00:00')},
 'shift': {164: 'Day',
  165: 'Day',
  166: 'Day',
  167: 'Day',
  168: 'Night',
  169: 'Day',
  170: 'Night',
  171: 'Day',
  172: 'Night',
  173: 'Night',
  174: 'Day',
  175: 'Night',
  176: 'Day',
  177: 'Day',
  178: 'Day',
  179: 'Night'},
 'Planned Quality': {164: 'C',
  165: 'C',
  166: 'C',
  167: 'B',
  168: 'B',
  169: 'B',
  170: 'B',
  171: 'B',
  172: 'B',
  173: 'A',
  174: 'A',
  175: 'A',
  176: 'A',
  177: 'A',
  178: 'A',
  179: 'A'},
 'Actual Quality': {164: 'D',
  165: 'DEFAULT',
  166: 'C',
  167: 'C',
  168: 'C',
  169: 'C',
  170: 'C',
  171: 'C',
  172: 'C',
  173: 'C',
  174: 'C',
  175: 'C',
  176: 'C',
  177: 'B',
  178: 'A',
  179: 'A'},
 'Planned Shift Production': {164: 75.87,
  165: 317.29,
  166: 206.51,
  167: 54.88,
  168: 258.5,
  169: 658.5,
  170: 658.5,
  171: 658.5,
  172: 743.13,
  173: 329.25,
  174: 658.5,
  175: 658.5,
  176: 419.52,
  177: 0.69,
  178: 238.29,
  179: 658.5},
 'Actual Shift Production': {164: 4.16,
  165: 0.0,
  166: 158.81,
  167: 173.13,
  168: 596.4,
  169: 805.03,
  170: 107.26,
  171: 0.0,
  172: 0.0,
  173: 0.0,
  174: 0.0,
  175: 122.78,
  176: 3323.42,
  177: 0.0,
  178: 2284.28,
  179: 686.7}}        



  
# creating a Dataframe object 
df = pd.DataFrame(details)
  
df

My approach:

I tried to create the Quality Match column using np.select(), but I was not able to express condition 2.2 in my code.

I would really appreciate your support!
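For context, conditions 1 and 2.1 on their own fit np.select() reasonably well, which is roughly where an approach like the one described above tends to stop. The following is only a sketch on a small made-up frame (the `demo` data and the `prev_actual` name are illustrative, not from the question). Rows that would need condition 2.2 simply fall through to the default of 1 here, so the result is not the final answer for those rows.

```python
import numpy as np
import pandas as pd

# Hypothetical four-shift example for a single machine.
demo = pd.DataFrame({
    "Machine": ["M1", "M1", "M1", "M1"],
    "Planned Quality": ["C", "B", "A", "A"],
    "Actual Quality": ["C", "C", "B", "A"],
})

# Previous shift's actual quality, computed per machine so that the
# first shift of one machine never looks at another machine's last shift.
prev_actual = demo.groupby("Machine")["Actual Quality"].shift()

conditions = [
    demo["Planned Quality"] == demo["Actual Quality"],  # condition 1
    demo["Actual Quality"] == prev_actual,              # condition 2.1
]
# np.select picks the first matching condition per row; condition 2.2 is
# NOT handled here, so such rows get the default value 1.
demo["Quality Match"] = np.select(conditions, [0, -1], default=1)
print(demo["Quality Match"].tolist())  # [0, -1, 1, 0]
```

Condition 2.2 depends on a variable-length look-back window, which is why it does not reduce to a fixed set of row-wise boolean masks.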

Georgy Kopshteyn · Accepted Answer · 2021-06-10 13:42:12Z

There may be more elegant solutions, but the following straightforward approach should do what you want:

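# Process each machine separately (rows of one machine must be adjacent).
machine_list = df["Machine"].unique().tolist()

for machine in machine_list:
    indices = df.index[df["Machine"]==machine].tolist()
    start_index = indices[0]
    end_index = indices[-1]

    for i, (planned, actual) in enumerate(zip(df.loc[start_index:,"Planned Quality"], df.loc[start_index:,"Actual Quality"]), start=start_index):
        if i > end_index:
            break
        # Condition 1: planned and actual quality agree.
        if planned == actual:
            df.at[i, "Quality Match"] = 0
        elif i >= start_index + 1:
            # Condition 2.1: same actual quality as the previous shift.
            if actual == df.at[i-1, "Actual Quality"]:
                df.at[i, "Quality Match"] = -1
            elif i-2  >= start_index:
                # Condition 2.2: walk backwards to the last row whose planned
                # quality equals the previous shift's actual quality, then
                # collect the planned qualities from that row up to row i-1.
                j = i-2
                lst = []
                while j >= start_index:
                    if df.at[j, "Planned Quality"] == df.at[i-1, "Actual Quality"]:
                        lst = [x for x in df.loc[j:i-1,"Planned Quality"]]
                        break
                    else:
                        j -= 1

                if actual in lst:
                    df.at[i, "Quality Match"] = -1
                else:
                    df.at[i, "Quality Match"] = 1
            else:
                # Only one earlier row for this machine: nothing to search.
                df.at[i, "Quality Match"] = 1
        else:
            # First row of the machine: there is no previous shift.
            df.at[i, "Quality Match"] = 1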

Note that, in my suggestion, I have assumed that your dataset is sorted by machine names.
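The loop above also looks up the previous row with `i-1`, so it relies on the index being consecutive integers. As a possible variation (a sketch, not part of the original answer), the same rules can be applied to plain Python lists per machine, which avoids depending on the index labels. The function names `quality_match_codes` and `add_quality_match` are made up for illustration, and the sketch assumes a unique index with rows in chronological order within each machine.

```python
import pandas as pd


def quality_match_codes(planned, actual):
    """Return the codes for one machine's shifts, given in order."""
    codes = []
    for i, (p, a) in enumerate(zip(planned, actual)):
        if p == a:
            codes.append(0)                 # condition 1
        elif i > 0 and a == actual[i - 1]:
            codes.append(-1)                # condition 2.1
        else:
            window = []
            if i > 0:
                prev_actual = actual[i - 1]
                # Condition 2.2: search backwards (starting two rows up, as
                # in the loop above) for the last planned == prev_actual.
                for j in range(i - 2, -1, -1):
                    if planned[j] == prev_actual:
                        window = planned[j:i]
                        break
            codes.append(-1 if a in window else 1)
    return codes


def add_quality_match(df):
    """Return a copy of df with a per-machine "Quality Match" column."""
    out = df.copy()
    codes = pd.Series(0, index=out.index, dtype="int64")
    for _, group in out.groupby("Machine", sort=False):
        codes.loc[group.index] = quality_match_codes(
            group["Planned Quality"].tolist(),
            group["Actual Quality"].tolist(),
        )
    out["Quality Match"] = codes
    return out


# Planned/Actual Quality columns of the sample data (rows 164 to 179).
planned = list("CCCBBBBBBAAAAAAA")
actual = ["D", "DEFAULT"] + ["C"] * 11 + ["B", "A", "A"]
sample = pd.DataFrame(
    {"Machine": "M22", "Planned Quality": planned, "Actual Quality": actual},
    index=range(164, 180),
)
result = add_quality_match(sample)
print(result.loc[177, "Quality Match"])  # -1, as in the question's example
```

Because the grouping is done with groupby, the rows of one machine do not need to be adjacent in this variant, although they still need to be in shift order.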

edited Jun 10, 2021 at 13:42

answered Jun 10, 2021 at 11:38

Georgy Kopshteyn


4 Comments

johnson Over a year ago

No Georgy, I have to do this for a huge dataset with hundreds of machines, and hard-coding cell values does not produce the expected output. The program should automatically identify whether the quality appears in the previous Planned Quality values and then check the unique qualities. As I mentioned, the program should capture this automatically, without hard-coded values.

Georgy Kopshteyn Over a year ago

@domahc You have been absolutely right about the hard-coded values, that was bad on my part. I have updated my snippet accordingly. Now, it can handle variable indices and works for datasets with different machines. Let me know whether this works for you.

johnson Over a year ago

At cell #12, how does df.at[i, "Quality Match"] = 0 work without the "Quality Match" column having been created beforehand? The code works perfectly, but can you please elaborate on this? I looked up df.at, and it updates a specific position in one of the df's existing columns. Without an existing "Quality Match" column, how did this work?

Georgy Kopshteyn Over a year ago

@domahc using df.at[i, "Quality Match"] adds the "Quality Match" column automatically, if it does not yet exist. Otherwise, it performs the operation on the existing "Quality Match" column. The same would have been true for df.loc[i, "Quality Match"] or df["Quality Match"].
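A minimal way to see this behaviour for yourself, using a small made-up frame (`demo` is illustrative only): assigning through df.at to a column label that does not exist yet creates the column, and the rows that were not assigned are left as missing values.

```python
import pandas as pd

# Hypothetical three-row frame without a "Quality Match" column.
demo = pd.DataFrame({"Planned Quality": ["A", "B", "C"]})
assert "Quality Match" not in demo.columns

# Setting a single cell in a missing column adds that column.
demo.at[0, "Quality Match"] = 0

print(demo.columns.tolist())              # ['Planned Quality', 'Quality Match']
print(demo["Quality Match"].isna().tolist())  # [False, True, True]
```

In the answer's loop every row eventually gets a value, so no missing values remain once it has finished.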
