
I have a machine dataset with the details below.

Sample df: (image not available)

I need to create a new column called "Quality Match" that indicates whether the current shift's Planned Quality is the same as its Actual Quality.

The conditions are as follows.

1.) First, check whether the Planned Quality is the same as the Actual Quality. If yes, set "Quality Match" to 0.

2.) If they differ:
2.1 Check whether the previous shift's Actual Quality is the same as the current Actual Quality.
2.2 If not, find the last row in the Planned Quality column where the previous shift's Actual Quality appears, collect all unique planned qualities from after that row up to the current row, and check whether the current Actual Quality is among them.

If either 2.1 or 2.2 is satisfied, set "Quality Match" to -1.

3.) Otherwise, set "Quality Match" to 1.

Example: consider row 177. This shift's Planned Quality (A) and Actual Quality (B) differ. The previous shift's Actual Quality (C) also differs from the current Actual Quality (B). So check whether the Planned Quality column, before the current shift, contains the previous shift's Actual Quality (C): it does, and its last occurrence is at row 166. Then collect the unique planned qualities from there up to the current row (rows 167 to 176) and check whether they include the current Actual Quality (B). They do, so "Quality Match" is set to -1.
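The row 177 walk-through can be traced in a few lines of plain Python. This is only an illustrative sketch: the `planned` and `actual` dictionaries copy the two quality columns from the sample dataset, and the names `prev_actual`, `last_seen`, `window` and `match` are hypothetical helpers, not part of the dataset.

```python
# Planned and actual quality per row label, copied from the sample dataset
planned = dict(zip(range(164, 180), "CCCBBBBBBAAAAAAA"))
actual = dict(zip(range(164, 180), ["D", "DEFAULT"] + ["C"] * 11 + ["B", "A", "A"]))

row = 177
prev_actual = actual[row - 1]  # previous shift's actual quality

# Last earlier row whose planned quality equals the previous actual quality
last_seen = max(r for r in planned if r < row and planned[r] == prev_actual)

# Unique planned qualities between that row and the current row
window = {planned[r] for r in range(last_seen + 1, row)}

# Condition 2.2: is the current actual quality among them?
match = -1 if actual[row] in window else 1
print(prev_actual, last_seen, sorted(window), match)
```

This only covers condition 2.2 for one row whose earlier conditions have already failed; it is not a full solution.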

Final Expected Output: (image not available)

sample dataset:

# import pandas library
import pandas as pd
from pandas import Timestamp
# nested dictionary: column name -> {row index: value}
details ={'Machine': {164: 'M22',
  165: 'M22',
  166: 'M22',
  167: 'M22',
  168: 'M22',
  169: 'M22',
  170: 'M22',
  171: 'M22',
  172: 'M22',
  173: 'M22',
  174: 'M22',
  175: 'M22',
  176: 'M22',
  177: 'M22',
  178: 'M22',
  179: 'M22'},
 'Start Time': {164: Timestamp('2021-05-31 07:00:00'),
  165: Timestamp('2021-05-31 08:11:12'),
  166: Timestamp('2021-05-31 08:46:12'),
  167: Timestamp('2021-05-31 12:00:00'),
  168: Timestamp('2021-05-31 19:00:00'),
  169: Timestamp('2021-06-01 07:00:00'),
  170: Timestamp('2021-06-01 19:00:00'),
  171: Timestamp('2021-06-02 07:00:00'),
  172: Timestamp('2021-06-02 19:00:00'),
  173: Timestamp('2021-06-02 19:00:00'),
  174: Timestamp('2021-06-03 07:00:00'),
  175: Timestamp('2021-06-03 19:00:00'),
  176: Timestamp('2021-06-04 07:00:00'),
  177: Timestamp('2021-06-04 14:38:42'),
  178: Timestamp('2021-06-04 14:39:27'),
  179: Timestamp('2021-06-04 19:00:00')},
 'End Time': {164: Timestamp('2021-05-31 08:11:12'),
  165: Timestamp('2021-05-31 08:46:12'),
  166: Timestamp('2021-05-31 12:00:00'),
  167: Timestamp('2021-05-31 19:00:00'),
  168: Timestamp('2021-06-01 07:00:00'),
  169: Timestamp('2021-06-01 19:00:00'),
  170: Timestamp('2021-06-02 07:00:00'),
  171: Timestamp('2021-06-02 19:00:00'),
  172: Timestamp('2021-06-02 19:00:00'),
  173: Timestamp('2021-06-03 07:00:00'),
  174: Timestamp('2021-06-03 19:00:00'),
  175: Timestamp('2021-06-04 07:00:00'),
  176: Timestamp('2021-06-04 14:38:42'),
  177: Timestamp('2021-06-04 14:39:27'),
  178: Timestamp('2021-06-04 19:00:00'),
  179: Timestamp('2021-06-05 07:00:00')},
 'shift': {164: 'Day',
  165: 'Day',
  166: 'Day',
  167: 'Day',
  168: 'Night',
  169: 'Day',
  170: 'Night',
  171: 'Day',
  172: 'Night',
  173: 'Night',
  174: 'Day',
  175: 'Night',
  176: 'Day',
  177: 'Day',
  178: 'Day',
  179: 'Night'},
 'Planned Quality': {164: 'C',
  165: 'C',
  166: 'C',
  167: 'B',
  168: 'B',
  169: 'B',
  170: 'B',
  171: 'B',
  172: 'B',
  173: 'A',
  174: 'A',
  175: 'A',
  176: 'A',
  177: 'A',
  178: 'A',
  179: 'A'},
 'Actual Quality': {164: 'D',
  165: 'DEFAULT',
  166: 'C',
  167: 'C',
  168: 'C',
  169: 'C',
  170: 'C',
  171: 'C',
  172: 'C',
  173: 'C',
  174: 'C',
  175: 'C',
  176: 'C',
  177: 'B',
  178: 'A',
  179: 'A'},
 'Planned Shift Production': {164: 75.87,
  165: 317.29,
  166: 206.51,
  167: 54.88,
  168: 258.5,
  169: 658.5,
  170: 658.5,
  171: 658.5,
  172: 743.13,
  173: 329.25,
  174: 658.5,
  175: 658.5,
  176: 419.52,
  177: 0.69,
  178: 238.29,
  179: 658.5},
 'Actual Shift Production': {164: 4.16,
  165: 0.0,
  166: 158.81,
  167: 173.13,
  168: 596.4,
  169: 805.03,
  170: 107.26,
  171: 0.0,
  172: 0.0,
  173: 0.0,
  174: 0.0,
  175: 122.78,
  176: 3323.42,
  177: 0.0,
  178: 2284.28,
  179: 686.7}}        



  
# creating a Dataframe object 
df = pd.DataFrame(details)
  
df

My approach:

I tried to create the "Quality Match" column using np.select(), but I couldn't express condition 2.2 in my code.
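For context, conditions 1 and 2.1 can be written as vectorized boolean masks for np.select(), while condition 2.2 needs a variable-length look-back per row, which does not map onto a fixed list of masks. The sketch below is an assumption about what such an attempt might look like, not the code actually tried; the column name "Quality Match (partial)" and the variable `prev_actual` are hypothetical, and only the columns needed here are rebuilt from the sample dataset.

```python
import numpy as np
import pandas as pd

# Only the columns needed here, copied from the sample dataset
df = pd.DataFrame(
    {
        "Machine": ["M22"] * 16,
        "Planned Quality": list("CCCBBBBBBAAAAAAA"),
        "Actual Quality": ["D", "DEFAULT"] + ["C"] * 11 + ["B", "A", "A"],
    },
    index=range(164, 180),
)

planned = df["Planned Quality"]
actual = df["Actual Quality"]
# Previous shift's actual quality within the same machine (missing for the first row)
prev_actual = df.groupby("Machine")["Actual Quality"].shift()

# Condition 1 -> 0, condition 2.1 -> -1, everything else -> 1.
# Condition 2.2 is NOT covered, so row 177 is labelled 1 here instead of -1.
partial = np.select([planned == actual, actual == prev_actual], [0, -1], default=1)
df["Quality Match (partial)"] = partial
print(df[["Planned Quality", "Actual Quality", "Quality Match (partial)"]])
```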

I would really appreciate your support!

1 Answer


There may be more elegant solutions, but the following straightforward approach should do what you want:

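machine_list = df["Machine"].unique().tolist()

for machine in machine_list:
    # Index labels of this machine's rows (assumed to be consecutive integers)
    indices = df.index[df["Machine"]==machine].tolist()
    start_index = indices[0]
    end_index = indices[-1]

    for i, (planned, actual) in enumerate(zip(df.loc[start_index:,"Planned Quality"], df.loc[start_index:,"Actual Quality"]), start=start_index):
        if i > end_index:
            break
        # Condition 1: planned and actual quality agree
        if planned == actual:
            df.at[i, "Quality Match"] = 0
        elif i >= start_index + 1:
            # Condition 2.1: same actual quality as in the previous shift
            if actual == df.at[i-1, "Actual Quality"]:
                df.at[i, "Quality Match"] = -1
            elif i-2  >= start_index:
                # Condition 2.2: walk backwards to the last row where the
                # previous shift's actual quality was the planned quality
                j = i-2
                lst = []
                while j >= start_index:
                    if df.at[j, "Planned Quality"] == df.at[i-1, "Actual Quality"]:
                        # Planned qualities from row j to row i-1 (label slices are inclusive)
                        lst = [x for x in df.loc[j:i-1,"Planned Quality"]]
                        break
                    else:
                        j -= 1

                if actual in lst:
                    df.at[i, "Quality Match"] = -1
                else:
                    df.at[i, "Quality Match"] = 1
            else:
                df.at[i, "Quality Match"] = 1
        else:
            # First row of a machine: there is no previous shift to compare with
            df.at[i, "Quality Match"] = 1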

Note that, in my suggestion, I have assumed that your dataset is sorted by machine names.
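If the data is not sorted by machine, or the index is not a run of consecutive integers, the same logic can be applied by position within each machine instead of by index label. The following is a sketch under those assumptions, not a tested replacement for the loop above; the function names `quality_match` and `add_quality_match` are hypothetical, and the rows of each machine are assumed to already be in chronological order.

```python
import numpy as np
import pandas as pd

def quality_match(planned, actual):
    """Label one machine's shifts; both inputs are equal-length lists in time order."""
    labels = []
    for i, (p, a) in enumerate(zip(planned, actual)):
        if p == a:                              # condition 1
            labels.append(0)
        elif i >= 1 and a == actual[i - 1]:     # condition 2.1
            labels.append(-1)
        else:                                   # condition 2.2, else 1
            label = 1
            # Search backwards from i-2, as the loop above does
            for j in range(i - 2, -1, -1):
                if planned[j] == actual[i - 1]:
                    if a in planned[j:i]:
                        label = -1
                    break
            labels.append(label)
    return labels

def add_quality_match(df):
    """Return a copy of df with a "Quality Match" column, computed per machine."""
    planned = df["Planned Quality"].to_numpy()
    actual = df["Actual Quality"].to_numpy()
    labels = np.empty(len(df), dtype=int)
    # .indices maps each machine to the integer positions of its rows
    for positions in df.groupby("Machine").indices.values():
        labels[positions] = quality_match(planned[positions].tolist(),
                                          actual[positions].tolist())
    out = df.copy()
    out["Quality Match"] = labels
    return out

# Usage with the quality columns of the sample dataset
sample = pd.DataFrame(
    {
        "Machine": ["M22"] * 16,
        "Planned Quality": list("CCCBBBBBBAAAAAAA"),
        "Actual Quality": ["D", "DEFAULT"] + ["C"] * 11 + ["B", "A", "A"],
    },
    index=range(164, 180),
)
result = add_quality_match(sample)
print(result)
```

Working on plain lists per machine keeps the look-back independent of the index labels, and the resulting column has an integer dtype because it is assigned in one go.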


4 Comments

No Georgy, I have to do this for a huge dataset with hundreds of machines, and hard-coding cell values does not produce the expected output. The program should automatically identify whether the current quality appears among the previous Planned Quality values and then check the unique qualities. So, as I mentioned, the program should capture this automatically, without hard-coded values.
@domahc You were absolutely right about the hard-coded values; that was bad on my part. I have updated my snippet accordingly. It can now handle variable indices and works for datasets with different machines. Let me know whether this works for you.
At cell #12, how does df.at[i, "Quality Match"] = 0 work without the "Quality Match" column having been created beforehand? The code works perfectly, but can you please elaborate on this? I looked up df.at, and it updates an existing column of the df at a given position. How did this work without an existing "Quality Match" column?
@domahc using df.at[i, "Quality Match"] adds the "Quality Match" column automatically, if it does not yet exist. Otherwise, it performs the operation on the existing "Quality Match" column. The same would have been true for df.loc[i, "Quality Match"] or df["Quality Match"].
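The column-creation behaviour described in this comment can be checked on a toy frame. This is a small sketch; the DataFrame `demo` and its values are made up for illustration.

```python
import pandas as pd

# Toy frame without a "Quality Match" column
demo = pd.DataFrame({"Planned Quality": ["A", "B", "C"],
                     "Actual Quality": ["A", "C", "C"]})
print("Quality Match" in demo.columns)  # column does not exist yet

# Assigning through .at to a missing column label creates the column
demo.at[0, "Quality Match"] = 0
print(demo)
```

Rows that have not been assigned yet hold missing values (NaN), which is why a column filled this way ends up with a float dtype.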
