0

Edited

I'm sorry I didn't post it right the first time. The solutions suggested worked if there were only two entries of each Lead ID with Lead Status "A". I am changing my data. I again apologize.

Data:

Lead ID     Lead Status      Duration     Target
1-1H9C0XL   Too Small       -0.466177     1
1-1H9G33C   A               -0.620709     0 
1-1H9G33C   A               -0.500709     0
1-1H9G33C   A                0.337401     0
4-1HFORF8   No Fit          -0.343840     1
4-1HFSXOG   No Fit          -0.124920     1
4-1HLQ2IJ   A               -0.330962     0 
4-1HLQ2IJ   A                0.130818     0
4-1HLQ2IJ   A               -0.400817     0
4-1HLQ2IJ   A                0.240818     0

I want to accomplish following:

If there is a duplicate in the Lead ID and Lead Status, make all the Target values "1" for that LeadID with shorter Duration.

Desired Output

Lead ID     Lead Status      Duration     Target
1-1H9C0XL   Too Small       -0.466177     1
1-1H9G33C   A               -0.620709     1 
1-1H9G33C   A               -0.500709     1
1-1H9G33C   A                0.337401     0
4-1HFORF8   No Fit          -0.343840     1
4-1HFSXOG   No Fit          -0.124920     1
4-1HLQ2IJ   A               -0.330962     1 
4-1HLQ2IJ   A                0.130818     1
4-1HLQ2IJ   A               -0.400817     1
4-1HLQ2IJ   A                0.240818     0

I am not able to implement a condition of checking for duplicates and the value in duration to update the last column. I appreciate any assistance a lot.

2
  • You want all but the largest of the group to be 1? And the largest to be 0? Commented Sep 12, 2017 at 21:07
  • Yes, @TedPetrou Commented Sep 12, 2017 at 21:08

2 Answers 2

1

Try this(assuming your df is sorted)

df.loc[df[df.duplicated(['LeadID','LeadStatus'],keep=False)].drop_duplicates(['LeadID','LeadStatus'],keep='first').index,'Target']=1
df
Out[895]: 
      LeadID LeadStatus  Duration  Target
0  1-1H9C0XL   TooSmall    -0.466       1
1  1-1H9G33C          A    -0.621       1
2  1-1H9G33C          A     0.337       0
3  4-1HFORF8      NoFit    -0.344       1
4  4-1HFSXOG      NoFit    -0.125       1
5  4-1HLQ2IJ          A    -0.331       1
6  4-1HLQ2IJ          A     0.241       0

Update


df=df.sort_values(['LeadID','LeadStatus','Duration'])

df.loc[df[df.duplicated(['LeadID','LeadStatus'],keep='last')].index,'Target']=1

Out[911]: 
      LeadID LeadStatus  Duration  Target
0  1-1H9C0XL   TooSmall    -0.466       1
1  1-1H9G33C          A    -0.621       1
2  1-1H9G33C          A    -0.501       1
3  1-1H9G33C          A     0.337       0
4  4-1HFORF8      NoFit    -0.344       1
5  4-1HFSXOG      NoFit    -0.125       1
8  4-1HLQ2IJ          A    -0.401       1
6  4-1HLQ2IJ          A    -0.331       1
7  4-1HLQ2IJ          A     0.131       1
9  4-1HLQ2IJ          A     0.241       0
Sign up to request clarification or add additional context in comments.

4 Comments

@KrishnangKDalal This only works if you assume that the unique rows have 1 already.
There's also a much simpler way to do this instead of .loc. Just add the inner dropduplicated code to target. See my answer
Theres no need to sort by all those columns either. The automatic alignment of indexes will take care of the proper place for the data.
@TedPetrou: yes, I am aware and I have created the target column with just 1 and then changing them 0 according to the solution
0

Here is an idiomatic and performant answer.

df['Target'] += df.sort_values('Duration')\
                  .duplicated(subset=['Lead ID', 'Lead Status'], keep='last')

If you don't assume unique rows have a 1 then you can do the following.

df1 = df.sort_values('Duration')
unique = ~df1.duplicated(subset=['Lead ID', 'Lead Status'], keep=False) * 1
first = df1.duplicated(subset=['Lead ID', 'Lead Status'], keep='last') * 1
df['Target'] = unique + first

And a less performant way:

df.groupby(['Lead ID', 'Lead Status'])['Duration']\
  .transform(lambda x: 1 if len(x) == 1 else x < x.max())

     Lead ID Lead Status  Duration  Target
0  1-1H9C0XL   Too Small -0.466177       1
1  1-1H9G33C           A -0.620709       1
2  1-1H9G33C           A -0.500709       0
3  1-1H9G33C           A  0.337401       1
4  4-1HFORF8      No Fit -0.343840       1
5  4-1HFSXOG      No Fit -0.124920       1
6  4-1HLQ2IJ           A -0.330962       1
7  4-1HLQ2IJ           A  0.130818       1
8  4-1HLQ2IJ           A -0.400817       1
9  4-1HLQ2IJ           A  0.240818       0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.