0

I have the following pandas dataframe (in code):

original_df = pd.DataFrame([['ID1', 4], ['ID1', 4], ['ID1', 4], ['ID2', 5], ['ID2', 5], ['ID2', 5], ['ID3', 6], ['ID3', 6], ['ID3', 6]], columns=['Index', 'Value'])

Based on each occurence of the change in Index value, only the first value column's integer should be maintained, and the rest of the values should be changed to 0 value.

The resulting manipulated dataframe should look like this:

desired_df = pd.DataFrame([['ID1', 4], ['ID1', 0], ['ID1', 0], ['ID2', 5], ['ID2', 0], ['ID2', 0], ['ID3', 6], ['ID3', 0], ['ID3', 0]], columns=['Index', 'Value'])

I have tried numerous different manipulation techniques, but none has worked. I have tried substitution of values, but this doesn't scale well for many rows at once. Please could someone provide any suggestions ?

1 Answer 1

1

Set 0 for all values if difference with next row Series.where:

mask = original_df['Index'].ne(original_df['Index'].shift())
original_df['Value'] = original_df['Value'].where(mask, 0)

Or use:

mask = original_df['Index'].ne(original_df['Index'].shift())
original_df['Value'] = np.where(mask, original_df['Value'], 0)

print (original_df)  
  Index  Value
0   ID1      4
1   ID1      0
2   ID1      0
3   ID2      5
4   ID2      0
5   ID2      0
6   ID3      6
7   ID3      0
8   ID3      0

If groups are sorted is possible use Series.duplicated:

mask = original_df['Index'].duplicated()
original_df['Value'] = original_df['Value'].mask(mask, 0)
original_df['Value'] = np.where(mask, 0, original_df['Value'])
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.