2

I have this pandas dataframe:

import pandas as pd

df = pd.DataFrame(
    {
    "col1": [1,1,2,3,3,3,4,5,5,5,5]
    }
)
df

enter image description here

I want to add another column that says "last" if the value in col1 doesn't equal the value of col1 in the next row. This is what it should look like:

enter image description here

So far, I can create a column that contains True if the value in col1 doesn't equal the value of col1 in the next row, and False otherwise:

df["last_row"] = df["col1"].shift(-1)
df['last'] = df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df

enter image description here

Now something like

df["last_row"] = df["col1"].shift(-1)
df['last'] = "last" if df["col1"] != df["last_row"]
df = df.drop(["last_row"], axis=1)
df

would be nice, but this is apparently the wrong syntax. How can I manage to do this?


Ultimately, I also want to add numbers that indicate how many times a value has already appeared before the current row, while the last row of each value is always marked with "last". It should look like this:

enter image description here

I'm not sure if this is another step in my development or if this requires a new approach. I read that if I want to loop through an array while modifying values, I should use apply(). However, I don't know how to include conditions in this. Can you help me?

Thanks a lot!

2
  • For what it's worth, it's generally not recommended to mix types (string and int in this case) in Pandas dataframes. You lose out on a lot of performance that way. Commented Apr 26, 2019 at 15:43
  • For the first part, you're so close since you have constructed a boolean Series already. Construct an empty column, now you can do: df['last'][df['col1'] != df['last_row']] = 'last'. Commented Apr 26, 2019 at 15:48
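The chained form in the comment above (df['last'][mask] = ...) can trigger a SettingWithCopyWarning and is not guaranteed to write back into df. A minimal sketch of the same idea with a single .loc assignment, using the question's df; the empty-string default for the other rows is an assumption, not something the question specifies:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5]})

# True where the next row holds a different value. The final row is compared
# against NaN, so it comes out True as well.
is_last = df["col1"] != df["col1"].shift(-1)

# Start from an empty-string column (an arbitrary placeholder), then label
# the masked rows in one .loc assignment instead of chained indexing.
df["last"] = ""
df.loc[is_last, "last"] = "last"

print(df)
```

This only covers the first part of the question; the running count is handled in the answers below.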

4 Answers

3

Here's one way. Build a custom grouper by checking whether each value in col1 differs from the previous row and taking the cumulative sum of that (Series.cumsum), so every consecutive run gets its own id. Grouping by it and calling GroupBy.cumcount gives the running count within each run. Then set last with a similar criterion, this time comparing against the next row via shift(-1):

g = df.col1.ne(df.col1.shift(1)).cumsum()
df['update'] = df.groupby(g).cumcount()
ix = df[df.col1.ne(df.col1.shift(-1))].index
# Int64Index([1, 2, 5, 6, 10], dtype='int64')
df.loc[ix,'update'] = 'last'

    col1 update
0      1      0
1      1   last
2      2   last
3      3      0
4      3      1
5      3   last
6      4   last
7      5      0
8      5      1
9      5      2
10     5   last

1 Comment

Works great, thank you! As g = df.col1, line 1 can be deleted and line 2 be replaced by df['update'] = df.groupby(df.col1).cumcount()
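The simplification in the comment above gives the same result on the question's data because every value there sits in one consecutive run. If a value can come back later, grouping by col1 directly and grouping by the run-based g diverge. A small sketch with made-up data (the series below is hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data: the value 1 comes back after a run of 2s.
s = pd.Series([1, 1, 2, 2, 1, 1], name="col1")

# Run-based grouper: a new group id starts whenever the value changes.
runs = s.ne(s.shift()).cumsum()

by_run = s.groupby(runs).cumcount()   # counts restart for every run
by_value = s.groupby(s).cumcount()    # counts continue across separate runs

print(by_run.tolist())    # [0, 1, 0, 1, 0, 1]
print(by_value.tolist())  # [0, 1, 0, 1, 2, 3]
```

Which behaviour is wanted depends on whether "last" should mean the end of a run or the final occurrence of a value overall.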
2

Assuming the index is increasing, (1) take the cumcount within each group, then (2) find the max index inside each group and set the string there:

group = df.groupby('col1')

df['last'] = group.cumcount()
df.loc[group['last'].idxmax(), 'last'] = 'last'
#or df.loc[group.apply(lambda x: x.index.max()), 'last'] = 'last'


    col1    last
0   1   0
1   1   last
2   2   last
3   3   0
4   3   1
5   3   last
6   4   last
7   5   0
8   5   1
9   5   2
10  5   last

3 Comments

I made this the accepted answer as it was the most straightforward to me. Thanks!
type(df.last) tells me that this column is a method. How do I convert it to a pandas.core.series.Series (like col1 is)?
@Julian It is a Series; try type(df['last']).
2

Use .shift to find where things change. Then you can use .where to mask the rows appropriately, followed by .fillna to insert the label:

s = df.col1 != df.col1.shift(-1)
df['Update'] = df.groupby(s.cumsum().where(~s)).cumcount().where(~s).fillna('last')

Output:

    col1 Update
0      1      0
1      1   last
2      2   last
3      3      0
4      3      1
5      3   last
6      4   last
7      5      0
8      5      1
9      5      2
10     5   last

As an aside, update is a method of DataFrames, so you should avoid naming a column 'update'.
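A short sketch of why that name is awkward: attribute access resolves to the DataFrame.update method, so only bracket access reaches the column. The three-row frame is illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2]})
df["update"] = [0, "last", "last"]

# Attribute access finds the DataFrame.update method, not the column.
print(callable(df.update))                  # True

# Bracket access always returns the column as a Series.
print(isinstance(df["update"], pd.Series))  # True
```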

1 Comment

Works fine! What exactly does where(~s) do though?
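As a rough illustration of the where(~s) step asked about above: Series.where keeps values where the condition is True and replaces the rest with NaN, and ~s inverts the boolean mask. The tiny series here are made up for the example:

```python
import pandas as pd

counts = pd.Series([0, 1, 2, 0])
s = pd.Series([False, False, True, True])  # True marks the "last" rows

# ~s is True for the rows to keep, so the "last" rows become NaN.
# The result is float64 because NaN cannot live in an integer column.
masked = counts.where(~s)

print(masked.tolist())  # [0.0, 1.0, nan, nan]
```

In the answer, the NaN slots produced this way are what the trailing .fillna('last') fills in.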
1

Another possible solution.

import numpy as np

# Labels the last row of each run; every other row gets the placeholder 0.
df['update'] = np.where(df['col1'].ne(df['col1'].shift(-1)), 'last', 0)
