Pandas - Dataframe - Conditional add

Question

I want to add a new column to my data frame. I have a list of event columns, and if any of them is different from 0 in a row, the value of the new column for that row should be 1.

I think it should be very simple, but I am fairly new to Python.

The dataframe looks like this:

import pandas as pd

df = pd.DataFrame({"ID":[1,1,2,3],"Date":["01/01/2019","01/01/2019","02/01/2019","02/01/2019"],"Event_1":[1,0,0,0],"Event_2":[1,0,0,1],"Event_3":[0,1,0,1],"Other":[0,0,0,1]})

print(df)
ID    Date         Event_1 Event_2 Event_3 Other
1     01/01/2019   1       1       0       0
1     01/01/2019   0       0       1       0
2     02/01/2019   0       0       0       0
3     02/01/2019   0       1       1       1

And should look like this:

ID    Date         Event_1 Event_2 Event_3 Other Conditional_row
1     01/01/2019   1       1       0       0     1
1     01/01/2019   0       0       1       0     1
2     02/01/2019   0       0       0       0     0
3     02/01/2019   0       1       1       1     1

What is the easiest way of doing it? What is the best?

user3483203 · Accepted Answer · 2019-08-14 13:22:29Z

2

Use filter + any

Since all non-zero integers are truthy in Python, calling any directly on the filtered DataFrame produces the correct boolean mask. Since you want integer output, a memory-efficient view can reinterpret the boolean mask as an integer type.

df.filter(like="Event").any(axis=1).view('i1')

0    1
1    1
2    0
3    1
dtype: int8
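As a self-contained sketch of the same idea, the snippet below rebuilds the question's frame and assigns the result to the new column. It uses `astype("int8")` rather than a view, and a regex that also picks up the `Other` column; both choices are assumptions about what the asker wants, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Date": ["01/01/2019", "01/01/2019", "02/01/2019", "02/01/2019"],
    "Event_1": [1, 0, 0, 0],
    "Event_2": [1, 0, 0, 1],
    "Event_3": [0, 1, 0, 1],
    "Other": [0, 0, 0, 1],
})

# Keep only the columns whose label starts with "Event" or "Other"
# (including "Other" is an assumption, see the lead-in)
event_cols = df.filter(regex=r"^(Event|Other)")

# any(axis=1) is True for a row if at least one value in it is non-zero
df["Conditional_row"] = event_cols.any(axis=1).astype("int8")

print(df["Conditional_row"].tolist())
```

Passing `axis=1` by keyword keeps the call explicit about reducing across columns rather than down rows.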

edited Aug 14, 2019 at 13:22

answered Aug 14, 2019 at 13:16


1 Comment

Jesper Mølgaard Over a year ago

Almost got it working. It doesn't raise an error now, but for some reason it sets all values to 0.

Erfan · Accepted Answer · 2019-08-14 15:15:08Z

2

Using `DataFrame.filter`, `eq` and `any`

First we filter the columns whose names start with Event or Other. Then, for each row, we check whether any of the values in those columns is eq (equal) to 1:

df['Conditional_row'] = df.filter(regex="^Event|^Other").eq(1).any(axis=1).astype(int)

   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1
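The question asks for values "different from 0", while `eq(1)` matches only values exactly equal to 1. For 0/1 data the two agree, but they diverge if a column can hold other numbers. A minimal sketch on a hypothetical frame (the `counts` data is invented for illustration):

```python
import pandas as pd

# Hypothetical frame where one event value is 2 rather than 1
counts = pd.DataFrame({"Event_1": [2, 0, 1], "Event_2": [0, 0, 0]})

# Flags rows that contain a value exactly equal to 1
only_ones = counts.eq(1).any(axis=1).astype(int)

# Flags rows that contain any non-zero value
non_zero = counts.ne(0).any(axis=1).astype(int)

print(only_ones.tolist())  # [0, 0, 1]
print(non_zero.tolist())   # [1, 0, 1]
```

If the event columns are guaranteed to contain only 0 and 1, either form gives the same column.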

edited Aug 14, 2019 at 15:15

answered Aug 14, 2019 at 13:16


2 Comments

Jesper Mølgaard Over a year ago

I have a list of rows in: event_list = ("event_1","event_2","event_2","event_3","other"). When I substitute event_list for like='Event', I get: ValueError: cannot reindex from a duplicate axis

Erfan Over a year ago

See my edit which includes checking for column Other as well. @JesperMølgaard
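The list in the comment above repeats one name, which is a plausible source of the duplicate-axis error. A hedged sketch of one way to select columns from such a list: remove the repeats first, then index with the list directly. The `event_list` below is hypothetical and assumes the names are spelled exactly like the column labels, since pandas labels are case-sensitive.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Date": ["01/01/2019", "01/01/2019", "02/01/2019", "02/01/2019"],
    "Event_1": [1, 0, 0, 0],
    "Event_2": [1, 0, 0, 1],
    "Event_3": [0, 1, 0, 1],
    "Other": [0, 0, 0, 1],
})

# Hypothetical list modeled on the comment: "Event_2" appears twice
event_list = ["Event_1", "Event_2", "Event_2", "Event_3", "Other"]

# dict.fromkeys drops repeats while keeping first-seen order
event_cols = list(dict.fromkeys(event_list))

# Index with the de-duplicated list instead of using filter(like=...)
df["Conditional_row"] = df[event_cols].ne(0).any(axis=1).astype(int)

print(df["Conditional_row"].tolist())
```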

Arturo Sbr · Accepted Answer · 2019-08-15 15:31:51Z

Suppose your data frame is stored in an object called df. I believe this is the most efficient way to do this:

df["Conditional_row"] = 0
df.loc[df[["Event_1","Event_2","Event_3","Other"]].sum(axis=1) > 0, "Conditional_row"] = 1

The output looks like this:

print(df)
   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1

What I did here was:

I created a new column filled with zeroes.
I selected all the rows where the row-wise sum of the columns in the list ["Event_1","Event_2","Event_3","Other"] is greater than 0.
The column "Conditional_row" of the rows that meet that condition is updated with the value 1.

The code df[["Event_1","Event_2","Event_3","Other"]].sum(axis=1) > 0 is called a mask: it returns a boolean Series (a vector of True and False values), and df.loc uses it to select the rows where the value is True. Note that the sum test assumes the event columns hold no negative values, since a negative and a positive entry could cancel out. Indexing with boolean masks is typically an efficient way to manipulate data frames.
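To make the mask visible on its own, here is a small sketch that stores it in a variable before using it. The variable names `event_cols`, `mask` and `one_step` are chosen for illustration, and the approach assumes non-negative event values.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Date": ["01/01/2019", "01/01/2019", "02/01/2019", "02/01/2019"],
    "Event_1": [1, 0, 0, 0],
    "Event_2": [1, 0, 0, 1],
    "Event_3": [0, 1, 0, 1],
    "Other": [0, 0, 0, 1],
})

event_cols = ["Event_1", "Event_2", "Event_3", "Other"]

# Boolean Series: True where the row-wise sum is positive
mask = df[event_cols].sum(axis=1) > 0
print(mask.tolist())  # [True, True, False, True]

# Start from zeroes, then overwrite only the rows selected by the mask
df["Conditional_row"] = 0
df.loc[mask, "Conditional_row"] = 1

# Equivalent one-step form: cast the mask itself to integers
one_step = mask.astype(int)
```

The two-step form is handy when the flagged rows need a value other than 1; for a plain 0/1 flag the cast gives the same column.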

U13-Forward · Accepted Answer · 2019-08-15 06:19:49Z

1

Or use:

df['Conditional_row'] = df[['Event_1', 'Event_2', 'Event_3', 'Other']].ne(0).any(axis=1).astype(int)

And now:

print(df)

Output:

   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1

edited Aug 15, 2019 at 6:19

answered Aug 14, 2019 at 13:19


2 Comments

Jesper Mølgaard Over a year ago

It looks like it could be easy to implement, but for me it raises a TypeError: Cannot convert bool to numpy.ndarray. My list of rows is in: event_list = ("event_1","event_2","event_2","event_3","other"), and I tried to replace ['Event_1', 'Event_2', 'Event_3'] with event_list.

U13-Forward Over a year ago

@JesperMølgaard Added other
