
I want to add a new column to my data frame. I have a list of event columns, and if any of them is different from 0 in a row, the value of that row in the new column should be 1.

I think it should be very simple, but I am fairly new to Python.

The dataframe looks like this:

import pandas as pd

df = pd.DataFrame({"ID":[1,1,2,3],"Date":["01/01/2019","01/01/2019","02/01/2019","02/01/2019"],"Event_1":[1,0,0,0],"Event_2":[1,0,0,1],"Event_3":[0,1,0,1],"Other":[0,0,0,1]})

print(df)
ID    Date         Event_1 Event_2 Event_3 Other
1     01/01/2019   1       1       0       0
1     01/01/2019   0       0       1       0
2     02/01/2019   0       0       0       0
3     02/01/2019   0       1       1       1

And should look like this:

ID    Date         Event_1 Event_2 Event_3 Other Conditional_row
1     01/01/2019   1       1       0       0     1
1     01/01/2019   0       0       1       0     1
2     02/01/2019   0       0       0       0     0
3     02/01/2019   0       1       1       1     1

What is the easiest way of doing it? What is the best?

4 Answers


Use filter + any

Since all non-zero integers are truthy in Python, calling any along the rows of the filtered DataFrame yields the correct boolean mask. Since you want integer output, you can use a memory-efficient view to reinterpret the boolean mask as an integer type.


df.filter(like="Event").any(axis=1).view('i1')

0    1
1    1
2    0
3    1
dtype: int8
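
The steps above can be sketched end to end as follows. This is a minimal example that rebuilds the question's data (without the Date column) and assigns the result to a new column. It swaps `view('i1')` for `astype("int8")`, which is my substitution rather than the answer's method: `astype` makes a copy, but it should work on pandas versions where `Series.view` is deprecated or unavailable. Note that `filter(like="Event")` does not pick up the Other column.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Event_1": [1, 0, 0, 0],
    "Event_2": [1, 0, 0, 1],
    "Event_3": [0, 1, 0, 1],
    "Other": [0, 0, 0, 1],
})

# Keep only the columns whose name contains "Event" (Other is not included).
events = df.filter(like="Event")

# Row-wise any(): True where at least one value in the row is non-zero.
mask = events.any(axis=1)

# Assumption: astype is used here instead of view; it copies the data.
df["Conditional_row"] = mask.astype("int8")

print(df["Conditional_row"].tolist())  # [1, 1, 0, 1]
```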

1 Comment

Got it almost working. It doesn't raise an error now. But for some reason it sets all values to 0

Using DataFrame.filter, eq and any

First we filter the columns whose names start with Event or Other. Then, for each row, we check whether any of the values is eq (equal) to 1:

df['Conditional_row'] = df.filter(regex="^Event|^Other").eq(1).any(axis=1).astype(int)
   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1

2 Comments

I have a list of columns in event_list = ("event_1","event_2","event_2","event_3","other"). When I substitute event_list for like='Event', I get: ValueError: cannot reindex from a duplicate axis
See my edit which includes checking for column Other as well. @JesperMølgaard
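
If the column names live in a list like the one in the comment, one possible approach is to select them directly instead of filtering by pattern. The sketch below assumes (this is not confirmed in the thread) that the lower-case names in `event_list` are meant to match the DataFrame's columns case-insensitively and that the repeated "event_2" is unintentional. The helper names `by_lower` and `cols` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Event_1": [1, 0, 0, 0],
    "Event_2": [1, 0, 0, 1],
    "Event_3": [0, 1, 0, 1],
    "Other": [0, 0, 0, 1],
})

event_list = ("event_1", "event_2", "event_2", "event_3", "other")

# Assumption: match names case-insensitively against the real column labels.
by_lower = {col.lower(): col for col in df.columns}

# dict.fromkeys drops duplicate names while preserving their order.
cols = [by_lower[name] for name in dict.fromkeys(event_list) if name in by_lower]

# 1 where any selected column is non-zero in the row, otherwise 0.
df["Conditional_row"] = df[cols].ne(0).any(axis=1).astype(int)

print(cols)  # ['Event_1', 'Event_2', 'Event_3', 'Other']
```

Selecting with a de-duplicated list of exact labels avoids picking the same column twice, which may or may not be what caused the errors reported in the comments.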

Suppose your data frame is stored in an object called df. I believe this is the most efficient way to do this:

df["Conditional_row"] = 0
df.loc[df[["Event_1","Event_2","Event_3","Other"]].sum(axis=1) > 0, "Conditional_row"] = 1

The output looks like this:

print(df)
   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1

What I did here was:

  1. I created a new column filled with zeroes.
  2. I selected all the rows where the row-wise sum of the columns in the list ["Event_1","Event_2","Event_3","Other"] is greater than 0.
  3. The column "Conditional_row" of the rows that meet that condition is updated with the value 1.

The expression df[["Event_1","Event_2","Event_3","Other"]].sum(axis=1) > 0 is called a mask: it returns a boolean Series (a vector filled with True and False values). Passing it to df.loc selects all the rows where the value is True. Typically, selecting with boolean masks is an efficient way to manipulate data frames.
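
To make the mask concrete, here is a small sketch that builds it separately before using it with `df.loc`. The second part uses hypothetical toy data (not from the question) to illustrate one caveat: the sum-based test assumes the event values are never negative, because a negative value can cancel a positive one, whereas `ne(0).any(axis=1)` does not have that limitation.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Event_1": [1, 0, 0, 0],
    "Event_2": [1, 0, 0, 1],
    "Event_3": [0, 1, 0, 1],
    "Other": [0, 0, 0, 1],
})
cols = ["Event_1", "Event_2", "Event_3", "Other"]

# The mask is a boolean Series aligned with df's index.
mask = df[cols].sum(axis=1) > 0
print(mask.tolist())  # [True, True, False, True]

# Start from 0 everywhere, then set 1 only where the mask is True.
df["Conditional_row"] = 0
df.loc[mask, "Conditional_row"] = 1

# Hypothetical toy data: in the last row, -1 and 1 sum to 0.
toy = pd.DataFrame({"Event_1": [1, 0, -1], "Event_2": [0, 0, 1]})
sum_mask = toy.sum(axis=1) > 0    # misses the last row
any_mask = toy.ne(0).any(axis=1)  # catches the last row
```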



Or use:

df['Conditional_row'] = df[['Event_1', 'Event_2', 'Event_3', 'Other']].ne(0).any(axis=1).astype(int)

And now:

print(df)

Output:

   ID        Date  Event_1  Event_2  Event_3  Other  Conditional_row
0   1  01/01/2019        1        1        0      0                1
1   1  01/01/2019        0        0        1      0                1
2   2  02/01/2019        0        0        0      0                0
3   3  02/01/2019        0        1        1      1                1

2 Comments

It looks like it could be easy to implement, but for me that raises a TypeError: Cannot convert bool to numpy.ndarray. My list of columns is in event_list = ("event_1","event_2","event_2","event_3","other"), and I tried to substitute event_list for ['Event_1', 'Event_2', 'Event_3'].
@JesperMølgaard Added other
