How to use pandas to add new column using if statement?

Question

Could you help me express the following in Python pandas? I have the following data:

id=["Train A","Train A","Train A","Train B","Train B","Train B"]
start = ["A","B","C","D","E","F"]
end = ["G","H","I","J","K","L"]
arrival_time = ["0","2016-05-19 13:50:00","2016-05-19 21:25:00","0","2016-05-24 18:30:00","2016-05-26 12:15:00"]
departure_time = ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:25:00","2016-05-24 12:50:00","2016-05-25 20:00:00","2016-05-26 19:45:00"]
capacity = ["2","2","3","3","2","3"]

Together they form the following table:

id        arrival_time          departure_time        start  end  capacity
Train A   0                     2016-05-19 08:25:00   A      G    2
Train A   2016-05-19 13:50:00   2016-05-19 16:00:00   B      H    2
Train A   2016-05-19 21:25:00   2016-05-20 07:25:00   C      I    3
Train B   0                     2016-05-24 12:50:00   D      J    3
Train B   2016-05-24 18:30:00   2016-05-25 20:00:00   E      K    2
Train B   2016-05-26 12:15:00   2016-05-26 19:45:00   F      L    3
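
For reference, the table above could be assembled into a DataFrame along these lines. This is only a sketch under two assumptions that the question does not state: the "0" entries in arrival_time mean "no arrival recorded" and are turned into NaT, and capacity is meant to be an integer. The values follow the table.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["Train A", "Train A", "Train A", "Train B", "Train B", "Train B"],
    "arrival_time": ["0", "2016-05-19 13:50:00", "2016-05-19 21:25:00",
                     "0", "2016-05-24 18:30:00", "2016-05-26 12:15:00"],
    "departure_time": ["2016-05-19 08:25:00", "2016-05-19 16:00:00",
                       "2016-05-20 07:25:00", "2016-05-24 12:50:00",
                       "2016-05-25 20:00:00", "2016-05-26 19:45:00"],
    "start": ["A", "B", "C", "D", "E", "F"],
    "end": ["G", "H", "I", "J", "K", "L"],
    "capacity": ["2", "2", "3", "3", "2", "3"],
})

# Assumption: "0" is a placeholder for a missing arrival. mask() blanks those
# entries so that to_datetime converts them to NaT.
arrival = df["arrival_time"].mask(df["arrival_time"] == "0")
df["arrival_time"] = pd.to_datetime(arrival)
df["departure_time"] = pd.to_datetime(df["departure_time"])

# Assumption: capacity is numeric; the lists store it as strings.
df["capacity"] = df["capacity"].astype(int)

print(df.dtypes)
```

With real datetime columns, subtracting arrival_time from departure_time yields timedeltas, and rows without an arrival yield NaT.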

I would like to add two columns, source and sink. If the time difference between arrival and departure is less than 3 hours, the source is the start of the trip, and the sink is set only where the trip breaks (i.e. when the time difference is more than 3 hours):

time difference   source     sink
     -              A         H
     02:10:00       A         H
     10:00:00       C         I
     -              D         K
     01:30:00       D         K
     19:30:00       F         L

Does your if function need only information from the same row that it will ultimately update? If so, then using the "apply" function on the dataframe would work, as per the answer here: stackoverflow.com/questions/26886653/… But I suspect the answer you're looking for requires some cross-row comparison, is that right?
– Thomas Kimber, Commented May 10, 2017 at 15:08
apply is needlessly slow here. numpy.where would be better. df2 = df.assign(source_or_sink=numpy.where(<condition_for_source>, df['source'], df['sink']))
– Paul H, Commented May 10, 2017 at 15:09
Thank you, I will take a look at that! Yes, the if statement needs only the time difference, and it is not necessary to create a column for it. And true, it has to merge two rows' data if the condition is satisfied.
– user7779326, Commented May 10, 2017 at 15:10
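
To illustrate the numpy.where suggestion from the comments: it takes a boolean condition plus two sets of candidate values and picks between them element by element, without calling a Python function per row. The sketch below uses a small made-up frame; the column names hours, first and second are hypothetical and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only to show the mechanics of np.where.
toy = pd.DataFrame({
    "hours": [2.0, 10.0, 1.5],
    "first": ["A", "B", "C"],
    "second": ["X", "Y", "Z"],
})

# Element-wise choice: take `first` where hours < 3, otherwise `second`.
toy = toy.assign(picked=np.where(toy["hours"] < 3, toy["first"], toy["second"]))

# Row-wise equivalent with apply: same result, but one Python call per row.
picked_apply = toy.apply(
    lambda row: row["first"] if row["hours"] < 3 else row["second"], axis=1
)

print(toy["picked"].tolist())  # ['A', 'Y', 'C']
```

For cross-row logic, the arguments passed to np.where can themselves be shifted columns, which is the approach the answer below takes.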

Scott Boston · Accepted Answer · 2017-05-10 15:31:30Z

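import numpy as np

# Assumes arrival_time and departure_time are already datetime columns.
# Time between arriving at a stop and departing from it (NaT where there is no arrival).
df = df.assign(timediff=(df.departure_time - df.arrival_time))

# Stop shorter than 3 hours: the trip continues, so reuse the previous row's start.
df = df.assign(source=np.where(df.timediff.dt.seconds / 3600 < 3, df.shift(1).start, df.start))

# Previous row's stop longer than 3 hours: take the next row's end as the sink.
df = df.assign(sink=np.where(df.timediff.dt.seconds.shift(1) / 3600 > 3, df.shift(-1).end, df.end))

print(df)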

Output:

        id        arrival_time      departure_time start end  capacity sink  \
0  Train A                 NaT 2016-05-19 08:25:00     A   G         2    G   
1  Train A 2016-05-19 13:50:00 2016-05-19 16:00:00     B   H         2    H   
2  Train A 2016-05-19 21:25:00 2016-05-20 07:25:00     C   I         3    I   
3  Train B                 NaT 2016-05-24 12:50:00     D   J         3    K   
4  Train B 2016-05-24 18:30:00 2016-05-25 20:00:00     E   K         2    K   
5  Train B 2016-05-26 12:15:00 2016-05-26 19:45:00     F   L         3    L   

         timediff source  
0             NaT      A  
1 0 days 02:10:00      A  
2 0 days 10:00:00      C  
3             NaT      D  
4 1 days 01:30:00      D  
5 0 days 07:30:00      F
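
One detail worth checking against your own data: Series.dt.seconds returns only the seconds component of each timedelta (0 to 86399) and ignores whole days, whereas Series.dt.total_seconds() includes them. In the output above, row 4 has a timediff of 1 days 01:30:00, so dt.seconds / 3600 evaluates to 1.5 and the row counts as "under 3 hours", which lines up with the 01:30:00 shown in the question's expected table. A small sketch of the difference, using two stand-alone values rather than the full frame:

```python
import pandas as pd

td = pd.Series(pd.to_timedelta(["0 days 02:10:00", "1 days 01:30:00"]))

# .dt.seconds drops the days part: 1 day 01:30:00 -> 5400 s -> 1.5 hours.
hours_component = td.dt.seconds / 3600

# .dt.total_seconds() keeps it: 1 day 01:30:00 -> 91800 s -> 25.5 hours.
hours_total = td.dt.total_seconds() / 3600

print(hours_component.iloc[1], hours_total.iloc[1])  # 1.5 25.5
```

If stops spanning more than a day should count as breaks in the trip, dt.total_seconds() is probably the comparison to use; that would change the source for row 4, so it depends on which result is actually wanted.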

edited May 10, 2017 at 15:31

answered May 10, 2017 at 15:25

Scott Boston


zipa Over a year ago

I'm sure you meant df.timediff
