Python filter row with multiple columns conditions

Question

I have a CSV dataset and I need to filter it with conditions but the problem is that the condition can be true for multiple days. What I want is to keep the last true value for this condition.

My dataset looks like this

Date           City        Summary       FlightNo.   Terminal     Company
2-18-2019       NY        Airplane Land      23          7         Delta 
2-18-2019     London     Cargo handling      4           5         British
2-18-2019      Dubai     Airplane land       92          7         Emirates
2-19-2019      Dubai     Airplane stay       92          5         Emirates
2-19-2019      Paris     Flight cancel       78          2         British
2-19-2019     London     Airplane Land       4           5         British
2-19-2019       LA       Airplane Land       7           2         United
2-20-2019      Dubai     Airplane land       92          3         Emirates
2-20-2019       LA       Airplane land       29          3         Delta
2-20-2019       NY       Airplane left       23          1         Delta
2-21-2019      Paris     Airplane reschedu   78          2         British
2-21-2019      London    Airplane land       4           3         British
2-21-2019       LA    Airplane from NY land  29          5         Delta
~~~
3-10-2019      London    Airplane land       5           5         KLM
3-10-2019      Paris     Airplane Land       78          7         AirFrance
3-10-2019       LA       Reschedule          29          4         United
3-11-2019       NY       Cargo handled       23          7         Delta
3-11-2019      Dubai     Arrived be4 2 day   34          7         Etihad
~~~
3-21-2019      Dubai      Airplane land      92          5         Emirates
3-21-2019     New Delhi   Reschedule         9           4         AirAsia
3-21-2019      London     Cargo handling     5           2         Lufthansa
3-22-2019     New Delhi   Airplane Land      9           3         AirAsia
3-22-2019       NY        Reschedule         23          2         United
3-22-2019      Dubai      Airplane land      35          1         Emirates

So the code should return the last entry for a plane landing where City, FlightNo., and Company all match. As you can see, this condition can be true on multiple days. So if all three conditions are true and Summary contains "Airplane land", return the last matching entry.

Edit: the desired output should look like the dataset below:

Date           City        Summary       FlightNo.   Terminal     Company
2-18-2019       NY       Airplane Land       23          7         Delta 
2-19-2019       LA       Airplane Land       7           2         United
2-20-2019      Dubai     Airplane land       92          3         Emirates
2-21-2019      London    Airplane Land       4           3         British
2-21-2019       LA    Airplane from NY land  29          5         Delta
~~~
3-10-2019      London    Airplane land       5           5         KLM
3-10-2019      Paris     Airplane Land       78          7         AirFrance
~~~
3-21-2019      Dubai      Airplane land      92          5         Emirates
3-22-2019     New Delhi   Airplane Land      9           3         AirAsia
3-22-2019      Dubai      Airplane land      35          1         Emirates

As shown above, a row should be deleted only when all three columns (City, FlightNo., and Company) are the same; if any of them differs, both rows should be kept.

The logic:
Condition 1: If df[Summary] contains "Airplane" and "land", return the row.
Condition 2: From the already filtered dataset, if df[City] == df[City] and df[FlightNo.] == df[FlightNo.] and df[Company] == df[Company] holds within 3 days, keep either the last or the first row. So if it returns rows with an airplane landing in the same city, with the same flight number, run by the same company, on the 18th and the 20th, only one of those rows should be kept. But if they fall on the 1st and the 15th of the same month, keep both rows.

Please help me find a way to apply all the conditions and keep the last true entry.

EDIT:

Keep the first row if the conditions are true within the next 3 days. Input:

print (df)
     Date      City Code      Summary      Flight No.   Company
0   2-18-2019    021        Airplane land      23       Emirates
1   2-18-2019    013        Airplane land      23       Etihad
2   2-19-2019    021        Airplane land      23       Emirates
3   2-19-2019    013        Airplane Land      23       Etihad
4   2-20-2019    021        Airplane land      23       Emirates
5   2-20-2019    055        Airplane land      23       Emirates
6   2-20-2019    013        Airplane land      23       Etihad
7   2-21-2019    021        Airplane land      23       Emirates
8   2-21-2019    013        Airplane land      78       Emirates
9   2-21-2019    055  Airplane from NY land    23       Emirates
10  2-22-2019    021        Airplane land      78       Emirates
11  2-22-2019    013        Airplane Land      78       Emirates
12  2-22-2019    055        Airplane land      78       Emirates
13  2-23-2019    021        Airplane land      78       Etihad

Output:

print (df)
         Date      City Code      Summary      Flight No.   Company
    0   2-18-2019    021        Airplane land      23       Emirates
    1   2-18-2019    013        Airplane land      23       Etihad
    5   2-20-2019    055        Airplane land      23       Emirates
    7   2-21-2019    021        Airplane land      23       Emirates
    8   2-21-2019    013        Airplane land      78       Emirates
    10  2-22-2019    021        Airplane land      78       Emirates
    12  2-22-2019    055        Airplane land      78       Emirates

jezrael · Accepted Answer · 2021-12-28 08:14:54Z

I think you need:

import pandas as pd

# convert the Date column to datetimes
df['Date'] = pd.to_datetime(df['Date'])

# sort by the grouping columns, then by date
df = df.sort_values(['City Code', 'Flight No.', 'Company', 'Date'])

# keep rows whose Summary contains 'Airplane ' and, case-insensitively, 'Land'
df = df[df.Summary.str.contains('Airplane ') & df.Summary.str.contains('Land', case=False)].copy()

# first date per group
s = df.groupby(['City Code', 'Flight No.', 'Company'])['Date'].transform('first')
# days elapsed since the first date of each group
df['diff'] = df['Date'].sub(s).dt.days.fillna(0)
# label consecutive 3-day windows counted from the first date of each group
df['g'] = df['diff'] // 3
# keep only the first row of each 3-day window per group
df = df[~df.duplicated(['City Code', 'Flight No.', 'Company', 'g'])]
print (df)
         Date City Code        Summary  Flight No.   Company
0  2019-02-18       021  Airplane land          23  Emirates
1  2019-02-18       013  Airplane land          23    Etihad
5  2019-02-20       055  Airplane land          23  Emirates
7  2019-02-21       021  Airplane land          23  Emirates
8  2019-02-21       013  Airplane land          78  Emirates
10 2019-02-22       021  Airplane land          78  Emirates
12 2019-02-22       055  Airplane land          78  Emirates
13 2019-02-23       021  Airplane land          78    Etihad
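
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on the sample input from the question's EDIT. A few things are assumptions rather than part of the answer: the `rows` list is typed in by hand from that sample, `City Code` is stored as a string to keep the leading zeros, the date format is given explicitly, and both substring checks are case-insensitive. The names `keys`, `landed`, `window` and `result` are only illustrative.

```python
import pandas as pd

# Sample input copied from the question's EDIT (City Code kept as text).
rows = [
    ("2-18-2019", "021", "Airplane land", 23, "Emirates"),
    ("2-18-2019", "013", "Airplane land", 23, "Etihad"),
    ("2-19-2019", "021", "Airplane land", 23, "Emirates"),
    ("2-19-2019", "013", "Airplane Land", 23, "Etihad"),
    ("2-20-2019", "021", "Airplane land", 23, "Emirates"),
    ("2-20-2019", "055", "Airplane land", 23, "Emirates"),
    ("2-20-2019", "013", "Airplane land", 23, "Etihad"),
    ("2-21-2019", "021", "Airplane land", 23, "Emirates"),
    ("2-21-2019", "013", "Airplane land", 78, "Emirates"),
    ("2-21-2019", "055", "Airplane from NY land", 23, "Emirates"),
    ("2-22-2019", "021", "Airplane land", 78, "Emirates"),
    ("2-22-2019", "013", "Airplane Land", 78, "Emirates"),
    ("2-22-2019", "055", "Airplane land", 78, "Emirates"),
    ("2-23-2019", "021", "Airplane land", 78, "Etihad"),
]
df = pd.DataFrame(
    rows, columns=["Date", "City Code", "Summary", "Flight No.", "Company"]
)
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")

keys = ["City Code", "Flight No.", "Company"]

# Condition 1: Summary mentions both "Airplane" and "land" (any case).
mask = (
    df["Summary"].str.contains("Airplane", case=False)
    & df["Summary"].str.contains("land", case=False)
)
landed = df[mask].sort_values(keys + ["Date"])

# Condition 2: bucket each group into 3-day windows counted from its first date.
first = landed.groupby(keys)["Date"].transform("min")
window = (landed["Date"] - first).dt.days // 3

# Keep the earliest row of every (group, window) pair, back in original order.
is_repeat = landed.assign(window=window).duplicated(keys + ["window"])
result = landed[~is_repeat].sort_index()
print(result)
```

On this sample the surviving index labels should be 0, 1, 5, 7, 8, 10, 12 and 13, the same labels as in the output printed above. Note that the windows are fixed 3-day buckets anchored at each group's first date, not a rolling gap between consecutive rows.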

edited Dec 28, 2021 at 8:14

answered Dec 16, 2021 at 6:38

43 Comments

SMO Over a year ago

Thank you for helping, but I tried it and it filtered out half of the dataset, which is not what I want. If City == City and No. == No. and Summary contains Airplane & Land, then keep the last within two days. So if the condition is true on the 9th, 10th, 11th and 25th, it should return the rows for the 11th and 25th. I only need to filter the repeats within 3 days; if the gap is longer than that, keep the row.

jezrael Over a year ago

@SMO - can you test now?

SMO Over a year ago

No city code can appear only once every day

SMO Over a year ago

The difference can be more than one day, so a city code can appear every day, every other day, or even once a year.

SMO Over a year ago

This one works. However, the previous one that contains def f(x): return (x.diff().dt.days.fillna(0).cumsum() // 3).duplicated() also worked, and it made more sense to me. I'm really thankful for your time and effort.
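
A small sketch may help connect the comments above. The earlier revision of the answer is not shown here, so how `f` was applied to the grouped data is unknown; the example below only calls `f` on a single, already sorted Series of dates. The dates (9th, 10th, 11th and 25th) come from the first comment, but the month and year are made up for illustration. It shows that the running sum of day-to-day differences is just the number of days since the first date, and that passing `keep="last"` to `duplicated` keeps the last row of each window instead of the first.

```python
import pandas as pd


def f(x):
    """Flag every date after the first one inside each 3-day window.

    Assumes x is a datetime Series for one group, sorted ascending.
    """
    return (x.diff().dt.days.fillna(0).cumsum() // 3).duplicated()


# Hypothetical dates for a single City / Flight No. group.
dates = pd.Series(
    pd.to_datetime(["2019-03-09", "2019-03-10", "2019-03-11", "2019-03-25"])
)

# The cumulative sum of the gaps equals the days elapsed since the first date.
elapsed_cumsum = dates.diff().dt.days.fillna(0).cumsum()
elapsed_direct = (dates - dates.iloc[0]).dt.days

# Default behaviour: keep the first date of each window.
kept_first = dates[~f(dates)]

# Variant: keep the last date of each window instead.
windows = elapsed_direct // 3
kept_last = dates[~windows.duplicated(keep="last")]

print(kept_first.dt.day.tolist())
print(kept_last.dt.day.tolist())
```

With these dates the first variant keeps the 9th and 25th, while the `keep="last"` variant keeps the 11th and 25th, which is the result asked for in the first comment.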
