I have a dataframe with multiple columns, and I want to select rows based on conditions in several of them. Suppose the dataframe has four columns:

import pandas as pd
di={"A":[1,2,3,4,5],
    "B":['Tokyo','Madrid','Professor','helsinki','Tokyo Oliveira'],
    "C":['250','200//250','250//250//200','12','200//300'],
    "D":['Left','Right','Left','Right','Right']}
data=pd.DataFrame(di)

I want to select rows with Tokyo in column B, a value above 200 in column C, and Left in column D, so that only the first row is selected. Column C needs a helper function, because when an entry contains a list separated by //, only the first value should be checked.

I assume this can be handled with the following function:

def check_200(thecolumn):
    thelist=[]
    for i in thecolumn:
        f=i
        if "//" in f:
            #split based on //
            z=f.split("//")
            f=z[0]

        f=float(f)
        if f > 200.00:
            thelist.append(True)
        else:
            thelist.append(False)
    return thelist

Then I combine the conditions:

selecteddata=data[(data.B.str.contains("Tokyo")) & 
(data.D.str.contains("Left"))&(check_200(data.C))]

Is this the best way to do it, or is there an easier pandas function that can handle such requirements?

What's your target output? Commented Mar 30, 2020 at 12:09

2 Answers

I don't think there is a single most Pythonic way to do this, but I think this is what you want:

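# .str[0] takes the first element of each split list;
# a plain [0] would select the row labelled 0 instead.
bool_idx = (data.B.str.contains("Tokyo")
            & data.D.str.contains("Left")
            & data.C.str.contains("//")
            & (data.C.str.split("//").str[0].astype(float) > 200.00))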

selecteddata=data[bool_idx]
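
As a side note on indexing after a split, here is a small sketch (the names `c`, `parts`, `row_zero` and `first_values` are only for illustration). On a Series of lists with the default index, `[0]` looks up the row labelled 0, while `.str[0]` takes the first element of every row, which is what a per-row comparison needs.

```python
import pandas as pd

# Same values as column C in the question
c = pd.Series(['250', '200//250', '250//250//200', '12', '200//300'])
parts = c.str.split("//")          # Series whose elements are lists

row_zero = parts[0]                # label-based lookup: the list stored in row 0
first_values = parts.str[0]        # element-wise: first item of each list

# Convert the first items to float and compare them to the threshold
mask = first_values.astype(float) > 200.00
print(row_zero)
print(first_values.tolist())
print(mask.tolist())
```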

1 Comment

If by "the best way" you mean avoiding boolean indexing, I still think boolean indexing is the best option: you have to express how you want to slice the dataframe somehow, and a boolean mask looks like the most compact way to do that.

Bruno's answer does the job, and I agree that boolean masking is the way to go. This answer keeps the code a little closer to the requested format.


def col_condition(col):
    # True where the first "//"-separated value of an entry is above 200
    return col.apply(lambda x: float(x.split('//')[0]) > 200)

data = data[(data.B.str.contains('Tokyo')) & (data.D.str.contains("Left")) &
            col_condition(data.C)]

The function reads in a Series, and converts each element to True or False, depending on the condition. It then returns this mask.
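
For comparison, the same mask can probably be built without `apply`, using pandas string methods only. This is a sketch that assumes every entry in column C is a string whose first `//`-separated part parses as a number; `first_value_above` and `selected` are hypothetical names, not part of pandas or of the question.

```python
import pandas as pd

data = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": ['Tokyo', 'Madrid', 'Professor', 'helsinki', 'Tokyo Oliveira'],
    "C": ['250', '200//250', '250//250//200', '12', '200//300'],
    "D": ['Left', 'Right', 'Left', 'Right', 'Right'],
})

def first_value_above(col, threshold=200):
    # Hypothetical helper: keep the part before the first "//",
    # convert it to float, and compare it to the threshold.
    first = col.str.split('//').str[0].astype(float)
    return first > threshold

# Combine the three conditions into one boolean mask
selected = data[data.B.str.contains('Tokyo')
                & data.D.str.contains('Left')
                & first_value_above(data.C)]
print(selected)
```

With the sample data from the question, only the row with A equal to 1 satisfies all three conditions.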

1 Comment

You also have to check whether '//' is in x: for a value like '300', x.split('//') returns ['300'], so the condition still evaluates to True.
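
A quick check of the behaviour described in this comment, using made-up sample values (`values` and `strict` are illustrative names). Whether the separator should be required depends on the intended output.

```python
values = ['300', '300//100', '12']

for x in values:
    parts = x.split('//')              # no separator gives a one-element list
    has_sep = '//' in x
    above = float(parts[0]) > 200
    print(x, parts, has_sep, above)

# Variant that additionally requires the '//' separator to be present
strict = ['//' in x and float(x.split('//')[0]) > 200 for x in values]
print(strict)
```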
