I have a Pandas dataframe:

id         attr
1          val1
2          val1||val2
3          val1||val3
4          val3

and a list special_val = ['val1', 'val2', 'val4']

I want to filter the dataframe to keep only the rows whose attr values are ALL in the list, so the result should look like this:

id     attr
1      val1                #val1 is in special_val
2      val1||val2          #both val1 and val2 are in special_val 

I am thinking of using pandas.DataFrame.isin or pandas.Series.isin but I can't come up with the correct syntax. Could you help?
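To reproduce the question, the sample data can be rebuilt as below (column names and values are copied from the tables above). The sketch also shows why a plain Series.isin call on the column is not enough: it compares each whole cell string against the list, so a cell like "val1||val2" never matches even though both of its parts are allowed.

```python
import pandas as pd

# Sample data from the question: each attr cell joins its values with "||"
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "attr": ["val1", "val1||val2", "val1||val3", "val3"],
})
special_val = ["val1", "val2", "val4"]

# Series.isin compares each *whole* cell string against the list,
# so "val1||val2" is not recognised even though both parts are allowed
whole_cell = df["attr"].isin(special_val)
print(whole_cell.tolist())  # [True, False, False, False]
```

So isin only becomes useful once each cell has been split into its individual values, which is what the answers below do in different ways.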

3 Answers

You can combine str.split, explode(), isin() and groupby():

s = df['attr'].str.split('||', regex=False).explode().isin(special_val).groupby(level=0).all()
df[s]

Output:

   id        attr
0   1        val1
1   2  val1||val2
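To see why groupby(level=0).all() gives the "all values are in the list" behaviour, the pipeline can be broken into stages. This sketch assumes pandas 1.4 or later (for the regex parameter of str.split) and uses Series.explode() to get one value per row; explode repeats the original index label for every piece, which is what lets groupby(level=0) collapse the pieces back to one result per original row.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4],
                   "attr": ["val1", "val1||val2", "val1||val3", "val3"]})
special_val = ["val1", "val2", "val4"]

# 1. One row per value; each piece keeps the index label of its source row
pieces = df["attr"].str.split("||", regex=False).explode()
# 2. Test every piece individually against the list
in_list = pieces.isin(special_val)
# 3. Collapse back to one boolean per original row: True only if every piece matched
mask = in_list.groupby(level=0).all()

print(pieces.tolist())  # ['val1', 'val1', 'val2', 'val1', 'val3', 'val3']
print(mask.tolist())    # [True, True, False, False]
print(df[mask])
```

Swapping .all() for .any() in step 3 would instead keep rows with at least one allowed value, which would also keep id 3.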

You can check, for each row, whether the set of its values is a subset of special_val:

allowed = set(special_val)
# True only when every "||"-separated value is in special_val
mask = df['attr'].apply(lambda x: set(x.split('||')).issubset(allowed))
df[mask]

Output

   id        attr
0   1        val1
1   2  val1||val2
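The choice of set test matters here. A non-empty intersection only says that at least one value is in the list, whereas the question needs every value to be in it, which is a subset test. This sketch compares both on the sample data (rebuilt here so it runs on its own); the intersection version would also keep id 3 ("val1||val3").

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4],
                   "attr": ["val1", "val1||val2", "val1||val3", "val3"]})
allowed = set(["val1", "val2", "val4"])

# "every value allowed": the row's set of values is a subset of the allowed set
all_in = df["attr"].apply(lambda x: set(x.split("||")) <= allowed)
# "at least one value allowed": non-empty intersection (a looser test)
any_in = df["attr"].apply(lambda x: bool(set(x.split("||")) & allowed))

print(all_in.tolist())  # [True, True, False, False]
print(any_in.tolist())  # [True, True, True, False]
```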

You can do:

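special_val = {'val1', 'val2', 'val4'}

# split each cell on the literal "||" and turn the pieces into a set
attr_sets = df["attr"].str.split("||", regex=False).map(set)
# keep rows whose set equals its intersection with special_val, i.e. all values are allowed
df = df.loc[attr_sets.map(lambda s: s == (s & special_val))]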

Output:

   id        attr
0   1        val1
1   2  val1||val2
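This answer's filter relies on a set identity: a set equals its intersection with special_val exactly when it is a subset of special_val. The sketch below checks both forms on the four sample rows (written out by hand as sets), which suggests that s <= special_val, or equivalently s.issubset(special_val), is a shorter way to write the same test.

```python
special_val = {"val1", "val2", "val4"}

def equals_own_intersection(values):
    # the condition used in the answer: s == s & special_val
    return values == (values & special_val)

def is_subset(values):
    # the equivalent, more direct subset test
    return values <= special_val

# the attr cells from the question, already split into sets
rows = [{"val1"}, {"val1", "val2"}, {"val1", "val3"}, {"val3"}]

for s in rows:
    assert equals_own_intersection(s) == is_subset(s)

print([is_subset(s) for s in rows])  # [True, True, False, False]
```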