I have a DataFrame named rules with several columns:

col1 ... colx   s1       s2  
'a'      'g'    '123'   '123 school'
'b'      'g'    '456'   '123 school'
'a'      'r'    '123'   '456 school'
'd'      'g'    '456'   '456 school'
'a'      'g'    '123'   '123 school'

I need to filter out all rows where 's1' is not a substring of 's2'. The result would be:

col1 ... colx   s1       s2  
'a'      'g'    '123'   '123 school'
'd'      'g'    '456'   '456 school'
'a'      'g'    '123'   '123 school'

I need to do that in the fastest way possible, so I tried:

rules = rules[rules['s1'] in rules['s2']]

but it does not seem to work.

2 Answers

I'd use a comprehension and a boolean mask

df[[x in y for x, y in zip(df.s1, df.s2)]]

  col1 colx   s1          s2
0    a    g  123  123 school
3    d    g  456  456 school
4    a    g  123  123 school

You can also use operator.contains and map

from operator import contains

df[[*map(contains, df.s2, df.s1)]]

  col1 colx   s1          s2
0    a    g  123  123 school
3    d    g  456  456 school
4    a    g  123  123 school
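For anyone who wants to run this end to end, here is a minimal sketch. It assumes the sample data from the question, with only the columns shown there (the real frame has more), and the names mask, filtered and mask_via_map are just illustrative. It also hints at why the original attempt fails: `in` applied to a Series tests index labels rather than comparing values element by element, so the row-wise check has to be spelled out.

```python
import pandas as pd
from operator import contains

# Sample data mirroring the question; only the columns shown there (assumption).
df = pd.DataFrame({
    "col1": ["a", "b", "a", "d", "a"],
    "colx": ["g", "g", "r", "g", "g"],
    "s1": ["123", "456", "123", "456", "123"],
    "s2": ["123 school", "123 school", "456 school", "456 school", "123 school"],
})

# Row-wise substring test: True where the s1 value occurs inside the s2 value.
mask = [x in y for x, y in zip(df["s1"], df["s2"])]
filtered = df[mask]

# Same mask built with operator.contains(container, item).
mask_via_map = list(map(contains, df["s2"], df["s1"]))

print(filtered)
```

Both masks are plain Python lists of booleans, which pandas accepts for row selection as long as the length matches the number of rows.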

# compare columns row-wise
boolean_selector = df.apply(lambda x: x['s1'] in x['s2'], axis=1)

# boolean indexing returns a new DataFrame (a copy)
new_df = df[boolean_selector]

# .loc with the same boolean mask selects the same rows
true_group = df.loc[boolean_selector]

As a one-liner:

df = df[df.apply(lambda x: x['s1'] in x['s2'], axis=1)]
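A self-contained sketch of the apply approach, again assuming the sample data from the question with only the columns shown there. The names selector, via_brackets and via_loc are illustrative. It also checks that plain bracket indexing and .loc give the same rows for a boolean mask.

```python
import pandas as pd

# Sample data mirroring the question; only the columns shown there (assumption).
df = pd.DataFrame({
    "col1": ["a", "b", "a", "d", "a"],
    "colx": ["g", "g", "r", "g", "g"],
    "s1": ["123", "456", "123", "456", "123"],
    "s2": ["123 school", "123 school", "456 school", "456 school", "123 school"],
})

# axis=1 passes each row to the lambda, which returns True when s1 is inside s2.
selector = df.apply(lambda row: row["s1"] in row["s2"], axis=1)

# Two spellings of the same row selection.
via_brackets = df[selector]
via_loc = df.loc[selector]

print(via_brackets)
```

Since apply with axis=1 calls a Python function once per row, it is usually the slower option on large frames; the zip-based comprehension avoids building a Series for every row.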
