4

Python 2.7

In [3]: import pandas as pd
df = pd.DataFrame(dict(A=['abc','abc','abc','xyz','xyz'],
                       B=['abcdef','abcdefghi','notthisone','uvwxyz','orthisone']))
In [4]: df
Out[4]:
    A   B
0   abc abcdef
1   abc abcdefghi
2   abc notthisone
3   xyz uvwxyz
4   xyz orthisone

In [12]:  df[df.B.str.contains(df.A) == True] 
# just keep the B that contain A string

TypeError: 'Series' objects are mutable, thus they cannot be hashed

I am trying for this:

    A   B
0   abc abcdef
1   abc abcdefghi
3   xyz uvwxyz

I have tried variations of the str.contains statement, but no go. Any help is much appreciated.

4 Answers

2

str.contains takes a single pattern rather than a Series of per-row patterns, so you may just have to apply a function over the rows:

substr_matches = df.apply(lambda row: row['B'].find(row['A']) > -1, axis=1)

df.loc[substr_matches]
Out[11]: 
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
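
As a possible alternative to the row-wise apply above, the same per-row substring test can be sketched with a plain list comprehension over the two columns. This is only an illustration, not part of the original answer; the names mask and result are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(dict(A=['abc', 'abc', 'abc', 'xyz', 'xyz'],
                       B=['abcdef', 'abcdefghi', 'notthisone', 'uvwxyz', 'orthisone']))

# Pair each A with the B from the same row and test plain substring containment.
mask = [a in b for a, b in zip(df['A'], df['B'])]

# A list of booleans with one entry per row works as a boolean mask.
result = df[mask]
print(result)
```

Because it avoids building a Series object for every row, this form may be faster than apply with axis=1 on large frames, though that is worth measuring on your own data.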

2

Apply a lambda function on the rows and test if A is in B.

>>> df[df.apply(lambda x: x.A in x.B, axis=1)]
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz


1

You can call unique on column 'A' and then join with | to create a pattern for matching using contains:

In [15]:
df[df['B'].str.contains('|'.join(df['A'].unique()))]

Out[15]:
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
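
One caveat worth checking before relying on the joined pattern: it tests whether B contains any value found in column A, not the value from the same row, and the values are interpreted as regular expressions. The sketch below uses a small made-up frame (not the data from the question) to show where the two checks differ; re.escape is added on the assumption that the A values should be matched literally.

```python
import re
import pandas as pd

# Hypothetical data: row 0 has B containing 'xyz', which is row 1's A value.
df = pd.DataFrame(dict(A=['abc', 'xyz'], B=['xyz123', 'uvwxyz']))

# Joined pattern: true when B contains ANY of the A values.
pattern = '|'.join(re.escape(a) for a in df['A'].unique())
any_match = df['B'].str.contains(pattern)

# Row-wise check: true only when B contains the A value from the same row.
row_match = pd.Series([a in b for a, b in zip(df['A'], df['B'])], index=df.index)

print(any_match.tolist())
print(row_match.tolist())
```

For the data in the question both checks select rows 0, 1 and 3, so the difference only shows up when one row's B happens to contain another row's A.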


1

How about this?

In [8]: df[df.apply(lambda v: v['A'] in v['B'], axis=1)]
Out[8]: 
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
