Trouble with using substring to select data frame rows

Question

Python 2.7

In [3]: import pandas as pd
df = pd.DataFrame(dict(A=['abc','abc','abc','xyz','xyz'],
                       B=['abcdef','abcdefghi','notthisone','uvwxyz','orthisone']))
In [4]: df
Out[4]:
    A   B
0   abc abcdef
1   abc abcdefghi
2   abc notthisone
3   xyz uvwxyz
4   xyz orthisone

In [12]: df[df.B.str.contains(df.A) == True]
# keep only the rows where B contains the string in A

TypeError: 'Series' objects are mutable, thus they cannot be hashed

This is the result I am trying to get:

    A   B
0   abc abcdef
1   abc abcdefghi
3   xyz uvwxyz

I have tried variations of the str.contains statement without success. Any help is much appreciated.

Marius · Accepted Answer · 2015-06-05 03:01:26Z

2

It doesn't look like str.contains accepts a separate pattern for each row (it expects a single pattern string), so you may just have to apply a function over the rows:

substr_matches = df.apply(lambda row: row['B'].find(row['A']) > -1, axis=1)

df.loc[substr_matches]
Out[11]: 
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
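
As a possible alternative to the row-wise apply above, the same mask can be sketched with a plain list comprehension over the two columns. This is only an illustration, not part of the original answer; the names mask and result are made up for the example, and it assumes both columns hold strings with no missing values.

```python
import pandas as pd

df = pd.DataFrame(dict(A=['abc', 'abc', 'abc', 'xyz', 'xyz'],
                       B=['abcdef', 'abcdefghi', 'notthisone', 'uvwxyz', 'orthisone']))

# Pair each A with the B from the same row and test substring membership.
# Assumption: every value is a string (no NaN), otherwise `in` raises TypeError.
mask = pd.Series([a in b for a, b in zip(df['A'], df['B'])], index=df.index)

# Keep only the rows where B contains A.
result = df[mask]
print(result)
```

Building the mask as a Series with df's index keeps it aligned with the frame even when the index is not the default 0..n-1.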


Alexander · Accepted Answer · 2015-06-05 06:03:39Z

2

Apply a lambda function on the rows and test if A is in B.

>>> df[df.apply(lambda x: x.A in x.B, axis=1)]
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
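
One thing to watch with the `in` test is missing data: if B holds NaN (a float), `x.A in x.B` raises a TypeError. The sketch below guards against that with a hypothetical helper, a_in_b, which is not from the original answer. It is written for Python 3; on Python 2.7 the isinstance check would presumably use basestring instead of str.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with one missing value in B (not the question's data).
df = pd.DataFrame({'A': ['abc', 'xyz', 'abc'],
                   'B': ['abcdef', np.nan, 'nomatch']})

def a_in_b(row):
    """Return True only when both values are strings and A occurs in B."""
    a, b = row['A'], row['B']
    return isinstance(a, str) and isinstance(b, str) and a in b

# Rows with a missing A or B simply evaluate to False instead of raising.
mask = df.apply(a_in_b, axis=1)
result = df[mask]
print(result)
```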


EdChum · Accepted Answer · 2015-06-05 07:54:40Z

1

You can call unique on column 'A' and then join with | to create a pattern for matching using contains:

In [15]:
df[df['B'].str.contains('|'.join(df['A'].unique()))]

Out[15]:
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
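
Note that the joined pattern tests whether B contains any value found in column A, not specifically the A from the same row. The two checks agree on the question's data, but they can differ. The sketch below uses a small made-up frame to show the difference, and escapes each value with re.escape on the assumption that the A values are meant as literal text rather than regular expressions.

```python
import re
import pandas as pd

# Illustrative data: row 0's B contains 'xyz' but not its own A ('abc').
df = pd.DataFrame({'A': ['abc', 'xyz'],
                   'B': ['has xyz only', 'uvwxyz']})

# Joined pattern: matches if B contains ANY unique A value.
pattern = '|'.join(re.escape(s) for s in df['A'].unique())
any_match = df['B'].str.contains(pattern)

# Row-wise check: matches only if B contains the A from the SAME row.
row_match = pd.Series([a in b for a, b in zip(df['A'], df['B'])], index=df.index)

print(any_match.tolist())
print(row_match.tolist())
```

If a per-row match is what is needed, the row-wise form is the safer choice; the joined pattern suits the case where any of the A values counts as a hit.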


fixxxer · Accepted Answer · 2015-06-05 09:46:20Z

1

How about this?

In [8]: df[df.apply(lambda v: v['A'] in v['B'], axis=1)]
Out[8]: 
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
