Replace string with substring in DataFrame Column

Question

I'm trying to match each value in a DataFrame column to one of a list of substrings.

For example, take the column strings with the following values:

text1C1
text2A
text2
text4
text4B
text4A3

I want to create a new column that maps each value to one of the following substrings:

vals = ['text1', 'text2', 'text3', 'text4', 'text4B']

The code I have at the moment works, but it seems like a really inefficient way of solving the problem.

import pandas as pd

df = pd.DataFrame({'strings': ['text1C1', 'text2A', 'text2', 'text4', 'text4B', 'text4A3']})

for v in vals:
    df.loc[df[df['strings'].str.contains(v)].index, 'matched strings'] = v

This returns the following DataFrame, which is what I need.

   strings    matched strings
0  text1C1              text1
1   text2A              text2
2    text2              text2
3    text4              text4
4   text4B             text4B
5  text4A3              text4

Is there a more efficient way of doing this, especially for larger DataFrames (10k+ rows)?

I can't think of how to deal with one item of vals also being a substring of another (text4 is a substring of text4B).

jezrael · Accepted Answer · 2019-05-10 10:50:52Z


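Use a generator expression with next to return the first matching value. The list is reversed so that later entries in vals win, as they do in your loop:

s = vals[::-1]
df['matched strings1'] = df['strings'].apply(lambda x: next(y for y in s if y in x))
print(df)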
   strings matched strings matched strings1
0  text1C1           text1            text1
1   text2A           text2            text2
2    text2           text2            text2
3    text4           text4            text4
4   text4B          text4B           text4B
5  text4A3           text4            text4
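
Reversing vals gives the right result here only because text4 happens to be listed before text4B. If the order of vals cannot be relied on, one possible variation (a sketch, not part of the original answer) is to sort the candidates by length so that longer values are always tried before their prefixes. The name by_length and the shuffled vals below are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({'strings': ['text1C1', 'text2A', 'text2', 'text4', 'text4B', 'text4A3']})

# Assumption for illustration: vals in an arbitrary order, with 'text4B' before 'text4'.
vals = ['text4B', 'text1', 'text4', 'text3', 'text2']

# Longest candidates first, so 'text4B' is tested before its prefix 'text4'.
by_length = sorted(vals, key=len, reverse=True)

# Every row in this sample contains at least one candidate,
# so next() always finds a value here.
df['matched strings'] = df['strings'].apply(
    lambda x: next(y for y in by_length if y in x)
)
print(df)
```

Note that this changes the rule from "last entry in vals wins" to "longest entry wins". The two agree on the sample data but can differ on other inputs.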

A more general solution, for when some rows may match none of the values, uses iter and the default parameter of next:

f = lambda x: next(iter(y for y in s if y in x), 'no match')
df['matched strings1'] = df['strings'].apply(f)
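
To see the fallback in action, here is a small self-contained sketch. The row 'other9' is a made-up value that contains none of the candidates. The generator is passed to next directly, since a generator expression is already an iterator:

```python
import pandas as pd

vals = ['text1', 'text2', 'text3', 'text4', 'text4B']
s = vals[::-1]

# 'other9' is a hypothetical value that matches nothing in vals.
df = pd.DataFrame({'strings': ['text4B', 'other9', 'text2A']})

# next() returns its second argument when the generator yields nothing.
f = lambda x: next((y for y in s if y in x), 'no match')
df['matched strings1'] = df['strings'].apply(f)
print(df)
```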

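Your own solution can also be improved by indexing with the boolean mask directly and passing regex=False, so each value is matched as a literal substring: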

for v in vals:
    df.loc[df['strings'].str.contains(v, regex=False), 'matched strings'] = v
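
If you would rather avoid the Python-level loop entirely, another option (a different technique from the ones above, sketched here under assumptions) is a single regex alternation with Series.str.extract. The names ordered and pattern are illustrative. The candidates are escaped with re.escape and sorted longest first, because regex alternation takes the first alternative that matches at a given position:

```python
import re
import pandas as pd

df = pd.DataFrame({'strings': ['text1C1', 'text2A', 'text2', 'text4', 'text4B', 'text4A3']})
vals = ['text1', 'text2', 'text3', 'text4', 'text4B']

# Longest first, so 'text4B' is tried before 'text4' at the same position.
ordered = sorted(vals, key=len, reverse=True)

# Escape each candidate so it is treated as literal text, then build one capture group.
pattern = '(' + '|'.join(re.escape(v) for v in ordered) + ')'

# expand=False returns a Series; rows without any match become NaN.
df['matched strings'] = df['strings'].str.extract(pattern, expand=False)
print(df)
```

This returns the leftmost match in each string, so it can disagree with the loop when a string contains several candidates at different positions. Whether it is actually faster depends on the data, so it is worth timing on a realistic sample.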

edited May 10, 2019 at 10:50

answered May 10, 2019 at 10:38

