How to replace variable substrings across a pandas dataframe column that are defined by the column after?

Question

I have a large pandas dataframe df1 that contains whole user agents in col1 and the contained Chrome version in col2 (col2 is generated based on regex patterns applied to col1).

col1, col2
Mozilla/5.0 (X11; Linux x86_64) Chrome/14.0.2785.89 Safari/537.36, Chrome/14
Mozilla/5.0 (X11; Linux x86_64) Chrome/15.0.2743.98 Safari/537.36, Chrome/15
Mozilla/5.0 (X11; Linux x86_64) Chrome/22 Safari/537.36, None

I want to replace the Chrome version number in col1 with a random integer above a threshold if the same number in col2 is below this threshold. Note that col2 is None if the threshold is met.

I know that in this context I need to use df.apply with axis=1 in order to access both column values at the same time.

However, when I do:

df1.loc[(df1.col2 is not None), 'col1'] = 
         df1.apply(lambda x: x['col1'].replace(x['col2'], randint(20, 60)), axis=1)

I get:

TypeError: ('expected a string or other character buffer object', u'occurred at index 0')

How to replace variable substrings across a pandas dataframe column that are defined by the column after?

Solutions that did not work for me (reason):
Python Pandas removing substring using another column (too slow)
replace substring in pandas data frame column (not applicable to variable substrings)

cs95 · Accepted Answer · 2018-03-13 05:19:44Z


There's absolutely no need for apply. Use str.replace with a callback:

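from random import randint

# Only rows where col2 is set (version below the threshold) are rewritten.
m = df.col2.notna()
# The callback receives a regex match object and must return a string.
# regex=True is required on pandas >= 2.0, where str.replace defaults to literal matching.
df.loc[m, 'col1'] = df.loc[m, 'col1'].str.replace(
     r'(?<=Chrome/).*?(?=\s)', lambda x: str(randint(20, 60)), regex=True
)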

df
                                                col1       col2
0  Mozilla/5.0 (X11; Linux x86_64) Chrome/51 Safa...  Chrome/14
1  Mozilla/5.0 (X11; Linux x86_64) Chrome/26 Safa...  Chrome/15
2  Mozilla/5.0 (X11; Linux x86_64) Chrome/22 Safa...       None
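For anyone who wants to try this end to end, here is a minimal self-contained sketch built on the three sample rows from the question. The frame construction and the fixed seed are assumptions added for illustration. The mask and the pattern are the ones used above.

```python
import random

import pandas as pd

# Sample rows from the question; col2 is None where nothing should change.
df = pd.DataFrame({
    "col1": [
        "Mozilla/5.0 (X11; Linux x86_64) Chrome/14.0.2785.89 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) Chrome/15.0.2743.98 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) Chrome/22 Safari/537.36",
    ],
    "col2": ["Chrome/14", "Chrome/15", None],
})
untouched = df.loc[2, "col1"]

random.seed(0)  # assumption: seeded only so repeated runs give the same values

# Boolean mask: True for rows whose version is below the threshold.
m = df["col2"].notna()

# Replace everything between "Chrome/" and the next whitespace.
# The callback is called once per match and must return a str.
df.loc[m, "col1"] = df.loc[m, "col1"].str.replace(
    r"(?<=Chrome/).*?(?=\s)",
    lambda match: str(random.randint(20, 60)),
    regex=True,
)

print(df["col1"].tolist())
```

As far as I can tell, the attempt in the question fails for two reasons. Python's str.replace only accepts strings, so the integer returned by randint has to be wrapped in str(). And `df1.col2 is not None` is an identity test on the Series object itself, so it evaluates to a single True rather than a row-wise mask, which is why notna() is used here instead.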

edited Mar 13, 2018 at 5:19

answered Mar 13, 2018 at 5:11
