I have a large pandas DataFrame df1 that contains full user-agent strings in col1 and the Chrome version they contain in col2 (col2 is generated by applying regex patterns to col1).
col1, col2
Mozilla/5.0 (X11; Linux x86_64) Chrome/14.0.2785.89 Safari/537.36, Chrome/14
Mozilla/5.0 (X11; Linux x86_64) Chrome/15.0.2743.98 Safari/537.36, Chrome/15
Mozilla/5.0 (X11; Linux x86_64) Chrome/22 Safari/537.36, None
I want to replace the Chrome version number in col1 with a random integer above a threshold if the same number in col2 is below this threshold. Note that col2 is None when the version in col1 already meets the threshold.
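To make this reproducible, here is a minimal sketch that builds the sample frame shown above. The threshold of 20 and the regex `Chrome/(\d+)` are assumptions for illustration only; my real patterns are more involved.

```python
import pandas as pd

THRESHOLD = 20  # assumed value; any integer threshold works the same way

df1 = pd.DataFrame({
    "col1": [
        "Mozilla/5.0 (X11; Linux x86_64) Chrome/14.0.2785.89 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) Chrome/15.0.2743.98 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) Chrome/22 Safari/537.36",
    ]
})

# Hypothetical pattern: capture the major version that follows "Chrome/".
major = pd.to_numeric(df1["col1"].str.extract(r"Chrome/(\d+)", expand=False))

# col2 holds "Chrome/<major>" only for versions below the threshold,
# and None otherwise.
df1["col2"] = [
    "Chrome/%d" % m if m < THRESHOLD else None
    for m in major
]

print(df1)
```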
I know that in this context I need to use df.apply with axis=1 in order to access both column values at the same time.
However, when I do:
df1.loc[(df1.col2 is not None), 'col1'] = \
    df1.apply(lambda x: x['col1'].replace(x['col2'], randint(20, 60)), axis=1)
I get:
TypeError: ('expected a string or other character buffer object', u'occurred at index 0')
How can I replace variable substrings in a pandas DataFrame column, where the substring to replace in each row is given by the adjacent column?
Solutions that did not work for me (reason):
Python Pandas removing substring using another column (too slow)
replace substring in pandas data frame column (not applicable to variable substrings)
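For what it's worth, here is a minimal sketch of one direction that seems to avoid the TypeError. Two things in the attempt above look suspicious: randint returns an int, while str.replace expects a string as the replacement, and `df1.col2 is not None` tests the Series object itself rather than each row, so it is always True. The helper name `bump_chrome`, the range 20 to 60, and the choice to keep the "Chrome/" prefix in the replacement are assumptions. The sketch uses a plain loop over zipped columns instead of df.apply, which may be faster, but I have not measured it on the real data.

```python
import random

import pandas as pd


def bump_chrome(ua, old, low=20, high=60):
    """Hypothetical helper: swap `old` (e.g. "Chrome/14") for a random version."""
    if pd.isna(old):
        # col2 is missing: the version already meets the threshold.
        return ua
    # Build a *string* replacement; passing a bare int raises the TypeError.
    new = "Chrome/%d" % random.randint(low, high)
    # Replace only the first occurrence of the old version token.
    return ua.replace(old, new, 1)


df1 = pd.DataFrame({
    "col1": [
        "Mozilla/5.0 (X11; Linux x86_64) Chrome/14.0.2785.89 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) Chrome/15.0.2743.98 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) Chrome/22 Safari/537.36",
    ],
    "col2": ["Chrome/14", "Chrome/15", None],
})

# Iterate over both columns in lockstep and rebuild col1 in one assignment.
df1["col1"] = [
    bump_chrome(ua, old) for ua, old in zip(df1["col1"], df1["col2"])
]

print(df1["col1"].tolist())
```

Rows where col2 is None are returned unchanged, so no boolean mask is needed on the left-hand side of the assignment.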