I tried to do this with pandas.Series.apply, but it is slow on large amounts of data. Is there a quicker way to replace values?

Here is what I've tried, but it's slow on a big Series (with a million items, for example):

import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 'str1', 'str2', 3])
s.apply(lambda x: x if type(x) == str else np.nan)

Can you post your solution? Commented Jan 28, 2021 at 10:02

1 Answer

Use pd.to_numeric with errors='coerce', which turns every value that cannot be parsed as a number into NaN:

pd.to_numeric(s, errors='coerce')

If you also need integers, cast the result to the nullable Int64 dtype:

pd.to_numeric(s, errors='coerce').astype('Int64')
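
As a minimal sketch of what the two calls above produce on the sample Series from the question (the variable names as_float and as_int are only illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 'str1', 'str2', 3])

# Values that cannot be parsed as numbers become NaN; the result is a float Series.
as_float = pd.to_numeric(s, errors='coerce')

# The nullable Int64 dtype keeps integers and stores missing values as <NA>.
as_int = as_float.astype('Int64')

print(as_float)
print(as_int)
```

Note that this keeps the numbers and drops the strings, which is the opposite of what the lambda in the question does.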

EDIT: To keep the strings and replace everything else with NaN, you can use isinstance with map, and also Series.where:

# test with 600k values
N = 100000
s = pd.Series([1, 2, 3, 'str1', 'str2', 3] * N)


In [152]: %timeit s.apply(lambda x: x if type(x) == str else np.nan)
196 ms ± 2.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [153]: %timeit s.map(lambda x: x if isinstance(x, str) else np.nan)
174 ms ± 3.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [154]: %timeit s.where(s.map(lambda x: isinstance(x, str)))
168 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [155]: %timeit s.where(pd.to_numeric(s, errors='coerce').isna())
366 ms ± 3.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
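
A small sketch of the mask-based variant on the sample data, checked against the original apply call. It also illustrates one behavioral difference that seems worth knowing: the to_numeric-based mask treats a numeric-looking string such as '5' as a number, while isinstance keeps it. The Series s2 and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 'str1', 'str2', 3])

# Boolean mask: True where the value is a str.
is_str = s.map(lambda x: isinstance(x, str))

# Series.where keeps values where the mask is True and puts NaN elsewhere.
kept = s.where(is_str)

# Baseline from the question, for comparison.
baseline = s.apply(lambda x: x if type(x) == str else np.nan)

# Hypothetical data containing a numeric-looking string.
s2 = pd.Series([1, '5', 'abc'])
by_type = s2.where(s2.map(lambda x: isinstance(x, str)))
by_parse = s2.where(pd.to_numeric(s2, errors='coerce').isna())

print(kept)
print(by_type)
print(by_parse)
```

If the data can contain strings like '5' that should be preserved, the isinstance mask is probably the safer choice of the two.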

1 Comment

Sorry, my bad: if the value is a string, then don't change it :)
