In a dataframe

df = pd.DataFrame({'c1': ['c10:b', 'c11', 'c12:k'], 'c2': ['c20', 'c21', 'c22']})

     c1    c2
0   c10:b  c20
1   c11    c21
2   c12:k  c22

I'd like to modify the string values in column c1 so that everything from the colon onward (including the colon itself) is removed, ending up like this:

     c1    c2
0   c10    c20
1   c11    c21
2   c12    c22

I've tried slicing

df['c1'].str[:df['c1'].str.find(':')]

but it doesn't work. How do I accomplish this?

1 Answer


Using replace with regex=True:

df.replace(r'\:.*', '', regex=True)

    c1   c2
0  c10  c20
1  c11  c21
2  c12  c22

To only replace this pattern in a single column, use the str accessor:

df.c1.str.replace(r'\:.*', '', regex=True)
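
Both calls above return a new object rather than modifying df in place. A minimal sketch of writing the result back to the column, assuming a recent pandas version in which Series.str.replace treats the pattern as a literal string unless regex=True is passed:

```python
import pandas as pd

df = pd.DataFrame({'c1': ['c10:b', 'c11', 'c12:k'], 'c2': ['c20', 'c21', 'c22']})

# Strip the colon and everything after it, then assign the result back.
# regex=True is passed explicitly so the pattern is not matched literally.
df['c1'] = df['c1'].str.replace(r':.*', '', regex=True)

print(df['c1'].tolist())
```

The colon has no special meaning in a regular expression, so the backslash in the original pattern is optional.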

If performance is a concern, use a list comprehension and partition instead of pandas string methods:

[i.partition(':')[0] for i in df.c1]
# ['c10', 'c11', 'c12']
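
The list comprehension assumes every value in c1 is a string; a missing value would raise an AttributeError because it has no partition method. A hedged sketch of assigning the result back while skipping non-string values (strip_after_colon is a hypothetical helper name, not part of pandas, and the None entry is made up for illustration):

```python
import pandas as pd

# The None entry stands in for a missing value.
df = pd.DataFrame({'c1': ['c10:b', 'c11', None], 'c2': ['c20', 'c21', 'c22']})

def strip_after_colon(value):
    # Hypothetical helper: keep the text before the first colon,
    # and pass non-string values (None, NaN) through unchanged.
    return value.partition(':')[0] if isinstance(value, str) else value

# Build the cleaned values in plain Python, then assign back to the column.
df['c1'] = [strip_after_colon(v) for v in df['c1']]
```

If the column is known to contain only strings, the plain comprehension above is enough and avoids the extra isinstance check.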

Timings

df = pd.concat([df]*10000)

%timeit df.replace(r'\:.*', '', regex=True)
30.8 ms ± 340 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df.c1.str.replace(r'\:.*', '')
31.2 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df['c1'].str.partition(':')[0]
56.7 ms ± 269 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [i.partition(':')[0] for i in df.c1]
4.2 ms ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)