Modify string values of a pandas dataframe column

Question

In a dataframe

df = pd.DataFrame({'c1': ['c10:b', 'c11', 'c12:k'], 'c2': ['c20', 'c21', 'c22']})

      c1   c2
0  c10:b  c20
1    c11  c21
2  c12:k  c22

I'd like to modify the string values of column c1 so that everything after (and including) the colon is removed, so it ends up like this:

    c1   c2
0  c10  c20
1  c11  c21
2  c12  c22

I've tried slicing

df['c1'].str[:df['c1'].str.find(':')]

but it doesn't work. How do I accomplish this?

user3483203 · Accepted Answer · 2018-08-21 21:19:51Z

5

Using replace with regex=True:

df.replace(r'\:.*', '', regex=True)

    c1   c2
0  c10  c20
1  c11  c21
2  c12  c22
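Note that a frame-wide replace applies the pattern to every column and returns a new DataFrame rather than changing df in place. A minimal sketch of the difference, assuming a made-up value 'c20:x' in c2 (not part of the question's data) to make the effect visible; the dict form limits the pattern to one column:

```python
import pandas as pd

# 'c20:x' is a hypothetical value added only to show the frame-wide effect.
df = pd.DataFrame({'c1': ['c10:b', 'c11', 'c12:k'],
                   'c2': ['c20:x', 'c21', 'c22']})

# Frame-wide: the pattern is applied to every column; df itself is unchanged.
all_cols = df.replace(r':.*', '', regex=True)

# Dict form: the pattern is applied to column c1 only.
only_c1 = df.replace({'c1': r':.*'}, {'c1': ''}, regex=True)

print(all_cols)
print(only_c1)
```

If c2 could legitimately contain colons, the column-restricted form is probably the safer choice.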

To only replace this pattern in a single column, use the str accessor:

df.c1.str.replace(r'\:.*', '', regex=True)
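The str accessor also returns a new Series, so the result has to be assigned back to the column. A minimal sketch on the question's data, passing regex=True explicitly because newer pandas versions treat the pattern as a literal string by default:

```python
import pandas as pd

df = pd.DataFrame({'c1': ['c10:b', 'c11', 'c12:k'],
                   'c2': ['c20', 'c21', 'c22']})

# Strip the first colon and everything after it, then write the result back.
df['c1'] = df['c1'].str.replace(r':.*', '', regex=True)

print(df)
```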

If performance is a concern, use a list comprehension and partition instead of pandas string methods:

[i.partition(':')[0] for i in df.c1]
# ['c10', 'c11', 'c12']
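The list comprehension produces a plain list and assumes every value is a string; a missing value would raise an AttributeError. A rough sketch of writing the result back, assuming the column might contain None (which the question's data does not), with a vectorized alternative that passes missing values through:

```python
import pandas as pd

# The None entry is a hypothetical addition to illustrate missing values.
df = pd.DataFrame({'c1': ['c10:b', 'c11', None],
                   'c2': ['c20', 'c21', 'c22']})

# Guarded list comprehension: non-string values are passed through untouched.
fast = [s.partition(':')[0] if isinstance(s, str) else s for s in df['c1']]

# Vectorized alternative: split once on the first colon and keep the left part.
vectorized = df['c1'].str.split(':', n=1).str[0]

# Assign back to the column (the list has one entry per row, in order).
df['c1'] = fast
print(df)
```

On data without missing values the unguarded comprehension from the answer is enough.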

Timings

df = pd.concat([df]*10000)

%timeit df.replace(r'\:.*', '', regex=True)
30.8 ms ± 340 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df.c1.str.replace(r'\:.*', '', regex=True)
31.2 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df['c1'].str.partition(':')[0]
56.7 ms ± 269 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [i.partition(':')[0] for i in df.c1]
4.2 ms ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

edited Aug 21, 2018 at 21:19

answered Aug 21, 2018 at 20:57
