Replace column values using regex in pandas data frame

Question

I have a column in a pandas DataFrame like the one below. The column name is ABC.

ABC
Fuel
FUEL
Fuel_12_ab
Fuel_1
Lube
Lube_1
Lube_12_a
cat_Lube

Now I want to replace the values in this column using a regex, so that the result looks like this:

ABC
Fuel
FUEL
Fuel
Fuel
Lube
Lube
Lube
cat_Lube

How can we do this type of string matching in a pandas DataFrame?

MaxU - stand with Ukraine · Accepted Answer · 2017-10-30 22:06:10Z

8

In [63]: df.ABC.str.replace(r'_\d+.*', r'', regex=True)
Out[63]:
0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object
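
The session above assumes `df` already exists. A minimal self-contained sketch, with the sample values copied from the question and `regex=True` passed explicitly (newer pandas releases treat the pattern as a literal string unless told otherwise), might look like this:

```python
import pandas as pd

# Sample column copied from the question.
df = pd.DataFrame({"ABC": ["Fuel", "FUEL", "Fuel_12_ab", "Fuel_1",
                           "Lube", "Lube_1", "Lube_12_a", "cat_Lube"]})

# Drop everything from the first "_<digits>" onward.
# regex=True makes the regex interpretation explicit.
cleaned = df["ABC"].str.replace(r"_\d+.*", "", regex=True)
print(cleaned.tolist())
```

To write the result back, assign it to the column, for example `df["ABC"] = cleaned`.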

edited Oct 30, 2017 at 22:06

answered Oct 30, 2017 at 21:29

9 Comments

piRSquared Over a year ago

df.ABC.str.split(r'_\d', n=1).str[0]

User12345 Over a year ago

@MaxU Good trick. Just one small doubt: if my column has Fuel_aa_12, will this work?

MaxU - stand with Ukraine Over a year ago

@piRSquared, please add it as an answer! :)

piRSquared Over a year ago

I added a different one.

ctwheels Over a year ago

@New_learner you should add more possible inputs to your question if that's the case. 2/3 answers here break due to the new input you've just described.
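
To make the breakage concrete, here is a small sketch using only the standard `re` module. It compares the digit-based pattern from this answer with the keyword lookbehind pattern that piRSquared suggests in the comments on the next answer. The variable names are illustrative, not from the thread.

```python
import re

# "Fuel_aa_12" is the extra input mentioned in the comment above.
values = ["Fuel_12_ab", "Fuel_aa_12", "cat_Lube"]

digit_pat = re.compile(r"_\d+.*")                           # cut at the first "_<digits>"
word_pat = re.compile(r"(?<=lube|fuel)_.*", re.IGNORECASE)  # cut at "_" right after lube/fuel

by_digit = [digit_pat.sub("", v) for v in values]
by_word = [word_pat.sub("", v) for v in values]

print(by_digit)  # ['Fuel', 'Fuel_aa', 'cat_Lube']
print(by_word)   # ['Fuel', 'Fuel', 'cat_Lube']
```

The digit-based pattern only cuts where an underscore is followed by digits, so `Fuel_aa_12` keeps its `_aa` part, while the keyword-based pattern cuts right after `Fuel`.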

piRSquared · Accepted Answer · 2017-10-30 21:41:59Z

5

Use a positive lookbehind for lube or fuel while ignoring case, and split on the underscore that follows.

import re
import pandas as pd

pat = re.compile('(?<=lube|fuel)_', re.IGNORECASE)

df.assign(ABC=[pat.split(x, maxsplit=1)[0] for x in df.ABC])

        ABC
0      Fuel
1      FUEL
2      Fuel
3      Fuel
4      Lube
5      Lube
6      Lube
7  cat_Lube

answered Oct 30, 2017 at 21:41

3 Comments

ctwheels Over a year ago

_\d uses 19 steps instead of 51 steps

piRSquared Over a year ago

@ctwheels so is this even better?: re.compile('(?<=lube|fuel)_.*', re.IGNORECASE)

ctwheels Over a year ago

Ya stick with the one you've just commented, OP added new input in a comment below MaxU's answer which causes the other answers to break on \d use.
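
The pattern from the comment above can also be used without a Python-level loop. The following is a sketch only, assuming the pattern is passed to `str.replace` as a plain string so that `flags` can be supplied alongside it. The sample frame adds the `Fuel_aa_12` case from the comments.

```python
import re
import pandas as pd

# Sample values from the question plus the "Fuel_aa_12" case raised in the comments.
df = pd.DataFrame({"ABC": ["Fuel", "FUEL", "Fuel_12_ab", "Fuel_aa_12",
                           "Lube_1", "cat_Lube"]})

# Remove everything from the underscore that directly follows "lube" or "fuel"
# (case-insensitive). "cat_Lube" is untouched because its underscore follows "cat".
result = df["ABC"].str.replace(r"(?<=lube|fuel)_.*", "",
                               flags=re.IGNORECASE, regex=True)
print(result.tolist())
```

Both alternatives in the lookbehind are four characters long, which matters because Python's `re` module requires fixed-width lookbehinds.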

cs95 · Accepted Answer · 2017-10-31 03:30:07Z

5

Alternative with str.extract:

df.ABC.str.extract(r'^(.*?)(?=_\d|$)', expand=False)

0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object

Extension courtesy piRSquared:

df.ABC.str.extract('(.*(?<=lube|fuel)).*', re.IGNORECASE, expand=False)

0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object
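
One thing to keep in mind with the keyword-based `str.extract` variant: rows that contain neither keyword do not match at all and come back as NaN. A hedged sketch of one way to handle that, falling back to the original value, is below. The value `Diesel_3` is made up for illustration and is not part of the question's data.

```python
import re
import pandas as pd

# "Diesel_3" is a hypothetical value that contains neither "lube" nor "fuel".
s = pd.Series(["Fuel_12_ab", "cat_Lube", "Diesel_3"], name="ABC")

# Capture everything up to and including the last "lube"/"fuel" (case-insensitive).
extracted = s.str.extract(r"^(.*(?<=lube|fuel)).*", flags=re.IGNORECASE, expand=False)

# Non-matching rows are NaN after extract; keep their original text instead.
result = extracted.fillna(s)
print(result.tolist())
```

Whether to keep the original text, leave NaN, or substitute something else depends on what the unmatched rows should mean in your data.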

edited Oct 31, 2017 at 3:30

answered Oct 30, 2017 at 21:32

3 Comments

piRSquared Over a year ago

pd.Series.str.extract version df.ABC.str.extract('(.*(?<=lube|fuel)).*', re.IGNORECASE, expand=False)

ctwheels Over a year ago

Adding ^ at the beginning of the query halves the number of steps (180 instead of 363)

cs95 Over a year ago

@ctwheels Thanks, that's interesting to know (added).
