
I have a column in a pandas DataFrame like the one below. The column name is ABC.

ABC
Fuel
FUEL
Fuel_12_ab
Fuel_1
Lube
Lube_1
Lube_12_a
cat_Lube

Now I want to replace the values in this column using a regex, so that the result looks like this:

ABC
Fuel
FUEL
Fuel
Fuel
Lube
Lube
Lube
cat_Lube

How can I do this type of string matching in a pandas DataFrame?

3 Answers

In [63]: df.ABC.str.replace(r'_\d+.*', '', regex=True)
Out[63]:
0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object
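
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the column from the question and writes the result back into it. The frame construction is my own reconstruction of the sample data. regex=True is passed explicitly because pandas 2.0 changed the default of str.replace to literal matching.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed to be plain strings).
df = pd.DataFrame({"ABC": ["Fuel", "FUEL", "Fuel_12_ab", "Fuel_1",
                           "Lube", "Lube_1", "Lube_12_a", "cat_Lube"]})

# Drop everything from the first "_<digits>" onward and store it back in the column.
df["ABC"] = df["ABC"].str.replace(r"_\d+.*", "", regex=True)

print(df["ABC"].tolist())
```

Note that "cat_Lube" is left alone only because its underscore is not followed by a digit.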

9 Comments

df.ABC.str.split(r'_\d', n=1).str[0]
@MaxU Good trick. Just a small doubt: if my column has Fuel_aa_12, will this work?
@piRSquared, please add it as an answer! :)
I added a different one.
@New_learner you should add more possible inputs to your question if that's the case. 2/3 answers here break due to the new input you've just described.
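
To make the concern in these comments concrete, here is a small sketch of what the digit-based patterns do with the extra input Fuel_aa_12. The three-row Series is just an illustrative subset, not the full column from the question.

```python
import pandas as pd

# Illustrative subset: one original value, the new input from the comments, one untouched value.
s = pd.Series(["Fuel_12_ab", "Fuel_aa_12", "cat_Lube"])

# Both approaches key on "_<digit>", so they only cut at the first underscore followed by a digit.
by_replace = s.str.replace(r"_\d+.*", "", regex=True)
by_split = s.str.split(r"_\d", n=1).str[0]

print(by_replace.tolist())  # ['Fuel', 'Fuel_aa', 'cat_Lube']
print(by_split.tolist())    # ['Fuel', 'Fuel_aa', 'cat_Lube']
```

In both cases Fuel_aa_12 comes back as Fuel_aa rather than Fuel, which is why a pattern anchored on the keyword itself is the safer choice for that input.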

Use a positive lookbehind for lube or fuel while ignoring case.

import re
import pandas as pd

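# Match an underscore only when it directly follows "lube" or "fuel" (any case)
pat = re.compile(r'(?<=lube|fuel)_', re.IGNORECASE)

# Split once on that underscore and keep the part before it
df.assign(ABC=[pat.split(x, maxsplit=1)[0] for x in df.ABC])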

        ABC
0      Fuel
1      FUEL
2      Fuel
3      Fuel
4      Lube
5      Lube
6      Lube
7  cat_Lube

3 Comments

_\d uses 19 steps instead of 51 steps
@ctwheels so is this even better?: re.compile('(?<=lube|fuel)_.*', re.IGNORECASE)
Yes, stick with the one you've just commented. The OP added new input in a comment below MaxU's answer, which causes the other answers to break because they rely on \d.
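
For reference, a vectorized sketch of the pattern from the comment above, using str.replace instead of a list comprehension. The frame is rebuilt from the question's data with Fuel_aa_12 appended as the extra input. Using str.replace here is my own substitution, not what the answer itself uses.

```python
import re
import pandas as pd

# Question data plus the extra input mentioned in the comments (Fuel_aa_12).
df = pd.DataFrame({"ABC": ["Fuel", "FUEL", "Fuel_12_ab", "Fuel_1", "Lube",
                           "Lube_1", "Lube_12_a", "cat_Lube", "Fuel_aa_12"]})

# Remove an underscore and everything after it, but only when the underscore
# directly follows "lube" or "fuel" (case-insensitive).
result = df["ABC"].str.replace(r"(?<=lube|fuel)_.*", "",
                               flags=re.IGNORECASE, regex=True)

print(result.tolist())
```

The lookbehind works in Python's re module because both alternatives have the same length (four characters). Alternatives of different lengths would raise an error.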

Alternative with str.extract:

df.ABC.str.extract(r'^(.*?)(?=_\d|$)', expand=False)

0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object

Extension, courtesy of piRSquared:

import re

df.ABC.str.extract(r'(.*(?<=lube|fuel)).*', flags=re.IGNORECASE, expand=False)

0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object
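
One caveat with the extract approach is that rows containing neither keyword do not match at all, so str.extract returns NaN for them. Here is a small sketch of that behaviour with one way to fall back to the original value. "Widget_3" is a hypothetical value I made up for illustration, and the leading ^ is the anchored variant discussed in the comments.

```python
import re
import pandas as pd

# "Widget_3" is a made-up value that contains neither "fuel" nor "lube".
s = pd.Series(["Fuel_12_ab", "cat_Lube", "Fuel_aa_12", "Widget_3"])

# Capture everything up to and including the last "lube"/"fuel" in each string.
extracted = s.str.extract(r"^(.*(?<=lube|fuel)).*",
                          flags=re.IGNORECASE, expand=False)

# Non-matching rows come back as NaN; fall back to the original value for those.
kept = extracted.fillna(s)

print(kept.tolist())  # ['Fuel', 'cat_Lube', 'Fuel', 'Widget_3']
```

Whether falling back to the original value is the right behaviour depends on the data, so treat the fillna step as one possible choice.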

3 Comments

pd.Series.str.extract version df.ABC.str.extract('(.*(?<=lube|fuel)).*', re.IGNORECASE, expand=False)
Adding ^ at the beginning of the query halves the number of steps (180 instead of 363)
@ctwheels Thanks, that's interesting to know (added).
