Regex to split a string in Python 3.7

Question

I'm trying to split a series of irregularly formatted strings that I read from Excel files in a Python program. I am testing on regex101.com and have partially succeeded.
My sample:

Barber #1-1 Daily Prod. - Pumping unit  
Barbee #1-3 Daily Prod. - Plunger Lift  
Barbee #1-5 Daily Prod. = Coil Tubing  
Barbee #1-3 Daily Prod. - Plunger  
Barbee #1-5 Daily Prod.w/ coil tubing  
Porter GU #1 Well #2 Daily Prod.  
Barber GU #1 Well #1 Daily Prod.  
Bogel #1-2 Daily Prod. w/ plunger

My regex:
(.*)\sDaily Prod\.(.*$)

Selecting group 1 and group 2, I get this result:

Barber #1-1 - Pumping unit  
Barbee #1-3 - Plunger Lift  
Barbee #1-5 = Coil Tubing  
Barbee #1-3 - Plunger  
Barbee #1-5w/ coil tubing  
Porter GU #1 Well #2  
Barber GU #1 Well #1  
Bogel #1-2 w/ plunger

and I would like to have:

Barber #1-1 Pumping unit  
Barbee #1-3 Plunger Lift  
Barbee #1-5 Coil Tubing  
Barbee #1-3 Plunger  
Barbee #1-5 coil tubing  
Porter GU #1 Well #2  
Barber GU #1 Well #1  
Bogel #1-2 plunger

Thanks.

Community · Accepted Answer · 2020-06-20 09:12:55Z

1

My guess is that this expression would work:

(.*)\sDaily Prod\.(\s*[-=w\/]+\s*)?(.*)

Here, we have an optional group:

(\s*[-=w\/]+\s*)?

which collects the undesired characters and the spaces around them. We then replace each match with groups 1 and 3 ($1 and $3, written \1 and \3 in Python's re.sub).


Test

import re

# Group 1: text before " Daily Prod."
# Group 2: optional separator ("-", "=", "w/") with surrounding spaces
# Group 3: the rest of the line
regex = r"(.*)\sDaily Prod\.(\s*[-=w\/]+\s*)?(.*)"

test_str = ("Barber #1-1 Daily Prod. - Pumping unit\n"
    "Barbee #1-3 Daily Prod. - Plunger Lift\n"
    "Barbee #1-5 Daily Prod. = Coil Tubing\n"
    "Barbee #1-3 Daily Prod. - Plunger\n"
    "Barbee #1-5 Daily Prod.w/ coil tubing\n"
    "Porter GU #1 Well #2 Daily Prod.\n"
    "Barber GU #1 Well #1 Daily Prod.\n"
    "Bogel #1-2 Daily Prod. w/ plunger")

# Keep groups 1 and 3, joined by a single space
subst = r"\1 \3"

# count=0 replaces every match; count and flags are passed by keyword
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)
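One detail worth noting about the test above: when nothing follows "Daily Prod.", group 3 is empty, so the \1 \3 replacement leaves a trailing space on that line. Below is a minimal sketch that applies the same pattern one string at a time and strips the leftover space. The helper name clean_label and the samples list are hypothetical, chosen only for illustration.

```python
import re

# Same pattern as in the answer above.
PATTERN = re.compile(r"(.*)\sDaily Prod\.(\s*[-=w\/]+\s*)?(.*)")


def clean_label(line):
    """Hypothetical helper: drop 'Daily Prod.' and its separator, then trim."""
    # strip() removes the trailing space left when group 3 is empty
    return PATTERN.sub(r"\1 \3", line).strip()


samples = [
    "Barber #1-1 Daily Prod. - Pumping unit",
    "Barbee #1-5 Daily Prod.w/ coil tubing",
    "Porter GU #1 Well #2 Daily Prod.",
    "Bogel #1-2 Daily Prod. w/ plunger",
]

cleaned = [clean_label(s) for s in samples]
for label in cleaned:
    print(label)
```

Processing one string per call may also fit better when the values come from separate Excel cells rather than one multi-line string.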


edited Jun 20, 2020 at 9:12


answered Jun 12, 2019 at 23:32

Emma Marcier


1 Comment

user_dhrn Over a year ago

Hi @Emma. Yes. That works perfectly. This regex stuff makes me blind sometimes. Thanks. And thanks to GSD for editing the question.

The fourth bird · Accepted Answer · 2019-06-13 09:08:31Z

0

You could also match what you want to remove and replace with an empty string:

\sDaily Prod\.(?:\s*(?:[-=]|w/))?

Explanation

\sDaily Prod\. Match whitespace char, Daily Prod and dot
(?: Non capturing group
- \s* Match 0+ whitespace chars
- (?: Non capturing group
  - [-=] Match - or =
  - | Or
  - w/ Match literally
- ) Close non capturing group
)? Close non capturing group and make it optional
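In Python, the pattern explained above could be used with re.sub and an empty replacement string. This is only a sketch: the names REMOVE and strip_daily_prod are hypothetical, and the sample strings come from the question.

```python
import re

# Matches " Daily Prod." plus an optional "-", "=" or "w/" separator.
REMOVE = re.compile(r"\sDaily Prod\.(?:\s*(?:[-=]|w/))?")


def strip_daily_prod(line):
    """Hypothetical helper: delete the matched part, keep everything else."""
    return REMOVE.sub("", line)


samples = [
    "Barber #1-1 Daily Prod. - Pumping unit",
    "Barbee #1-5 Daily Prod. = Coil Tubing",
    "Barbee #1-5 Daily Prod.w/ coil tubing",
    "Porter GU #1 Well #2 Daily Prod.",
]

stripped = [strip_daily_prod(s) for s in samples]
for label in stripped:
    print(label)
```

Because the space that follows the separator is not part of the match, it stays in place and separates the two remaining parts. The w/ alternative also only matches the literal two characters, so a word that merely starts with "w" is left intact, whereas a character class such as [-=w\/]+ would consume that leading "w".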


answered Jun 13, 2019 at 9:08

The fourth bird

