
I'm trying to split a sequence of irregular strings that my Python program reads from Excel files. I'm using Regex101.com for testing, and I have partially succeeded:
My sample:

Barber #1-1 Daily Prod. - Pumping unit  
Barbee #1-3 Daily Prod. - Plunger Lift  
Barbee #1-5 Daily Prod. = Coil Tubing  
Barbee #1-3 Daily Prod. - Plunger  
Barbee #1-5 Daily Prod.w/ coil tubing  
Porter GU #1 Well #2 Daily Prod.  
Barber GU #1 Well #1 Daily Prod.  
Bogel #1-2 Daily Prod. w/ plunger  

My regex:
(.*)\sDaily Prod\.(.*$)

Selecting group 1 and group 2, I get:

Barber #1-1 - Pumping unit  
Barbee #1-3 - Plunger Lift  
Barbee #1-5 = Coil Tubing  
Barbee #1-3 - Plunger  
Barbee #1-5w/ coil tubing  
Porter GU #1 Well #2  
Barber GU #1 Well #1  
Bogel #1-2 w/ plunger  
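The following minimal Python sketch of the attempt above shows why the separators survive: group 2 starts right after "Daily Prod." and therefore still contains the "-", "=" or "w/". It assumes each Excel cell arrives as a separate string and uses three of the sample lines.

```python
import re

# The asker's pattern: group 1 is everything before " Daily Prod.",
# group 2 is everything after it, separator characters included.
pattern = re.compile(r"(.*)\sDaily Prod\.(.*$)")

lines = [
    "Barber #1-1 Daily Prod. - Pumping unit",
    "Barbee #1-5 Daily Prod.w/ coil tubing",
    "Porter GU #1 Well #2 Daily Prod.",
]

joined = []
for line in lines:
    m = pattern.match(line)
    if m:
        # Concatenating group 1 and group 2 keeps "-", "=" and "w/".
        joined.append(m.group(1) + m.group(2))

for text in joined:
    print(text)
```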

and I would like to have:

Barber #1-1 Pumping unit  
Barbee #1-3 Plunger Lift  
Barbee #1-5 Coil Tubing  
Barbee #1-3 Plunger  
Barbee #1-5 coil tubing  
Porter GU #1 Well #2  
Barber GU #1 Well #1  
Bogel #1-2 plunger  

Thanks.

2 Answers


This expression should work:

(.*)\sDaily Prod\.(\s*[-=w\/]+\s*)?(.*)

Here, we have an optional group:

(\s*[-=w\/]+\s*)?

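which collects the unwanted separator characters (-, =, w and /) together with the surrounding whitespace. We then replace each match with $1 and $3 (\1 and \3 in Python), discarding group 2.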
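Here is a small sketch using Python's re module that prints the three groups for a few lines. The last input is hypothetical (it is not in the question). It shows a side effect of putting w in the character class: a word starting with w directly after "Daily Prod." loses that letter.

```python
import re

pattern = re.compile(r"(.*)\sDaily Prod\.(\s*[-=w\/]+\s*)?(.*)")

samples = [
    "Barbee #1-5 Daily Prod. = Coil Tubing",
    "Barbee #1-5 Daily Prod.w/ coil tubing",
    "Barber GU #1 Well #1 Daily Prod.",
    "Smith #1 Daily Prod. well test",  # hypothetical input, not from the question
]

captured = []
for line in samples:
    m = pattern.match(line)
    # group(2) is None when the optional separator group did not participate.
    captured.append((m.group(1), m.group(2), m.group(3)))
    print(captured[-1])
```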


Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(.*)\sDaily Prod\.(\s*[-=w\/]+\s*)?(.*)"

test_str = ("Barber #1-1 Daily Prod. - Pumping unit\n"
    "Barbee #1-3 Daily Prod. - Plunger Lift\n"
    "Barbee #1-5 Daily Prod. = Coil Tubing\n"
    "Barbee #1-3 Daily Prod. - Plunger\n"
    "Barbee #1-5 Daily Prod.w/ coil tubing\n"
    "Porter GU #1 Well #2 Daily Prod.\n"
    "Barber GU #1 Well #1 Daily Prod.\n"
    "Bogel #1-2 Daily Prod. w/ plunger")

subst = "\\1 \\3"

# re.sub replaces every match by default (count=0); pass count= to limit it.
# count and flags are passed by keyword: passing them positionally is deprecated since Python 3.13.
result = re.sub(regex, subst, test_str, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
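Because the data comes from Excel cells, processing one string at a time may be more convenient than joining everything into a single multi-line string. The sketch below assumes each cell is already a Python str (reading the workbook is not shown). The name clean is a hypothetical helper. The .strip() call removes the trailing space that "\1 \3" leaves on lines with nothing after "Daily Prod.".

```python
import re

# The pattern from the answer, compiled once and reused for every cell.
PATTERN = re.compile(r"(.*)\sDaily Prod\.(\s*[-=w\/]+\s*)?(.*)")


def clean(cell: str) -> str:
    """Hypothetical helper: drop "Daily Prod." and its separator from one cell."""
    # Group 2 (the separator) is discarded; only groups 1 and 3 are kept.
    # strip() removes the trailing space "\1 \3" leaves when group 3 is empty.
    return PATTERN.sub(r"\1 \3", cell).strip()


cells = [
    "Barber #1-1 Daily Prod. - Pumping unit",
    "Barbee #1-3 Daily Prod. - Plunger Lift",
    "Barbee #1-5 Daily Prod. = Coil Tubing",
    "Barbee #1-3 Daily Prod. - Plunger",
    "Barbee #1-5 Daily Prod.w/ coil tubing",
    "Porter GU #1 Well #2 Daily Prod.",
    "Barber GU #1 Well #1 Daily Prod.",
    "Bogel #1-2 Daily Prod. w/ plunger",
]

cleaned = [clean(cell) for cell in cells]
for text in cleaned:
    print(text)
```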


1 Comment

Hi @Emma. Yes. That works perfectly. This regex stuff makes me blind sometimes. Thanks. And thanks to GSD for editing the question.

Alternatively, you could match only the part you want to remove and replace it with an empty string:

\sDaily Prod\.(?:\s*(?:[-=]|w/))?

Explanation

  • \sDaily Prod\. Match a whitespace char, Daily Prod and a literal dot
  • (?: Non capturing group
    • \s* Match 0+ whitespace chars
    • (?: Non capturing group
      • [-=] Match - or =
      • | Or
      • w/ Match w/ literally
    • ) Close non capturing group
  • )? Close non capturing group and make it optional
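The same idea can be written in Python. This sketch assumes each cell is a separate str. The separator is matched as the literal sequence w/ rather than through a character class, so a following word such as "well" stays intact. Lines that end in "Daily Prod." also get no trailing space.

```python
import re

# Matches " Daily Prod." plus an optional "-", "=" or "w/" after it.
REMOVE = re.compile(r"\sDaily Prod\.(?:\s*(?:[-=]|w/))?")

cells = [
    "Barber #1-1 Daily Prod. - Pumping unit",
    "Barbee #1-3 Daily Prod. - Plunger Lift",
    "Barbee #1-5 Daily Prod. = Coil Tubing",
    "Barbee #1-3 Daily Prod. - Plunger",
    "Barbee #1-5 Daily Prod.w/ coil tubing",
    "Porter GU #1 Well #2 Daily Prod.",
    "Barber GU #1 Well #1 Daily Prod.",
    "Bogel #1-2 Daily Prod. w/ plunger",
]

# Replacing the matched part with "" keeps everything before and after it.
cleaned = [REMOVE.sub("", cell) for cell in cells]
for text in cleaned:
    print(text)
```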
