I have the following string:

ref:_00D30jPy._50038vQl5C:ref

I would like to produce the following output string from it:

5003800000vQl5C

The required regex actions are:

  • Remove all leading characters until the digit '5'.
  • Insert five zeros after the fifth character of what remains (here, after '50038').
  • Remove the closing ':ref'.

I initially made the following regex to match the whole string: (ref:(\S+):ref)

How can I alter the Python RegEx to achieve the above?

2 Answers

2

Use re.sub:

import re
s = 'ref:_00D30jPy._50038vQl5C:ref'
# Pass flags by keyword; positional count/flags arguments are deprecated as of Python 3.13.
result = re.sub(r'^[^5]*(5.{4})(.*?):ref$', r'\g<1>00000\g<2>', s, flags=re.MULTILINE)
print(result)

Output:

5003800000vQl5C

Explanation:

  • ^[^5]*: from the start of the line, match every character up to (but not including) the first 5
  • (5.{4}): capture that 5 and the four characters after it into group 1
  • (.*?):ref$: capture the rest into group 2, excluding the :ref at the end
  • \g<1>00000\g<2>: replace the whole line with group 1, five zeros, and group 2, where \g<1> and \g<2> are substituted by groups 1 and 2 respectively.

Demo has a Python 2-compatible code generator and detailed explanation.
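As a minimal sketch of the same idea, the pattern can be compiled once in verbose mode so that each part carries its own comment, and wrapped in a function that returns None when the input does not have the expected shape. The names REF_PATTERN and expand_ref and the padding parameter are illustrative assumptions, not part of the original answer.

```python
import re

# Hypothetical names; the pattern itself mirrors the one explained above.
REF_PATTERN = re.compile(
    r"""
    ^[^5]*      # skip everything before the first 5
    (5.{4})     # group 1: the 5 plus the next four characters
    (.*?)       # group 2: the rest of the ID, matched lazily
    :ref$       # the trailing :ref marker, which is discarded
    """,
    re.VERBOSE,
)


def expand_ref(s, padding="00000"):
    """Return the expanded ID, or None if s does not match the pattern."""
    m = REF_PATTERN.search(s)
    if m is None:
        return None
    return m.group(1) + padding + m.group(2)


print(expand_ref("ref:_00D30jPy._50038vQl5C:ref"))  # 5003800000vQl5C
print(expand_ref("no match here"))                  # None
```

Returning None rather than the unchanged input (which is what re.sub does on a non-match) makes malformed strings easier to spot.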

1

A regex is not required for this task; string slicing is simpler.

If the input strings maintain the same format and lengths you can simply do this:

s = 'ref:_00D30jPy._50038vQl5C:ref'
new = '{}00000{}'.format(s[15:20], s[20:-4])

If there is some variability then search for the first '5' in the string and slice from there:

start = s.index('5')
new = '{}00000{}'.format(s[start:start+5], s[start+5:-4])
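The slicing approach above can be sketched as a small function that also checks its assumptions, since s.index raises ValueError when no '5' is present and s[...:-4] silently drops four characters even when the string does not end in ':ref'. The function name and its parameters are hypothetical, chosen only for illustration.

```python
def expand_ref_by_slicing(s, marker="5", suffix=":ref", padding="00000"):
    """Slice out the ID starting at the first marker and insert padding.

    Assumes at least five characters sit between the marker and the suffix.
    """
    start = s.find(marker)
    if start == -1 or not s.endswith(suffix):
        raise ValueError("unexpected format: {!r}".format(s))
    head = s[start:start + 5]                  # e.g. '50038'
    tail = s[start + 5:len(s) - len(suffix)]   # e.g. 'vQl5C'
    return head + padding + tail


print(expand_ref_by_slicing("ref:_00D30jPy._50038vQl5C:ref"))  # 5003800000vQl5C
```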
