I need to replace all occurrences of "W32 L30" with "W32in L30in" in a large corpus of text. The numbers after W, L also vary.

I thought of using these regular expressions:

[W]([-+]?\d*\.\d+|\d+)
[L]([-+]?\d*\.\d+|\d+)

But these only find the number after each W and L, so replacing every occurrence would still be laborious and time-consuming. Is there a way to do the replacement directly with a regex?

  • The numbers can be floats as well? Commented Jan 19, 2018 at 10:05
  • yes, they can be floats as well Commented Jan 19, 2018 at 10:06

1 Answer

You can merge the two patterns into one with the character class [WL], wrap the match in a capture group, and then use a backreference in the replacement. For example:

import re

RGX = re.compile(r'([WL]([-+]?\d*\.\d+|\d+))(in)?')
result = RGX.sub(r'\1in', some_string)

The \1 references the first capture group, that is, the text matched by [WL]([-+]?\d*\.\d+|\d+). The final part, (in)?, optionally matches a trailing in, so if the in is already there it is consumed and written back once instead of being duplicated.

So if some_string is, for instance, the following:

>>> some_string
'A W2 in C3.15 where L2.4in and a bit A4'
>>> RGX.sub(r'\1in', some_string)
'A W2in in C3.15 where L2.4in and a bit A4'
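As a self-contained check, the same pattern can be wrapped in a small helper and run against the example from the question. This is only a sketch: the function name add_in_suffix is made up for illustration, and the float inputs are assumed examples rather than data from the original corpus.

```python
import re

# Same pattern as in the answer: W or L, then a number, then an optional "in".
RGX = re.compile(r'([WL]([-+]?\d*\.\d+|\d+))(in)?')


def add_in_suffix(text):
    """Hypothetical helper: append 'in' to every W<number> / L<number> token."""
    # \1 is the letter plus the number; an existing "in" is consumed by (in)?
    # and written back once, so it is never doubled.
    return RGX.sub(r'\1in', text)


print(add_in_suffix('W32 L30'))      # W32in L30in
print(add_in_suffix('W1.5 L-0.25'))  # W1.5in L-0.25in
```

Compiling the pattern once and reusing it is convenient when the same substitution is applied to many strings in a large corpus.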

3 Comments

thanks this is great, but what happens if I use some_string="W32in L45"? I would get "W32inin L45in", but instead I would like "W32in L45in"
@Brian: only if it ends with in? What happens if it would be W32out?
only if it ends with in
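The two cases raised in these comments can be checked directly against the pattern from the answer. The sketch below assumes that pattern unchanged: a token that already ends in in is left as it is, while any other suffix, such as out, is simply kept after the inserted in, which may or may not be the desired behaviour.

```python
import re

# Pattern from the answer, including the optional trailing "in".
RGX = re.compile(r'([WL]([-+]?\d*\.\d+|\d+))(in)?')

# An existing "in" is matched by (in)? and replaced by a single "in".
print(RGX.sub(r'\1in', 'W32in L45'))  # W32in L45in

# "out" is not part of the match, so it stays after the inserted "in".
print(RGX.sub(r'\1in', 'W32out'))     # W32inout
```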
