I'm trying to write a regex that replaces a string only when it is not surrounded by single quotes. For example, I want to replace FOO with XXX in the following string:

string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"

The desired output is:

output = "' FOO ' abc 123 ' def FOO ghi 345 ' XXX '' XXX ' lmno 678 FOO '"

My current regex is:

myregex = re.compile("(?<!')+( FOO )(?!')+", re.IGNORECASE)

I think I have to use lookaround assertions, but I don't understand how... regexes are too complicated for me :D

Can you help me?

  • I think your example is broken. Why isn't the first " abc 123 " replaced with XXX? Commented Aug 3, 2012 at 8:45
  • The example looks right to me: the first FOO is surrounded by single quotes and must be skipped Commented Aug 3, 2012 at 8:47
  • Agree with you on the first FOO. However, doesn't that mean that the bit starting abc is /outside/? If so, the result should be: "' FOO ' XXX ' def FOO ghi 345 ' XXX '' XXX ' lmno 678 FOO '". Correct? Commented Aug 3, 2012 at 8:52
  • The example seems broken to me as well. Commented Aug 3, 2012 at 8:55
  • no, only the literal "FOO" (with one space before and one after) should be replaced with "XXX" :P Commented Aug 3, 2012 at 8:57
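A quick way to check which parts of the string count as "outside" the quotes is to split on the quote character: text before the first quote, between the second and third, and so on lands at even indices, and quoted text lands at odd indices. This is only a sketch and assumes quotes are balanced and never escaped.

```python
# Sketch: classify the text between single quotes.
# Assumes quotes are balanced and never escaped.
string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"

pieces = string.split("'")
outside = pieces[0::2]  # even indices: text outside any pair of quotes
inside = pieces[1::2]   # odd indices: text between a pair of quotes

print("outside:", outside)
print("inside: ", inside)
```

So " abc 123 " is indeed outside the quotes, but it contains no FOO; the only FOOs outside the quotes are the two that become XXX in the desired output.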

2 Answers

Here's how it could be done:

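import re

def replace_FOO(m):
    # Group 1 only takes part in unquoted matches, so None means a quoted segment:
    # return it unchanged
    if m.group(1) is None:
        return m.group()

    # Unquoted segment: replace every FOO in it
    return m.group().replace("FOO", "XXX")

string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"

# Alternative 1 matches a whole quoted segment, alternative 2 a run of unquoted text
output = re.sub(r"'[^']*'|([^']*)", replace_FOO, string)

print(string)
print(output)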

[EDIT]

The re.sub function will accept as a replacement either a string template or a function. If the replacement is a function, every time it finds a match it'll call the function, passing the match object, and then use the returned value (which must be a string) as the replacement string.
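A tiny illustration of passing a function to re.sub, independent of the quoting problem (the function name shout and the sample text are made up for the example):

```python
import re

def shout(m):
    # re.sub calls this once per match; the returned string replaces the matched text
    return m.group().upper()

# Upper-case every word that starts with "b"
result = re.sub(r"\bb\w*", shout, "a big blue box")
print(result)
```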

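As for the pattern itself: at each position, if the current character is a ', it matches up to and including the next '; otherwise it matches up to, but not including, the next ' or the end of the string. Only the second alternative contains a group, so group 1 is None whenever a quoted segment was matched.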
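To see the segments this pattern produces on the sample string, one can loop over re.finditer with the same pattern and label each match (the variable names here are only for illustration):

```python
import re

string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"

# Same pattern as above: a whole quoted segment, or (group 1) a run of non-quote text
pattern = re.compile(r"'[^']*'|([^']*)")

segments = []
for m in pattern.finditer(string):
    # Group 1 did not take part in the match when the quoted alternative won
    kind = "quoted" if m.group(1) is None else "unquoted"
    segments.append((kind, m.group()))

for kind, text in segments:
    print(kind, repr(text))
```

The last entry is an empty unquoted match at the end of the string, which [^']* allows; it is harmless here because the replacement function turns an empty string into an empty string.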

The replacement function will be called on each match and return the appropriate result.

Actually, now that I think about it, I don't need a group at all: a match is a quoted segment exactly when it starts with ', so I could do this instead:

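import re

def replace_FOO(m):
    # A match that starts with ' is a quoted segment: leave it unchanged
    if m.group().startswith("'"):
        return m.group()

    # Otherwise it's unquoted text, so replace FOO in it
    return m.group().replace("FOO", "XXX")

string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"

# [^']+ rather than [^']* so the second alternative never matches an empty string
output = re.sub(r"'[^']*'|[^']+", replace_FOO, string)
print(output)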
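As a usage sketch, the same quoted/unquoted idea can be wrapped in a small reusable helper. The name replace_outside_quotes is hypothetical, and the sketch assumes balanced, unescaped quotes.

```python
import re

def replace_outside_quotes(text, old, new):
    """Replace `old` with `new` only in text that is not between single quotes.

    Hypothetical helper; assumes quotes are balanced and never escaped.
    """
    def repl(m):
        segment = m.group()
        # Quoted segments are returned unchanged
        if segment.startswith("'"):
            return segment
        # Unquoted text: plain (case-sensitive) substring replacement
        return segment.replace(old, new)

    return re.sub(r"'[^']*'|[^']+", repl, text)

string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"
print(replace_outside_quotes(string, "FOO", "XXX"))
```

Note that str.replace is case-sensitive, unlike the re.IGNORECASE used in the question's attempt; matching foo or Foo as well would need a case-insensitive re.sub inside repl instead.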

2 Comments

Does not work for me, I get ' FOO '' def FOO ghi 345 '''' lmno 678 FOO ' as the output (the "XXX" are gone)
It works as expected for me (Python 2.7.1) thanks a lot! It would be very useful if you could explain the code, since I'm a Python and regex newbie :P

This is hard to do without variable-length lookbehind, and I'm not sure whether Python's regex engine supports it. Anyway, a simple solution is the following:

Use this regex: (?:[^'\s]\s*)(FOO)(?:\s*[^'\s])

FOO itself is in the first capture group.
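To check what this pattern picks up on the question's sample string, a short re.finditer loop can report each whole match together with the span of capture group 1:

```python
import re

string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"

# FOO whose nearest non-space neighbour on each side is not a quote;
# FOO itself is capture group 1
pattern = re.compile(r"(?:[^'\s]\s*)(FOO)(?:\s*[^'\s])")

matches = [(m.group(0), m.span(1)) for m in pattern.finditer(string)]
print(matches)
```

On this string it finds only the FOO inside ' def FOO ghi 345 ', because every other FOO has a quote as its nearest non-space neighbour on at least one side. In other words, the pattern encodes "FOO not directly next to a quote", which is a different rule from "FOO outside a quoted region".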

If there is always exactly one space between FOO and the character before it, as in your example, you can use a fixed-width lookbehind instead: (?<=[^'\s]\ )FOO(?=\s*[^'\s]). This matches FOO itself, so no capture group is needed.
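Because the lookbehind here is exactly two characters wide, Python's standard re module accepts it (the lookahead may vary in length), and the match can be handed straight to re.sub with a plain replacement string:

```python
import re

string = "' FOO ' abc 123 ' def FOO ghi 345 ' FOO '' FOO ' lmno 678 FOO '"

# Lookbehind: one non-quote, non-space character followed by a single space (fixed width 2)
# Lookahead: optional whitespace, then a non-quote, non-space character
pattern = r"(?<=[^'\s]\ )FOO(?=\s*[^'\s])"

result = re.sub(pattern, "XXX", string)
print(result)
```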

1 Comment

Python's standard regex library re doesn't support variable-length lookbehinds, but the alternative regex library on PyPI (pypi.python.org/pypi/regex) does.
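For instance, putting \s* inside the lookbehind makes its width variable, and the standard re module rejects the pattern at compile time with re.error. The third-party regex module is not shown because it needs a separate install.

```python
import re

# \s* inside the lookbehind makes its width variable
try:
    re.compile(r"(?<=[^'\s]\s*)FOO")
    supported = True
except re.error as exc:
    supported = False
    print("re.error:", exc)
```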
