
I have a string in which I need to add a '\' in front of every '[' or ']', except if the brackets enclose an x like this: '[x]'. In the other cases, the brackets will always enclose a number.

Example: 'Foo[123].bar[x]' should become 'Foo\[123\].bar[x]'.

What is the best way to achieve this? Thanks a lot in advance.

  • Ah, SO.. where you can get +4 for "give me the codez" these days. Commented Aug 16, 2012 at 23:11
  • I learned from the answers below. That's the idea, right? Thanks to all who helped me out. Picking the answer is difficult though. I'll give it to the regex way as I learned more from it. Commented Aug 17, 2012 at 8:08

3 Answers


Something like this ought to work:

>>> import re
>>>
>>> re.sub(r'\[(\d+)\]', r'\[\1\]', 'Foo[123].bar[x]')
'Foo\\[123\\].bar[x]'
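As the comments suggest, the pattern can be compiled once and reused. A minimal sketch, assuming a hypothetical helper name `escape_brackets`. The replacement writes `\\[` and `\\]` so the literal backslash is explicit, rather than relying on `re.sub` passing the unknown escape `\[` through unchanged:

```python
import re

# Compiled once: matches "[" + one or more digits + "]", capturing the digits.
_NUMBERED = re.compile(r'\[(\d+)\]')

def escape_brackets(s):
    """Turn [n] into \\[n\\] for numeric n; [x] is left untouched."""
    # In the replacement template, \\ emits one literal backslash.
    return _NUMBERED.sub(r'\\[\1\\]', s)

print(escape_brackets('Foo[123].bar[x]'))  # prints Foo\[123\].bar[x]
```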

3 Comments

This is a good solution (just compile the regex beforehand). It may be slightly slower than three chained replace calls, because a regex engine adds some overhead. But in any case, this solution is easier to understand and would be easier to modify if the requirements change in the future.
I'm a regex newbie, but with a little help from the web I could figure out why this also works. Thanks, I see it's worth learning a bit more regex.
@saroele I recommend this tutorial.

You can do it without reaching for regexes, like this:

s.replace('[', r'\[').replace(']', r'\]').replace(r'\[x\]', '[x]')
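The same three-step idea wrapped in a function, as a sketch (the name `escape_brackets_replace` is hypothetical). Raw strings are used because `'\['` in a normal Python string literal is an invalid escape sequence that newer Python versions warn about:

```python
def escape_brackets_replace(s):
    """Escape every bracket, then undo the escaping around [x]."""
    # Step 1 and 2: put a backslash in front of every [ and every ].
    escaped = s.replace('[', r'\[').replace(']', r'\]')
    # Step 3: restore the [x] groups, which must stay unescaped.
    return escaped.replace(r'\[x\]', '[x]')

print(escape_brackets_replace('Foo[123].bar[x]'))  # prints Foo\[123\].bar[x]
```

Each `replace` call walks the whole string, which is the "three full scans" the first comment refers to; for short strings that cost is small.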

7 Comments

Three full scans of the result string to achieve the desired output? For long strings this will perform poorly.
Sure, and in that case I'd advocate the regex approach. I was presenting (arguably) a conceptually simpler approach for when this isn't an issue. If performance is an issue, a single linear scan (with one lookahead) would outperform regex anyway.
On the test string 'Foo[123].bar[x]zzzzz'*100000, 2M characters long, I find that re.sub takes 250 ms (even making sure the pattern is compiled) whereas the replace only takes 36 ms.
@g.d.d.c: It's O(n). The regex is probably the best tool for the job, but this is acceptable.
The strings I need to convert are at most 100 characters long, so it seems this solution will be the fastest.
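The "single linear scan (with one lookahead)" mentioned in the comments could look roughly like this. It is only a sketch: `escape_brackets_scan` is a hypothetical name, and in CPython a character-by-character Python loop carries interpreter overhead, so it is worth timing it against `str.replace` and `re.sub` before assuming it is faster:

```python
def escape_brackets_scan(s):
    """One left-to-right pass that looks ahead to recognise a literal [x]."""
    out = []
    i = 0
    n = len(s)
    while i < n:
        # Lookahead: an [x] group is copied through unchanged.
        if s.startswith('[x]', i):
            out.append('[x]')
            i += 3
            continue
        c = s[i]
        if c in '[]':
            # Any other bracket gets a backslash in front of it.
            out.append('\\')
        out.append(c)
        i += 1
    return ''.join(out)

print(escape_brackets_scan('Foo[123].bar[x]'))  # prints Foo\[123\].bar[x]
```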

A different approach: put a backslash before every [ that isn't followed by x], and before every ] that isn't preceded by [x.

import re
result = re.sub(r"(\[(?!x\])|(?<!\[x)\])", r"\\\1", subject)  # subject is the input string

Explanation:

# (\[(?!x\])|(?<!\[x)\])
# 
# Match the regular expression below and capture its match into backreference number 1 «(\[(?!x\])|(?<!\[x)\])»
# Match either the regular expression below (attempting the next alternative only if this one fails) «\[(?!x\])»
# Match the character “[” literally «\[»
# Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!x\])»
# Match the character “x” literally «x»
# Match the character “]” literally «\]»
# Or match regular expression number 2 below (the entire group fails if this one fails to match) «(?<!\[x)\]»
# Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\[x)»
# Match the character “[” literally «\[»
# Match the character “x” literally «x»
# Match the character “]” literally «\]»
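The lookaround pattern above, packaged as a reusable function with the pattern compiled once. This is a sketch; `escape_brackets_lookaround` is a hypothetical name:

```python
import re

# Escape "[" unless it is followed by "x]", and "]" unless it is preceded by "[x".
_UNLESS_X = re.compile(r'(\[(?!x\])|(?<!\[x)\])')

def escape_brackets_lookaround(subject):
    # r'\\\1' emits a literal backslash followed by the matched bracket.
    return _UNLESS_X.sub(r'\\\1', subject)

print(escape_brackets_lookaround('Foo[123].bar[x]'))  # prints Foo\[123\].bar[x]
```

Unlike the `\[(\d+)\]` answer, this version does not require the brackets to enclose digits; it escapes any bracket that is not part of `[x]`.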
