Surrounding a pattern in a Python string with brackets

Question

I have strings that look like the ones below, and I want to transform each one as shown:

oldString="this is my {{string-d}}" => "this is my {{(string-d)}}"
oldString2="this is my second{{ new_string-d }}" => "this is my second{{ (new_string-d) }}"
oldString3="this is my second new_string-d " => "this is my second (new_string-d) "
oldString4="this is my second new[123string]-d " => "this is my second (new[123string]-d) "

Whenever I see "-d", I want to add an opening bracket before the word it is attached to and a closing bracket right after the "-d".

I wrote code that looks for "-d" in a string and partitions the string into three parts: the text before "-d", "-d" itself, and the text after "-d". I then walk backwards through the part before "-d" until I find whitespace or "{", stop there, and add the brackets. P.S. I read many files and try to modify the strings in them; the example above just demonstrates what I'm trying to do. My code looks like this:

   if ('-d') in oldString:
    p = oldString.partition('-d')
    v = p[p.index('-d')-1]
    beforeString=''
    for i in reversed(v):
        if i != ' ' or i != '{':
            beforeString=i+beforeString 
            indexNew = v.index(i)
    outPutLine = v[:indexNew]+'('+v[indexNew:]
    newString = outPutLine + '-d' + ' )'
    print newString

Running the code gives:

newString = "(this is my {{string-d )"

As you can see, the opening bracket is before "this" instead of before "string". Why is this happening? Also, I'm not sure this is the best way to do this kind of find-and-replace, so any suggestions would be much appreciated.
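For reference, the bracket lands at the start because the condition `i != ' ' or i != '{'` is true for every character (no character can be both a space and a brace), so the loop never stops. In addition, `v.index(i)` returns the first occurrence of that character in the string, not the position the loop has reached. Below is a minimal sketch of the manual approach with both issues addressed. It assumes only the first "-d" in each string matters, and the helper name `wrap_before_d` is made up for this example.

```python
def wrap_before_d(s, marker="-d"):
    """Wrap the word that ends in `marker` in parentheses (first occurrence only)."""
    before, found, after = s.partition(marker)
    if not found:
        return s  # marker absent: leave the string unchanged

    # Walk left from the marker until a space or "{" is reached.
    # Tracking the position directly avoids the v.index(i) problem.
    start = len(before)
    while start > 0 and before[start - 1] not in " {":
        start -= 1

    return before[:start] + "(" + before[start:] + marker + ")" + after


print(wrap_before_d("this is my {{string-d}}"))
print(wrap_before_d("this is my second new[123string]-d "))
```

This still handles only one "-d" per string, which is one reason a regular expression tends to be the shorter route.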

Kevin · Accepted Answer · 2017-03-30 15:26:20Z

>>> import re
>>> oldString = "this is my {{string-d}}"
>>> oldString2 = "this is my second{{ new_string-d }}"
>>> re.sub(r"(\w*-d)", r"(\1)", oldString)
'this is my {{(string-d)}}'
>>> re.sub(r"(\w*-d)", r"(\1)", oldString2)
'this is my second{{ (new_string-d) }}'

Note that this matches "words" assuming that a word is composed of only letters, numbers, and underscores.

Here's a more thorough breakdown of what's happening:

An r before a string literal means the string is a "raw string". It prevents Python from interpreting backslash sequences as escape sequences. For instance, r"\n" is a backslash followed by the letter n, rather than a single newline character. I like to use raw strings for my regex patterns, even though it's not always necessary.
The parentheses surrounding \w*-d form a capturing group. It indicates to the regex engine that the contents of the group should be saved for later use.
The sequence \w means "any alphanumeric character or underscore".
* means "zero or more of the preceding item". \w* together means "zero or more alphanumeric characters or underscores".
-d means "a hyphen followed by the letter d".

All together, (\w*-d) means "zero or more alphanumeric characters or underscores, followed by a hyphen and the letter d. Save all of these characters for later use."

The second string describes what the matched data should be replaced with. "\1" means "the contents of the first captured group". The parentheses are just regular parentheses. All together, (\1) in this context means "take the saved content from the captured group, surround it in parentheses, and put it back into the string".
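To look at the capturing group and the back-reference separately, here is a small sketch. The sample string comes from the question; the variable names are only illustrative.

```python
import re

pattern = re.compile(r"(\w*-d)")
text = "this is my {{string-d}}"

# group(1) is the text saved by the capturing group, i.e. what \1 refers to.
match = pattern.search(text)
captured = match.group(1)

# A function replacement that builds the same result as r"(\1)".
wrapped = pattern.sub(lambda m: "(" + m.group(1) + ")", text)

print(captured)
print(wrapped)
```

The function form is more verbose than r"(\1)", but it can be handy if the replacement ever needs logic that a template string cannot express.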

If you want to match more characters than just alphanumeric and underscore, you can replace \w with whatever collection of characters you want to match.

>>> re.sub(r"([\w\.\[\]]*-d)", r"(\1)", "{{startingHere[zero1].my_string-d }}")
'{{(startingHere[zero1].my_string-d) }}'
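As a quick check, the widened character class also covers all four examples from the question. The helper name `wrap_d_words` is hypothetical, and the expected strings are copied from the question.

```python
import re

# Same character class as above: word characters, dots and square brackets.
D_WORD = re.compile(r"([\w.\[\]]*-d)")


def wrap_d_words(s):
    """Wrap every word ending in -d in parentheses."""
    return D_WORD.sub(r"(\1)", s)


examples = {
    "this is my {{string-d}}": "this is my {{(string-d)}}",
    "this is my second{{ new_string-d }}": "this is my second{{ (new_string-d) }}",
    "this is my second new_string-d ": "this is my second (new_string-d) ",
    "this is my second new[123string]-d ": "this is my second (new[123string]-d) ",
}

for before, expected in examples.items():
    assert wrap_d_words(before) == expected
```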

If you also want to match words ending with "-d()", you can match a pair of parentheses with \(\) and mark it as optional using ?.

>>> re.sub(r"([\w\.\[\]]*-d(\(\))?)", r"(\1)", "{{startingHere[zero1].my_string-d() }}")
'{{(startingHere[zero1].my_string-d()) }}'
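One thing to watch for: these patterns match "-d" anywhere, including in the middle of a longer word such as "well-defined". If that matters for your files, a negative lookahead is one possible guard. This is only a sketch, assuming that a "-d" followed by another word character should be left alone; the names `D_WORD_STRICT` and `wrap_d_words_strict` are made up for this example.

```python
import re

# (?!\w) is a negative lookahead: the match must not be followed by a word character.
# (?:\(\))? is the optional "()" from above, written as a non-capturing group.
D_WORD_STRICT = re.compile(r"([\w.\[\]]*-d(?:\(\))?)(?!\w)")


def wrap_d_words_strict(s):
    return D_WORD_STRICT.sub(r"(\1)", s)


loose = re.sub(r"(\w*-d)", r"(\1)", "well-defined {{string-d}}")
strict = wrap_d_words_strict("well-defined {{string-d}}")

print(loose)   # the plain pattern also wraps "well-d"
print(strict)  # the lookahead version leaves "well-defined" alone
```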

Thanks, Kevin, for your post! Could you please explain how the regular expression works? I'm new to regular expressions :)

Jonathan Eunice · Accepted Answer · 2017-03-30 14:59:41Z

If you want the bracketing to only take place inside double curly braces, you need something like this:

re.sub(r'({{\s*)([^}]*-d)(\s*}})', r'\1(\2)\3', s)

Breaking that down a bit:

# the target pattern
r'({{\s*)([^}]*-d)(\s*}})'
# ^^^^^^^ capture group 1, opening {{ plus optional whitespace
#        ^^^^^^^^^ capture group 2, non-brace characters plus -d
#                 ^^^^^^^ capture group 3, optional whitespace plus closing }}

The replacement r'\1(\2)\3' just reassembles the groups, with parentheses around the middle one.

Putting it together:

import re

def quote_string_d(s):
    return re.sub(r'({{\s*)([^}]*-d)(\s*}})', r'\1(\2)\3', s)

print(quote_string_d("this is my {{string-d}}"))
print(quote_string_d("this is my second{{ new_string-d }}"))
print(quote_string_d("this should not be quoted other_string-d "))

Output:

this is my {{(string-d)}}
this is my second{{ (new_string-d) }}
this should not be quoted other_string-d

Note that the third string does not get the parentheses, because its "-d" word is not inside {{ }}.
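One caveat: [^}]* also matches spaces, so group 2 captures everything between the opening braces and the final "-d". For example, "{{ foo bar-d }}" would become "{{ (foo bar-d) }}". If each "-d" word inside a brace pair should be wrapped on its own, a possible workaround is to find each {{ }} span first and apply a word-level pattern only inside it. This is a sketch under that assumption; the names `BRACE_SPAN`, `D_WORD`, and `wrap_inside_braces` are made up for this example.

```python
import re

BRACE_SPAN = re.compile(r"\{\{(.*?)\}\}")  # text between {{ and }}, non-greedy
D_WORD = re.compile(r"(\w*-d)")            # a word ending in -d


def wrap_inside_braces(s):
    def fix_span(m):
        # Wrap each -d word found inside this one {{ ... }} span.
        return "{{" + D_WORD.sub(r"(\1)", m.group(1)) + "}}"

    return BRACE_SPAN.sub(fix_span, s)


print(wrap_inside_braces("{{ first-d second-d }} outside-d"))
```

Because `.` does not match a newline by default, this version only handles brace pairs that open and close on the same line.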
